UTF8 std::string to UTF32 std::wstring using unicode.org code or standard C++17 functionality?
Looking for a working solution to the classic UTF8-to-UTF32 conversion from a stable, tested codebase.
Now I have the source for Unicode.org's C code:
https://android.googlesource.com/platform/external/id3lib/+/master/unicode.org/ConvertUTF.c
https://android.googlesource.com/platform/external/id3lib/+/master/unicode.org/ConvertUTF.h
License:
https://android.googlesource.com/platform/external/id3lib/+/master/unicode.org/readme.txt
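One build detail worth noting: ConvertUTF.c is plain C, so if your copy of ConvertUTF.h does not already carry its own extern "C" guards, the header has to be wrapped when included from C++, roughly like this:

// Give the C conversion functions C linkage when used from C++.
// (Unnecessary if your copy of ConvertUTF.h already has these guards.)
extern "C" {
#include "ConvertUTF.h"
}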
Using the following C++ wrapper, which interfaces with the C library code above:
#include <cassert>
#include <string>
// ConvertUTF.h is included as shown above.

std::wstring Utf8_To_wstring(const std::string& utf8string)
{
    if (utf8string.empty())
    {
        return std::wstring();
    }
    // A UTF-8 sequence never decodes to more UTF-16 or UTF-32 code units
    // than it has bytes, so the byte count is a safe upper bound for the
    // target buffer.
    const size_t widesize = utf8string.length();
    if (sizeof(wchar_t) == 2)
    {
        std::wstring resultstring(widesize, L'\0');
        const UTF8* sourcestart = reinterpret_cast<const UTF8*>(utf8string.c_str());
        const UTF8* sourceend = sourcestart + widesize;
        UTF16* targetbegin = reinterpret_cast<UTF16*>(&resultstring[0]);
        UTF16* targetstart = targetbegin;
        UTF16* targetend = targetbegin + widesize;
        ConversionResult res = ConvertUTF8toUTF16(&sourcestart, sourceend,
                                                  &targetstart, targetend,
                                                  strictConversion);
        if (res != conversionOK)
        {
            // Fallback kept from the original: widen the raw bytes unchanged
            // (only meaningful for pure ASCII input).
            return std::wstring(utf8string.begin(), utf8string.end());
        }
        // Shrink to the number of code units actually written. The original
        // "*targetstart = 0" terminator write here landed one element past
        // the end of the buffer whenever the input filled it exactly (for
        // example, pure ASCII), which is the likely source of the crashes.
        resultstring.resize(static_cast<size_t>(targetstart - targetbegin));
        return resultstring;
    }
    else if (sizeof(wchar_t) == 4)
    {
        std::wstring resultstring(widesize, L'\0');
        const UTF8* sourcestart = reinterpret_cast<const UTF8*>(utf8string.c_str());
        const UTF8* sourceend = sourcestart + widesize;
        UTF32* targetbegin = reinterpret_cast<UTF32*>(&resultstring[0]);
        UTF32* targetstart = targetbegin;
        UTF32* targetend = targetbegin + widesize;
        // Note: lenientConversion here versus strictConversion above is
        // carried over from the original code.
        ConversionResult res = ConvertUTF8toUTF32(&sourcestart, sourceend,
                                                  &targetstart, targetend,
                                                  lenientConversion);
        if (res != conversionOK)
        {
            return std::wstring(utf8string.begin(), utf8string.end());
        }
        resultstring.resize(static_cast<size_t>(targetstart - targetbegin));
        return resultstring;
    }
    else
    {
        assert(false && "unexpected sizeof(wchar_t)");
        return std::wstring();
    }
}
Now this code initially works, but soon crashes due to issues in the interfacing code above. The interfacing code was adapted from the open source of a production project found on GitHub...
It crashes a few strings into the conversion, so I guess there is a buffer overflow somewhere in this interfacing code (see the comments in the listing above for the most likely spot).
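For reference, a minimal driver of the kind I use, which is enough to hit the problem (the sample strings are arbitrary stand-ins, not my real data):

#include <iostream>
#include <string>

std::wstring Utf8_To_wstring(const std::string& utf8string); // listing above

int main()
{
    // Pure ASCII fills the target buffer exactly, which is the case where
    // the original terminator write went out of bounds.
    const std::string samples[] = { "plain ascii",
                                    "gr\xC3\xBC\xC3\x9F",  // "gruess" in UTF-8
                                    "\xE2\x82\xAC 100" };  // euro sign + " 100"
    for (const std::string& s : samples)
    {
        std::wstring w = Utf8_To_wstring(s);
        std::wcout << w.size() << L" code units\n";
    }
    return 0;
}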
Does anyone have a good replacement, or example code, for a simple C++11/C++17 solution that converts a std::string to a std::wstring holding UTF32-encoded Unicode values?
1 Answer:
I have a working solution for UTF8 to UTF16 using C++17 Locale:
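Something along these lines (a minimal sketch of the approach, using std::wstring_convert with the codecvt facets from <codecvt>; note that both are deprecated as of C++17, though still shipped by the major standard libraries):

#include <codecvt>
#include <locale>
#include <string>

// UTF-8 -> UTF-16. from_bytes() throws std::range_error on invalid input.
std::u16string Utf8_To_Utf16(const std::string& utf8)
{
    std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> conv;
    return conv.from_bytes(utf8);
}

// UTF-8 -> UTF-32, using the same idiom with a different facet.
std::u32string Utf8_To_Utf32(const std::string& utf8)
{
    std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t> conv;
    return conv.from_bytes(utf8);
}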
This seems to do the job for me: it converts to the right level of Unicode, so the character codes can be extracted as ints and the correct glyph codes loaded.
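For that extraction step, decoding UTF-16 into plain integer code points looks roughly like this (a sketch; lookup_glyph is a hypothetical stand-in for whatever glyph API is in use, e.g. FT_Get_Char_Index in FreeType):

#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical stand-in for the real glyph lookup call.
static void lookup_glyph(std::uint32_t codepoint)
{
    std::printf("U+%04X\n", codepoint);
}

static void load_glyphs(const std::u16string& utf16)
{
    for (std::size_t i = 0; i < utf16.size(); ++i)
    {
        std::uint32_t cp = utf16[i];
        // Combine a surrogate pair into a single code point.
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < utf16.size())
        {
            std::uint32_t lo = utf16[i + 1];
            if (lo >= 0xDC00 && lo <= 0xDFFF)
            {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                ++i; // consumed the low surrogate as well
            }
        }
        lookup_glyph(cp);
    }
}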