Unicode ICU4C problem converting between UTF-16 and WCS (Mac OS X only)
I use ICU4C in C++ software built for Windows, Linux and Mac OS X.
I have an issue only on Mac OS X, and only with the conversion from UTF-16 to WCS (calling u_strToWCS).
The non-ASCII characters are simply replaced with a single fixed character (code 26 in the test below).
The ICU4C version doesn't matter: I tried the latest release yesterday.
My Mac OS X version is 10.6.6 (Snow Leopard), and the compiler is GCC i686-apple-darwin10-gcc-4.2.1.
I also tried switching between shared and static libraries, which made no difference.
I can reproduce the issue with the code below. Look at the variables "c1", "c2" and "c3": Windows and Linux give the same results, but Mac OS X does not.
I can't tell whether this is a compilation problem, an ICU bug, or something else.
I hope someone can point me in a direction, or at least confirm my test results.
Thanks.
// Manually construct the UTF-16 (little-endian) buffer of this string: http://pastebin.com/HW06TaA9
unsigned char* pSource = new unsigned char[28];
pSource[0] = 84;
pSource[1] = 0;
pSource[2] = 101;
pSource[3] = 0;
pSource[4] = 115;
pSource[5] = 0;
pSource[6] = 116;
pSource[7] = 0;
pSource[8] = 32;
pSource[9] = 0;
pSource[10] = 179;
pSource[11] = 111;
pSource[12] = 128;
pSource[13] = 149;
pSource[14] = 121;
pSource[15] = 114;
pSource[16] = 43;
pSource[17] = 82;
pSource[18] = 76;
pSource[19] = 136;
pSource[20] = 63;
pSource[21] = 101;
pSource[22] = 64;
pSource[23] = 83;
pSource[24] = 125;
pSource[25] = 0;
pSource[26] = 0;
pSource[27] = 0;
int32_t nChars = 100;
wchar_t* pDest = new wchar_t[nChars];
memset(pDest, 0, nChars * sizeof(wchar_t));
UErrorCode status = U_ZERO_ERROR;
u_strToWCS(pDest, nChars, &nChars, (const UChar*) pSource, -1, &status);
if (U_SUCCESS(status))
{
    wchar_t c1 = pDest[2]; // ASCII character.    Win: 115,   Linux: 115,   OS X: 115
    wchar_t c2 = pDest[5]; // Japanese character. Win: 28595, Linux: 28595, OS X: 26
    wchar_t c3 = pDest[6]; // Japanese character. Win: 38272, Linux: 38272, OS X: 26
}
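For readers checking the expected values, here is a minimal sketch of how the byte pairs in the buffer combine into UTF-16 code units, assuming little-endian byte order (low byte first), which matches the byte values above. The helper name utf16FromLittleEndianBytes is hypothetical and is not part of ICU.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not an ICU function): combine pairs of bytes,
// low byte first, into 16-bit UTF-16 code units.
// A trailing odd byte, if any, is ignored.
std::vector<std::uint16_t> utf16FromLittleEndianBytes(const unsigned char* bytes,
                                                      std::size_t byteCount)
{
    std::vector<std::uint16_t> units;
    units.reserve(byteCount / 2);
    for (std::size_t i = 0; i + 1 < byteCount; i += 2) {
        // unit = low byte + 256 * high byte
        units.push_back(static_cast<std::uint16_t>(bytes[i] | (bytes[i + 1] << 8)));
    }
    return units;
}
```

Applied to the test buffer, bytes 10 and 11 (179, 111) give 179 + 111 * 256 = 28595, and bytes 12 and 13 (128, 149) give 128 + 149 * 256 = 38272, which are the values reported for Windows and Linux.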
Comments (1)
See this bug. It has already been fixed in the latest ICU, and I've included a workaround: https://ssl.icu-project.org/trac/ticket/8894#comment:4
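The value 26 reported on OS X is the ASCII SUB control character (0x1A), which would be consistent with a codepage-based conversion substituting characters it cannot represent; that is an inference, not something this thread confirms. For readers who cannot upgrade, one possible way around the problem is to decode UTF-16 directly instead of calling u_strToWCS. The sketch below is not necessarily the workaround described in the ticket. It assumes the target platform stores UTF-32 code points in a 32-bit wchar_t (as GCC does on Mac OS X), and the helper name utf16ToUtf32 is hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (not an ICU function): decode UTF-16 code units
// into UTF-32 code points. Unpaired surrogates become U+FFFD.
std::u32string utf16ToUtf32(const std::uint16_t* src, std::size_t length)
{
    std::u32string out;
    out.reserve(length);
    for (std::size_t i = 0; i < length; ++i) {
        char32_t c = src[i];
        if (c >= 0xD800 && c <= 0xDBFF) {
            // Lead surrogate: must be followed by a trail surrogate.
            if (i + 1 < length && src[i + 1] >= 0xDC00 && src[i + 1] <= 0xDFFF) {
                c = 0x10000 + ((c - 0xD800) << 10) + (src[i + 1] - 0xDC00);
                ++i; // consume the trail surrogate as well
            } else {
                c = 0xFFFD; // unpaired lead surrogate
            }
        } else if (c >= 0xDC00 && c <= 0xDFFF) {
            c = 0xFFFD; // trail surrogate without a preceding lead
        }
        out.push_back(c);
    }
    return out;
}
```

On a platform where wchar_t is 32 bits wide, each resulting code point can be copied into a wchar_t unchanged. ICU's own u_strToUTF32 (declared in unicode/ustring.h) performs the same kind of conversion and may be preferable when ICU is already linked.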