Chinese text conversion with MultiByteToWideChar

Posted on 2025-02-10 02:58:20


I'm trying to display Chinese text in a MessageBoxW, but I can't correctly convert it from UTF-8 to wchar_t. At the same time, Chinese text that is wchar_t to begin with is displayed correctly.
I have tried different MultiByteToWideChar flags, but with the same result. What is the reason for the incorrect conversion?


Answer by 安静被遗忘, 2025-02-17 02:58:20

char text[] = "文本" is only UTF-8 if the source file is encoded in UTF-8. Since your title string displays correctly, your source is saved in the default Chinese legacy encoding on Windows, so the text string contains bytes in that encoding rather than UTF-8, and MultiByteToWideChar(CP_UTF8, ...) mis-decodes them. You can confirm this by setting the MB_ERR_INVALID_CHARS flag: the function then returns zero (and GetLastError() returns ERROR_NO_UNICODE_TRANSLATION) if the input isn't really UTF-8:

int ret = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS, text, -1, wtext, 1000);
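To make the failure mode concrete, here is a minimal portable sketch of a UTF-8 to UTF-16 decoder (UTF-16 is what wchar_t holds on Windows). The function utf8_to_utf16 is hypothetical and is not how Windows implements MultiByteToWideChar: its strict flag loosely mimics MB_ERR_INVALID_CHARS by rejecting the whole input, while the lenient mode replaces each bad byte with U+FFFD, and Windows' exact substitution rules may differ. The legacy bytes CE C4 B1 BE used below are assumed to be the GBK (code page 936) encoding of 文本.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a Windows API): decode n bytes of UTF-8 into
 * UTF-16 code units, which is what wchar_t holds on Windows.
 * strict != 0: fail on the first invalid sequence (similar in spirit to
 *              MB_ERR_INVALID_CHARS, although Windows returns 0, not -1).
 * strict == 0: replace each invalid byte with U+FFFD and continue.
 * Returns the number of code units written, or -1 on error/overflow. */
long utf8_to_utf16(const unsigned char *src, size_t n,
                   uint16_t *dst, size_t cap, int strict)
{
    size_t i = 0, out = 0;
    while (i < n) {
        unsigned char b = src[i];
        uint32_t cp = 0, min_cp = 0;
        size_t len = 0;                  /* 0 marks an invalid lead byte */
        if (b < 0x80)                    { cp = b;        len = 1; }
        else if (b >= 0xC2 && b <= 0xDF) { cp = b & 0x1F; len = 2; min_cp = 0x80; }
        else if (b >= 0xE0 && b <= 0xEF) { cp = b & 0x0F; len = 3; min_cp = 0x800; }
        else if (b >= 0xF0 && b <= 0xF4) { cp = b & 0x07; len = 4; min_cp = 0x10000; }

        /* every byte after the lead must look like 10xxxxxx */
        int ok = len != 0 && i + len <= n;
        for (size_t k = 1; ok && k < len; k++) {
            if ((src[i + k] & 0xC0) != 0x80)
                ok = 0;
            else
                cp = (cp << 6) | (src[i + k] & 0x3F);
        }
        /* reject overlong forms, UTF-16 surrogates and values above U+10FFFF */
        if (ok && (cp < min_cp || (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF))
            ok = 0;

        if (!ok) {
            if (strict)
                return -1;
            cp = 0xFFFD;                 /* replacement character */
            len = 1;                     /* resynchronize at the next byte */
        }
        if (cp >= 0x10000) {             /* outside the BMP: surrogate pair */
            if (cap - out < 2)
                return -1;
            cp -= 0x10000;
            dst[out++] = (uint16_t)(0xD800 | (cp >> 10));
            dst[out++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
        } else {
            if (cap - out < 1)
                return -1;
            dst[out++] = (uint16_t)cp;
        }
        i += len;
    }
    return (long)out;
}
```

Fed the real UTF-8 bytes E6 96 87 E6 9C AC, the sketch yields U+6587 U+672C. Fed the assumed GBK bytes, strict mode fails, and lenient mode yields U+FFFD, U+0131, U+FFFD: one byte pair happens to form a valid but unrelated two-byte UTF-8 sequence, which is why a lenient decode produces unrelated characters instead of an error.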

The Microsoft compiler has options to specify the source and execution character sets, plus a /utf-8 option (recommended) that sets both:

/source-charset:<iana-name>|.nnnn      set source character set  
/execution-charset:<iana-name>|.nnnn   set execution character set  
/utf-8                                 set source and execution character set to UTF-8
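To check which bytes a particular compiler configuration actually stored in a narrow literal, you can print them in hex. The helper below, hex_bytes, is a hypothetical name used only for illustration; it writes the bytes of a string as space-separated lowercase hex pairs.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical diagnostic helper: write the bytes of the NUL-terminated
 * narrow string s into out as lowercase hex pairs separated by spaces,
 * e.g. "e6 96 87". Returns 0 on success, -1 if out is too small. */
int hex_bytes(const char *s, char *out, size_t cap)
{
    size_t n = strlen(s);
    /* 2 digits per byte + (n - 1) spaces + terminator = 3n bytes (1 if empty) */
    size_t need = n ? 3 * n : 1;
    if (cap < need)
        return -1;

    char *p = out;
    *p = '\0';                              /* result for an empty string */
    for (size_t i = 0; i < n; i++) {
        /* cast so bytes >= 0x80 print as two digits, not sign-extended */
        p += snprintf(p, cap - (size_t)(p - out), i ? " %02x" : "%02x",
                      (unsigned)(unsigned char)s[i]);
    }
    return 0;
}
```

For example, calling hex_bytes("文本", buf, sizeof buf) and printing buf should give e6 96 87 e6 9c ac when the program is built with /utf-8, and different bytes from the legacy code page when the execution character set is the Chinese system default.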

There are several ways to fix this. Options #2 and #3 assume the Microsoft compiler; other compilers may differ.

  1. Use char text[] = u8"文本";. This works because your existing default encoding supports Chinese: the compiler decodes the source characters in that encoding and then, because of the u8 prefix, re-encodes the literal as UTF-8. If the source is sent to someone whose OS default encoding is different, it will not work for them.
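  2. Re-save the source as UTF-8 with BOM. The MS compiler detects the BOM (a byte order mark used as a UTF-8 signature) and decodes the source as UTF-8, so the title will display correctly on any system. Narrow literals such as text are still converted to the execution character set, which defaults to the OS code page, so text will only contain UTF-8 bytes if you also compile with /execution-charset:utf-8 (or use /utf-8).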
  3. Re-save as UTF-8 (no BOM) and compile with the /utf-8 switch, which tells the compiler to decode the source as UTF-8 instead of the default encoding and to store narrow literals as UTF-8.
  4. Use an ASCII-only source and escape codes to specify the Chinese characters explicitly.

Example of #4 that will compile correctly no matter the OS default encoding:

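#include <windows.h>

int main() {
    // UTF-8 bytes of U+6587 U+672C, spelled with ASCII-only escapes
    char text[] = "\xe6\x96\x87\xe6\x9c\xac";
    wchar_t wtext[1000];
    // Decode UTF-8 to UTF-16; -1 means text is NUL-terminated
    MultiByteToWideChar(CP_UTF8, 0, text, -1, wtext, 1000);
    // The title U+6A19 U+984C is a wide literal, also written in ASCII only
    MessageBoxW(NULL, wtext, L"\u6a19\u984c", MB_OK);
    return 0;
}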
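Option #4 requires the UTF-8 bytes of each character. As a sketch (the function names utf8_encode and utf8_escapes are hypothetical), the standard UTF-8 bit layout can generate the escape text from code points; U+6587 and U+672C (文本) produce the \xe6\x96\x87\xe6\x9c\xac used for text in the example above.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Encode one Unicode code point as UTF-8. Returns the byte count (1-4),
 * or 0 for UTF-16 surrogates and values above U+10FFFF. */
static size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) { out[0] = (unsigned char)cp; return 1; }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF) return 0;
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* Hypothetical helper: write the UTF-8 bytes of the given code points as
 * C hex escapes such as \xe6\x96\x87 (no surrounding quotes).
 * Returns 0 on success, -1 on an invalid code point or a too-small buffer. */
int utf8_escapes(const uint32_t *cps, size_t count, char *out, size_t cap)
{
    size_t pos = 0;
    for (size_t i = 0; i < count; i++) {
        unsigned char b[4];
        size_t n = utf8_encode(cps[i], b);
        if (n == 0)
            return -1;
        for (size_t k = 0; k < n; k++) {
            if (pos + 5 > cap)           /* 4 chars + terminator */
                return -1;
            pos += (size_t)snprintf(out + pos, cap - pos, "\\x%02x", (unsigned)b[k]);
        }
    }
    if (pos >= cap)
        return -1;
    out[pos] = '\0';
    return 0;
}
```

One caveat when pasting the result into a literal: a C hex escape consumes every following hex digit, so if an escape is directly followed by a character in 0-9, a-f or A-F, split the literal, for example "\xe6\x96\x87" "abc".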