Incorrect character conversion on a Japanese system
I have a project that is compiled with the multibyte character set. The conversion below fails when msg1 contains Japanese characters.
    bool MyClass::UnfoldEnvelope(BSTR msg1)
    {
        CW2A msg(msg1);      // ATL wide-to-ANSI conversion
        LPCTSTR p0 = msg;    // LPCTSTR is const char* in a multibyte build
        ....
    }
On entry, msg1 is a BSTR holding Unicode characters: a path name in Japanese. The CW2A conversion appears to work, in that after the call msg contains a string that is recognizably Japanese. However, the LPCTSTR assignment fails: after that line, p0 contains garbage. p0 is subsequently used in old code that I am reluctant to touch.
What is the correct way to get a pointer to the string in msg in this case?
With English text everything works fine.
Comments (1)
Try WideCharToMultiByte. With CP_ACP it converts the wide-character string to a multibyte string in the current Windows ANSI code page (the Japanese code page on Japanese Windows; CW2A does the same). If your Windows is not set to Japanese but the text contains Japanese characters, use CP_UTF8 (UTF-8) instead, and convert the text back to UTF-16 (wchar_t) when it is used (displayed, printed, or used as a file name). To convert back, use the MultiByteToWideChar function.
Just to be clear: an ANSI multibyte code page covers only a subset of Unicode. Windows uses the code page that matches your system locale (you can configure it in Control Panel). If you have a genuine Unicode string, or a string that is not in your locale's language, you should keep all of its characters in Unicode. If you want to work with char-based strings and still keep the full Unicode range, convert your wchar_t string (all Windows wide characters are UTF-16) to a UTF-8 string.
Check this source:
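Separately from the source the answer refers to, the UTF-8 route it describes might look roughly like the sketch below. Utf16ToUtf8 is a hypothetical helper name, not something from the original post. On Windows it calls WideCharToMultiByte with CP_UTF8; the #else branch is a hand-written stand-in, added only so the sketch also builds on other platforms. Taking a std::u16string is an assumption made for portability; with a BSTR you would more likely pass the wide pointer and SysStringLen(msg1) straight to the API.

```cpp
#include <cstddef>
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper (not from the original post): UTF-16 -> UTF-8.
std::string Utf16ToUtf8(const std::u16string& in)
{
    if (in.empty()) return std::string();
#ifdef _WIN32
    // wchar_t is a 16-bit UTF-16 code unit on Windows, so copy unit by unit.
    std::wstring w(in.begin(), in.end());
    const int len = static_cast<int>(w.size());
    // First call: ask how many bytes the UTF-8 result needs.
    // For CP_UTF8 the last two arguments must be null.
    const int n = WideCharToMultiByte(CP_UTF8, 0, w.data(), len,
                                      nullptr, 0, nullptr, nullptr);
    if (n <= 0) return std::string();
    std::string out(static_cast<std::size_t>(n), '\0');
    // Second call: perform the conversion into the sized buffer.
    WideCharToMultiByte(CP_UTF8, 0, w.data(), len, &out[0], n,
                        nullptr, nullptr);
    return out;
#else
    // Portable stand-in (an assumption of this sketch, not a Windows API).
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size() &&
            in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            // Combine a surrogate pair into one code point.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD;  // lone surrogate: substitute U+FFFD
        }
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
#endif
}
```

For example, Utf16ToUtf8(u"\u65E5\u672C") yields the six bytes E6 97 A5 E6 9C AC. Code that receives such a string has to treat it as UTF-8 rather than as text in the ANSI code page, and MultiByteToWideChar with CP_UTF8 converts it back before it reaches a wide-character API.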