Character conversion is incorrect on Japanese systems

Posted on 2025-01-04 19:31:33

I have a project that is compiled with the multibyte character set. The conversion below fails when msg1 contains Japanese characters.

bool MyClass::UnfoldEnvelope(BSTR msg1)
{
    CW2A msg(msg1);
    LPCTSTR p0 = msg;
    ....
}

On entry, msg1 is a BSTR that contains Unicode characters and holds a path name in Japanese. The CW2A conversion appears to work: after the call, msg contains a string that is recognizably Japanese. However, the LPCTSTR assignment fails, and after that line p0 contains garbage. The string p0 is subsequently used in old code that I am reluctant to touch.

What is the correct way to get a pointer to the string "msg" in this case?

With English text, everything works fine.

悲念泪 2025-01-11 19:31:33

Try WideCharToMultiByte. With CP_ACP it converts the wide-character string to a char-based string in the current Windows ANSI code page (which would be a Japanese code page on Japanese Windows; CW2A does the same). If your Windows is not Japanese but the text contains Japanese characters, use CP_UTF8 (UTF-8) instead, and convert the text back to UTF-16 (wchar_t) at the point where it is used (displayed, printed, or used as a file name). To convert back, use the MultiByteToWideChar function.

To be clear: an ANSI multibyte code page covers only a subset of Unicode. Windows uses the code page that matches its configured system locale (you can change it in Control Panel). If a string is genuinely Unicode, or contains characters that do not belong to your locale's code page, keep it in a Unicode encoding so that no characters are lost. If you need to carry Unicode text in char-based strings, convert the wchar_t string (all Windows wide-character strings are UTF-16) to a UTF-8 string.

See this example:

// Requires <windows.h> and <vector>.
bool MyClass::UnfoldEnvelope(BSTR msg1)
{
    // Ask for the buffer size (in bytes, including the terminating NUL)
    // needed to hold the string as UTF-8
    int new_size = WideCharToMultiByte( CP_UTF8, 0, msg1, -1, NULL, 0, NULL, NULL );
    if ( new_size <= 0 )
      return false;
    // A vector provides a contiguous buffer that can be passed to C functions
    std::vector<char> p0(new_size);
    // Convert the string from UTF-16 to UTF-8
    if ( WideCharToMultiByte( CP_UTF8, 0, msg1, -1, &p0[0], new_size, NULL, NULL ) <= 0 )
    {
      return false;
    }
    // use &p0[0] as an ordinary char string (save, load etc.)
    ....
    // Ask for the string size in UTF-16 code units (including the terminating NUL)
    new_size = MultiByteToWideChar( CP_UTF8, 0, &p0[0], -1, NULL, 0 );
    if ( new_size <= 0 )
      return false;
    // Again, a vector serves as the buffer for the C function
    std::vector<wchar_t> p1w(new_size);
    // Convert back from UTF-8 to UTF-16
    if ( MultiByteToWideChar( CP_UTF8, 0, &p0[0], -1, &p1w[0], new_size ) <= 0 )
      return false;
    ...
    // use the Unicode string as a file name
    return ( CopyFileW( L"old_file", &p1w[0], TRUE ) != FALSE );
}
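
The UTF-16 to UTF-8 step relies on the Windows API. To show what that conversion does at the byte level, and why UTF-8 can hold Japanese text regardless of the system code page, here is a minimal portable sketch in standard C++. The helper name Utf16ToUtf8 is hypothetical and not part of the original answer, and replacing unpaired surrogates with U+FFFD is an assumption of this sketch, not a statement about what WideCharToMultiByte does.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper for illustration only: encodes a UTF-16 string as UTF-8.
// Assumption of this sketch: unpaired surrogates are replaced with U+FFFD.
std::string Utf16ToUtf8(const std::u16string& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            // High surrogate: combine with the following low surrogate if present
            if (i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
                ++i;
            } else {
                cp = 0xFFFD;
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            // Low surrogate without a preceding high surrogate
            cp = 0xFFFD;
        }

        if (cp < 0x80) {
            // ASCII: one byte, identical in UTF-8
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            // Two-byte sequence
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            // Three-byte sequence: covers kana and most kanji
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            // Four-byte sequence: code points beyond the Basic Multilingual Plane
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

For example, Utf16ToUtf8(u"\u65E5") yields the three bytes E6 97 A5. A char buffer for Japanese text therefore needs more bytes than the wide string has characters, which is why the example asks the API for the required size before allocating.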
