How to properly use WideCharToMultiByte

Posted on 2024-07-07 16:59:50

I've read the documentation on WideCharToMultiByte, but I'm stuck on this parameter:

lpMultiByteStr
[out] Pointer to a buffer that receives the converted string.

I'm not quite sure how to properly initialize the variable and pass it to the function.

Comments (5)

甲如呢乙后呢 2024-07-14 16:59:50

Here are a couple of functions (based on Brian Bondy's example) that use WideCharToMultiByte and MultiByteToWideChar to convert between std::wstring and std::string. They use UTF-8 so that no data is lost.

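#include <windows.h>
#include <string>

// Convert a wide Unicode string to a UTF-8 string
std::string utf8_encode(const std::wstring &wstr)
{
    if( wstr.empty() ) return std::string();
    // First call: query the number of bytes needed (no output buffer)
    int size_needed = WideCharToMultiByte(CP_UTF8, 0, &wstr[0], (int)wstr.size(), NULL, 0, NULL, NULL);
    if( size_needed <= 0 ) return std::string(); // conversion failed
    std::string strTo( size_needed, 0 );
    // Second call: convert into the correctly sized buffer
    WideCharToMultiByte                  (CP_UTF8, 0, &wstr[0], (int)wstr.size(), &strTo[0], size_needed, NULL, NULL);
    return strTo;
}

// Convert a UTF-8 string to a wide Unicode string
std::wstring utf8_decode(const std::string &str)
{
    if( str.empty() ) return std::wstring();
    // First call: query the number of wide characters needed
    int size_needed = MultiByteToWideChar(CP_UTF8, 0, &str[0], (int)str.size(), NULL, 0);
    if( size_needed <= 0 ) return std::wstring(); // conversion failed
    std::wstring wstrTo( size_needed, 0 );
    // Second call: convert into the correctly sized buffer
    MultiByteToWideChar                  (CP_UTF8, 0, &str[0], (int)str.size(), &wstrTo[0], size_needed);
    return wstrTo;
}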
薆情海 2024-07-14 16:59:50

Elaborating on the answer provided by Brian R. Bondy: Here's an example that shows why you can't simply size the output buffer to the number of wide characters in the source string:

#include <windows.h>
#include <stdio.h>
#include <wchar.h>
#include <string.h>

/* string consisting of several Asian characters */
wchar_t wcsString[] = L"\u9580\u961c\u9640\u963f\u963b\u9644";

int main() 
{

    size_t wcsChars = wcslen( wcsString);

    /* WideCharToMultiByte returns an int; with a 0-sized buffer it only
       reports the required size (including the NUL, since length is -1) */
    int sizeRequired = WideCharToMultiByte( 950, 0, wcsString, -1, 
                                            NULL, 0,  NULL, NULL);

    printf( "Wide chars in wcsString: %zu\n", wcsChars);
    printf( "Bytes required for CP950 encoding (excluding NUL terminator): %d\n",
             sizeRequired-1);

    sizeRequired = WideCharToMultiByte( CP_UTF8, 0, wcsString, -1,
                                        NULL, 0,  NULL, NULL);
    printf( "Bytes required for UTF8 encoding (excluding NUL terminator): %d\n",
             sizeRequired-1);
}

And the output:

Wide chars in wcsString: 6
Bytes required for CP950 encoding (excluding NUL terminator): 12
Bytes required for UTF8 encoding (excluding NUL terminator): 18
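
The 18-byte figure follows from how UTF-8 encodes code points: each of the six characters in the example lies between U+0800 and U+FFFF, and that range takes three bytes per character. As a rough cross-check that does not call the Windows API, here is a minimal portable sketch. The name utf8ByteCount is a hypothetical helper, not part of any library, and it assumes the input is well-formed UTF-16.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts the bytes UTF-8 needs for a UTF-16 string.
// Assumes well-formed input (surrogates always appear as valid pairs).
std::size_t utf8ByteCount(const std::u16string &s)
{
    std::size_t bytes = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        char16_t c = s[i];
        if (c < 0x80) {
            bytes += 1;                 // U+0000..U+007F
        } else if (c < 0x800) {
            bytes += 2;                 // U+0080..U+07FF
        } else if (c >= 0xD800 && c <= 0xDBFF) {
            bytes += 4;                 // surrogate pair: U+10000 and above
            ++i;                        // skip the low surrogate of the pair
        } else {
            bytes += 3;                 // rest of the Basic Multilingual Plane
        }
    }
    return bytes;
}
```

Calling utf8ByteCount(u"\u9580\u961c\u9640\u963f\u963b\u9644") gives 18, which matches the UTF-8 line of the output above. The point is the same either way: the byte count depends on the target encoding, so query it instead of reusing the wide-character count.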
世界如花海般美丽 2024-07-14 16:59:50

You use the lpMultiByteStr [out] parameter by creating a new char array and passing it in to be filled. The array must be large enough for the converted string plus a terminating null. Because a single wide character can convert to more than one byte, ask the function for the required size first (pass 0 as the buffer size) instead of assuming the source length + 1.

Here are a couple of useful helper functions that show the usage of all parameters.

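#include <windows.h>
#include <string>

std::string wstrtostr(const std::wstring &wstr)
{
    // Convert a Unicode string to a string in the system ANSI code page.
    // First call: ask for the required size in bytes (it includes the
    // terminating null because the input length is -1).
    int size = WideCharToMultiByte(CP_ACP, 0, wstr.c_str(), -1, NULL, 0, NULL, NULL);
    if (size <= 0) return std::string();
    std::string strTo;
    char *szTo = new char[size];
    // Second call: convert into the correctly sized buffer.
    if (WideCharToMultiByte(CP_ACP, 0, wstr.c_str(), -1, szTo, size, NULL, NULL) > 0)
        strTo = szTo;
    delete[] szTo;
    return strTo;
}

std::wstring strtowstr(const std::string &str)
{
    // Convert a string in the system ANSI code page to a Unicode string.
    // First call: ask for the required size in wide characters
    // (it includes the terminating null because the input length is -1).
    int size = MultiByteToWideChar(CP_ACP, 0, str.c_str(), -1, NULL, 0);
    if (size <= 0) return std::wstring();
    std::wstring wstrTo;
    wchar_t *wszTo = new wchar_t[size];
    // Second call: convert into the correctly sized buffer.
    if (MultiByteToWideChar(CP_ACP, 0, str.c_str(), -1, wszTo, size) > 0)
        wstrTo = wszTo;
    delete[] wszTo;
    return wstrTo;
}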

--

Whenever the documentation shows a parameter that is a pointer to a type and marks it as an out parameter, you create a variable of that type and pass in a pointer to it. The function uses that pointer to fill your variable.

A simpler example may make this clearer:

//pX is an out parameter, it fills your variable with 10.
void fillXWith10(int *pX)
{
  *pX = 10;
}

int main(int argc, char ** argv)
{
  int X;
  fillXWith10(&X);
  return 0;
}
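
For a buffer out parameter such as lpMultiByteStr the idea is the same, except that the caller also has to decide how large the buffer must be. WideCharToMultiByte reports the required size when the buffer size argument is 0. The sketch below imitates that convention with a hypothetical function, copyUpper, which is not a Windows API and only stands in for the real call so the pattern can be run anywhere.

```cpp
#include <cctype>
#include <string>

// Hypothetical function that follows the WideCharToMultiByte convention:
// with outSize == 0 it writes nothing and returns the size it needs;
// otherwise it fills `out` and returns the number of chars written
// (0 if the buffer is too small).
int copyUpper(const char *src, int srcLen, char *out, int outSize)
{
    if (outSize == 0) return srcLen;          // size query only
    if (outSize < srcLen) return 0;           // buffer too small
    for (int i = 0; i < srcLen; ++i)
        out[i] = (char)std::toupper((unsigned char)src[i]);
    return srcLen;
}

// Caller side: query the size, allocate, then let the function fill it.
std::string toUpper(const std::string &s)
{
    if (s.empty()) return std::string();
    int needed = copyUpper(s.data(), (int)s.size(), nullptr, 0);  // 1st call: size
    std::string result(needed, '\0');                             // allocate
    copyUpper(s.data(), (int)s.size(), &result[0], needed);       // 2nd call: fill
    return result;
}
```

With these definitions, toUpper("abc") returns "ABC". The caller owns the buffer throughout; the function only writes through the pointer it is given.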
空城缀染半城烟沙 2024-07-14 16:59:50

Here are C wrappers around both WideCharToMultiByte and MultiByteToWideChar.
In both cases I make sure to append a null character to the end of the destination buffer, because the documentation says:

MultiByteToWideChar does not null-terminate an output string if the input string length is explicitly specified without a terminating null character.

And

WideCharToMultiByte does not null-terminate an output string if the input string length is explicitly specified without a terminating null character.

Even if someone specifies -1 and passes in a null terminated string I still allocate enough space for an additional null character because for my use case this was not an issue.

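#include <windows.h>
#include <stdlib.h>

/* The caller owns the returned buffer and must free() it. */
wchar_t* utf8_decode( const char* str, int nbytes ) {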
    int nchars = 0;
    if ( ( nchars = MultiByteToWideChar( CP_UTF8, 
        MB_ERR_INVALID_CHARS, str, nbytes, NULL, 0 ) ) == 0 ) {
        return NULL;
    }

    wchar_t* wstr = NULL;
    if ( !( wstr = malloc( ( ( size_t )nchars + 1 ) * sizeof( wchar_t ) ) ) ) {
        return NULL;
    }

    wstr[ nchars ] = L'\0';
    if ( MultiByteToWideChar( CP_UTF8, MB_ERR_INVALID_CHARS, 
        str, nbytes, wstr, nchars ) == 0 ) {
        free( wstr );
        return NULL;
    }
    return wstr;
} 


char* utf8_encode( const wchar_t* wstr, int nchars ) {
    int nbytes = 0;
    if ( ( nbytes = WideCharToMultiByte( CP_UTF8, WC_ERR_INVALID_CHARS, 
        wstr, nchars, NULL, 0, NULL, NULL ) ) == 0 ) {
        return NULL;
    }

    char* str = NULL;
    if ( !( str = malloc( ( size_t )nbytes + 1 ) ) ) {
        return NULL;
    }

    str[ nbytes ] = '\0';
    if ( WideCharToMultiByte( CP_UTF8, WC_ERR_INVALID_CHARS, 
        wstr, nchars, str, nbytes, NULL, NULL ) == 0 ) {
        free( str );
        return NULL;
    }
    return str;
}
怪异←思 2024-07-14 16:59:50

I'm using these two helper functions:

#pragma once
#include <Windows.h>
#undef max
#include <limits>
#include <string>
#include <string_view>
#include <system_error>

template<bool ErrCode = false>
std::wstring multiByteToWideChar( std::string_view mbStr, UINT CodePage = CP_THREAD_ACP, DWORD dwFlags = 0 )
{
    using namespace std;
    constexpr char const *THROW_STR = "MultiByteToWideChar() conversion failed";
    auto err = []( DWORD dwErr )
    {
        if constexpr( !ErrCode )
            throw system_error( dwErr, system_category(), THROW_STR );
        else
            SetLastError( dwErr );
        return wstring();
    };
    if( mbStr.length() > (unsigned)numeric_limits<int>::max() )
        return err( ERROR_INVALID_PARAMETER );
    auto call = [&]( wchar_t *str, size_t len ) { return (unsigned)MultiByteToWideChar( CodePage, dwFlags, mbStr.data(), (int)mbStr.length(), str, (int)len ); };
    size_t length = call( (LPWSTR)L"", 0 );
    if( DWORD dwErr; length == 0 && (dwErr = GetLastError()) != NO_ERROR )
        return err( dwErr );
    wstring wstr( length, L'\0' );
    size_t written = call( wstr.data(), length );
    if( written != length )
        return err( written == 0 ? GetLastError() : ERROR_INVALID_PARAMETER );
    if constexpr( ErrCode )
        SetLastError( ERROR_SUCCESS );
    return wstr;
}

template<bool ErrCode = false>
std::string wideCharToMultiByte( std::wstring_view wStr, UINT CodePage = CP_THREAD_ACP, DWORD dwFlags =  0, LPCCH lpDefaultChar = nullptr, LPBOOL lpUsedDefaultChar = nullptr )
{
    using namespace std;
    constexpr char const *THROW_STR = "WideCharToMultiByte() conversion failed";
    auto err = []( DWORD dwErr )
    {
        if constexpr( !ErrCode )
            throw system_error( dwErr, system_category(), THROW_STR );
        else
            SetLastError( dwErr );
        return string();
    };
    if( wStr.length() > (unsigned)numeric_limits<int>::max() )
        return err( ERROR_INVALID_PARAMETER );
    auto call = [&]( char *str, size_t length ) { return (unsigned)WideCharToMultiByte( CodePage, dwFlags, wStr.data(), (int)wStr.length(), str, (int)length, lpDefaultChar, lpUsedDefaultChar ); };
    size_t length = call( (LPSTR)"", 0 );
    if( DWORD dwErr; length == 0 && (dwErr = GetLastError()) != NO_ERROR )
        return err( dwErr );
    string str( length, '\0' );
    size_t written = call( str.data(), length );
    if( written != length )
        return err( written == 0 ? GetLastError() : ERROR_INVALID_PARAMETER );
    if constexpr( ErrCode )
        SetLastError( ERROR_SUCCESS );
    return str;
}

I think this is the most versatile way to encapsulate both functions. Using these APIs becomes more convenient.
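
The ErrCode template parameter in the helpers above selects the error-reporting style at compile time: by default a failure throws std::system_error, and with ErrCode set to true the helper calls SetLastError and returns an empty string instead. A minimal portable sketch of that switch follows. It uses a hypothetical parseDigit function and errno in place of SetLastError, so it illustrates the pattern only and is not the helpers themselves. It needs C++17 for if constexpr.

```cpp
#include <cerrno>
#include <system_error>

// Hypothetical example of a compile-time error-policy switch.
// ErrCode == false: failures throw std::system_error.
// ErrCode == true:  failures set errno and return -1.
template<bool ErrCode = false>
int parseDigit(char c)
{
    if (c < '0' || c > '9') {
        if constexpr (!ErrCode)
            throw std::system_error(EINVAL, std::generic_category(), "parseDigit() failed");
        else {
            errno = EINVAL;
            return -1;
        }
    }
    if constexpr (ErrCode)
        errno = 0;      // mirrors SetLastError( ERROR_SUCCESS ) on success
    return c - '0';
}
```

Here parseDigit('x') throws, while parseDigit<true>('x') returns -1 and leaves EINVAL in errno. In the same way, wideCharToMultiByte<true>( ... ) from the helpers above returns an empty string on failure and leaves the error code for GetLastError.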
