utfcpp and the Win32 wide API

Posted 2024-09-11 11:26:31

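Is it good, safe, or even possible to use the tiny utfcpp library's utf16to8 to convert everything I get back from the wide Windows API (FindFirstFileW and such) into a valid UTF-8 representation?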

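I would like to use UTF-8 internally, but I can't get correct output, either via wcout after converting back or via plain cout. Plain ASCII characters work, of course, but ñä comes out garbled.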

Or is there an easier alternative?

Thanks!
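To make concrete what a UTF-16 to UTF-8 conversion such as utfcpp's `utf8::utf16to8(start, end, result)` has to do, here is a small hand-rolled encoder. It is an illustration written for this note, not utfcpp's code, and the name `utf16ToUtf8` is hypothetical. It uses `std::u16string` so it behaves the same everywhere (`wchar_t` is 16 bits on Windows but usually 32 bits elsewhere), and it shows why ñ and ä each become two bytes.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: encodes UTF-16 code units as UTF-8 bytes.
// Surrogate pairs are combined into one code point; unpaired surrogates throw.
std::string utf16ToUtf8(const std::u16string& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            // High surrogate: must be followed by a low surrogate.
            if (i + 1 >= in.size() || in[i + 1] < 0xDC00 || in[i + 1] > 0xDFFF)
                throw std::runtime_error("unpaired high surrogate");
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;  // the low surrogate has been consumed
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            throw std::runtime_error("unpaired low surrogate");
        }

        if (cp < 0x80) {            // 1 byte: 0xxxxxxx
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {    // 2 bytes: 110xxxxx 10xxxxxx
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {  // 3 bytes
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                    // 4 bytes
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

For example, `utf16ToUtf8(u"\u00F1\u00E4")` yields the four bytes `C3 B1 C3 A4`.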

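UPDATE: Thanks to Hans (below), I now have an easy UTF-8 <-> UTF-16 conversion through the Windows API. Conversion works in both directions, but the UTF-8 string produced from a UTF-16 string seems to contain some extra characters, which might cause me trouble later on. I'm sharing the code here in case it helps someone: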

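// Requires Windows; std::wstring holds UTF-16 there (wchar_t is 16 bits).
#include <windows.h>
#include <climits>
#include <stdexcept>
#include <string>

// UTF16 -> UTF8 conversion
std::string toUTF8( const std::wstring &input )
{
    // Empty input: nothing to convert (the API fails when given a zero length).
    if( input.empty() )
        return std::string();
    if( input.size() > static_cast<size_t>( INT_MAX ) )
        throw std::runtime_error( "Failure to execute toUTF8: input too long." );
    const int inputLength = static_cast<int>( input.size() );

    // get length in bytes; an explicit length (not -1) means no terminating NUL is counted
    const int length = WideCharToMultiByte( CP_UTF8, 0,
                                            input.c_str(), inputLength,
                                            NULL, 0,
                                            NULL, NULL );
    if( length <= 0 )
        throw std::runtime_error( "Failure to execute toUTF8: could not get length." );

    std::string result( length, '\0' );
    // for CP_UTF8, dwFlags must be 0 (or WC_ERR_INVALID_CHARS) and the last two arguments NULL
    if( WideCharToMultiByte( CP_UTF8, 0,
                             input.c_str(), inputLength,
                             &result[0], length,
                             NULL, NULL ) <= 0 )
        throw std::runtime_error( "Failure to execute toUTF8: conversion failed." );
    return result;
}
// UTF8 -> UTF16 conversion
std::wstring toUTF16( const std::string &input )
{
    if( input.empty() )
        return std::wstring();
    if( input.size() > static_cast<size_t>( INT_MAX ) )
        throw std::runtime_error( "Failure to execute toUTF16: input too long." );
    const int inputLength = static_cast<int>( input.size() );

    // get length in UTF-16 code units; pass MB_ERR_INVALID_CHARS instead of 0
    // to fail on invalid UTF-8 rather than substituting U+FFFD
    const int length = MultiByteToWideChar( CP_UTF8, 0,
                                            input.c_str(), inputLength,
                                            NULL, 0 );
    if( length <= 0 )
        throw std::runtime_error( "Failure to execute toUTF16: could not get length." );

    std::wstring result( length, L'\0' );
    if( MultiByteToWideChar( CP_UTF8, 0,
                             input.c_str(), inputLength,
                             &result[0], length ) <= 0 )
        throw std::runtime_error( "Failure to execute toUTF16: conversion failed." );
    return result;
}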
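One possible, unconfirmed explanation for the "extra characters": UTF-8 encodes every non-ASCII character with 2 to 4 bytes, so `std::string::size()` counts bytes rather than characters, and a console that is not in UTF-8 mode shows each byte as its own character (under Windows-1252, for example, ñ displays as `Ã±`). The sketch below uses a hypothetical helper, `countCodePoints`, that counts code points by skipping UTF-8 continuation bytes (bit pattern `10xxxxxx`); it assumes the input is already valid UTF-8.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts Unicode code points in a valid UTF-8 string.
// Every code point starts with exactly one byte that is NOT of the form
// 10xxxxxx, so counting those bytes counts code points.
std::size_t countCodePoints(const std::string& utf8)
{
    std::size_t count = 0;
    for (unsigned char byte : utf8) {
        if ((byte & 0xC0) != 0x80)  // ASCII byte or lead byte of a sequence
            ++count;
    }
    return count;
}
```

For the UTF-8 encoding of "ñä", `size()` is 4 while `countCodePoints` returns 2.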



轻拂→两袖风尘 2024-09-18 11:26:31

The Win32 API already has a function for this: WideCharToMultiByte() with CodePage = CP_UTF8. That saves you from depending on another library.

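You normally can't use the result with wcout. Its output goes to the console, which for legacy reasons uses an 8-bit OEM encoding. You can change the console's output code page with SetConsoleOutputCP() (SetConsoleCP() changes the input code page); 65001 is the code page for UTF-8 (CP_UTF8).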
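A minimal sketch of the code-page route, assuming the text is already UTF-8 in a `std::string` (for example, the result of `toUTF8` above). The helper names `enableUtf8ConsoleOutput` and `printUtf8` are hypothetical; `SetConsoleOutputCP` is the real Win32 call, guarded by `_WIN32` so the sketch also compiles elsewhere. Whether the characters then render correctly still depends on the console font.

```cpp
#include <cstdio>
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: switches the console's *output* code page to UTF-8
// (65001) on Windows, so UTF-8 bytes written to stdout are decoded as UTF-8.
// On other platforms, where terminals are typically UTF-8 already, it is a no-op.
bool enableUtf8ConsoleOutput()
{
#ifdef _WIN32
    return SetConsoleOutputCP(CP_UTF8) != 0;
#else
    return true;
#endif
}

// Hypothetical helper: writes a UTF-8 string to stdout as raw bytes,
// followed by a newline, without any further conversion.
void printUtf8(const std::string& utf8)
{
    std::fwrite(utf8.data(), 1, utf8.size(), stdout);
    std::fputc('\n', stdout);
}
```

Call `enableUtf8ConsoleOutput()` once at startup, then write the narrow UTF-8 strings with `printUtf8` (or `std::cout`) rather than passing them to `wcout`.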

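Your next stumbling block will be the font used by the console. You'll have to change it, but finding one that is fixed-pitch and has a complete set of glyphs covering Unicode will be difficult. If you see squares in the output, you have a font problem; question marks indicate an encoding problem.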


裂开嘴轻声笑有多痛 2024-09-18 11:26:31

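Why do you want to use UTF-8 internally? Are you working with so much text that UTF-16 would create unreasonable memory requirements? Even if that's the case, you're probably still better off using wide characters and dealing with the memory issue some other way (a disk cache, better algorithms, or better data structures).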
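To put rough numbers on the memory question: UTF-8 is half the size of UTF-16 for ASCII text, the same size for Latin characters such as ñ and ä, and larger for most CJK text. The hypothetical helpers below compute both sizes without converting, assuming well-formed UTF-16 input.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: bytes a UTF-16 string would occupy once encoded as UTF-8.
// A surrogate pair (code point >= U+10000) becomes 4 bytes; any other code unit
// becomes 1, 2 or 3 bytes depending on its value. Assumes well-formed UTF-16.
std::size_t utf8ByteLength(const std::u16string& s)
{
    std::size_t bytes = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        const char16_t u = s[i];
        if (u >= 0xD800 && u <= 0xDBFF) {  // high surrogate: the pair is 4 bytes
            bytes += 4;
            ++i;                            // skip the matching low surrogate
        } else if (u < 0x80) {
            bytes += 1;
        } else if (u < 0x800) {
            bytes += 2;
        } else {
            bytes += 3;
        }
    }
    return bytes;
}

// Hypothetical helper: bytes the same text occupies as UTF-16 (2 per code unit).
std::size_t utf16ByteLength(const std::u16string& s)
{
    return s.size() * 2;
}
```

For "hello" this gives 5 bytes in UTF-8 versus 10 in UTF-16; for "中文" it gives 6 versus 4.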

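If you use the Win32 API's native wide characters internally, your code will be cleaner and easier to deal with; convert to UTF-8 only when reading or writing data that requires it (for example, XML files or REST APIs).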

Your problem may also come from printing the output to the console; see the question "Output unicode strings in Windows console app".

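Finally, I haven't used the utfcpp library, but converting to and from UTF-8 is fairly simple with Win32's WideCharToMultiByte and MultiByteToWideChar, using CP_UTF8 as the code page. Personally, I'd do a one-time conversion and work with the text as UTF-16 until I need to output or transmit it as UTF-8.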
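For a sense of what the inbound direction (MultiByteToWideChar with CP_UTF8) involves, here is a portable hand-rolled UTF-8 to UTF-16 decoder. It is an illustrative sketch with a hypothetical name, `utf8ToUtf16`, not the Win32 implementation; like passing `MB_ERR_INVALID_CHARS`, it rejects malformed input instead of substituting replacement characters. On Windows, the Win32 call in the question remains the simpler choice.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: decodes UTF-8 into UTF-16 code units. Throws on bad
// lead/continuation bytes, truncated or overlong sequences, encoded
// surrogates, and code points above U+10FFFF.
std::u16string utf8ToUtf16(const std::string& in)
{
    // Smallest code point each sequence length may encode (rejects overlong forms).
    static const char32_t minForExtra[] = {0, 0x80, 0x800, 0x10000};
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp;
        std::size_t extra;  // number of continuation bytes after the lead byte
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::runtime_error("invalid UTF-8 lead byte");

        if (in.size() - i < extra + 1)
            throw std::runtime_error("truncated UTF-8 sequence");
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::runtime_error("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        if (cp < minForExtra[extra] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::runtime_error("invalid code point in UTF-8");

        if (cp < 0x10000) {
            out += static_cast<char16_t>(cp);
        } else {  // split into a surrogate pair
            cp -= 0x10000;
            out += static_cast<char16_t>(0xD800 + (cp >> 10));
            out += static_cast<char16_t>(0xDC00 + (cp & 0x3FF));
        }
        i += extra + 1;
    }
    return out;
}
```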