utfcpp and the Win32 wide API
Is it good/safe/possible to use the tiny utfcpp library for converting everything I get back from the wide Windows API (FindFirstFileW and such) to a valid UTF8 representation using utf16to8?
I would like to use UTF8 internally, but am having trouble getting the correct output (via wcout after another conversion or plain cout). Normal ASCII characters work of course, but ñä gets messed up.
Or is there an easier alternative?
Thanks!
UPDATE: Thanks to Hans (below), I now have an easy UTF8<->UTF16 conversion through the Windows API. The two-way conversion works, but the UTF8 string produced from UTF16 has some extra characters that might cause me some trouble later on. I'll share it here out of pure friendliness :)
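#include <windows.h>
#include <stdexcept>
#include <string>

// UTF16 -> UTF8 conversion
std::string toUTF8( const std::wstring &input )
{
    // First call: ask how many bytes the UTF-8 result needs.
    // dwFlags is a DWORD, so pass 0 rather than NULL; the length parameter is an int.
    int length = WideCharToMultiByte( CP_UTF8, 0,
                                      input.c_str(), static_cast<int>( input.size() ),
                                      NULL, 0,
                                      NULL, NULL );
    // 0 means empty input or failure; either way return an empty string.
    if( length <= 0 )
        return std::string();

    // Second call: convert into a buffer of exactly that size. No terminator is
    // written because the input length is given explicitly rather than as -1.
    std::string result( length, '\0' );
    if( WideCharToMultiByte( CP_UTF8, 0,
                             input.c_str(), static_cast<int>( input.size() ),
                             &result[0], length,
                             NULL, NULL ) > 0 )
        return result;
    throw std::runtime_error( "Failure to execute toUTF8: conversion failed." );
}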
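// UTF8 -> UTF16 conversion
std::wstring toUTF16( const std::string &input )
{
    // First call: ask how many wchar_t units the UTF-16 result needs.
    // dwFlags is a DWORD, so pass 0 rather than NULL; the length parameter is an int.
    int length = MultiByteToWideChar( CP_UTF8, 0,
                                      input.c_str(), static_cast<int>( input.size() ),
                                      NULL, 0 );
    // 0 means empty input or failure; either way return an empty string.
    if( length <= 0 )
        return std::wstring();

    // Second call: convert into a buffer of exactly that size.
    std::wstring result( length, L'\0' );
    if( MultiByteToWideChar( CP_UTF8, 0,
                             input.c_str(), static_cast<int>( input.size() ),
                             &result[0], length ) > 0 )
        return result;
    throw std::runtime_error( "Failure to execute toUTF16: conversion failed." );
}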
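To tell whether the converted string really contains extra bytes, or whether the console is just displaying correct bytes wrongly, it can help to dump the raw bytes. Below is a minimal sketch; `hexDump` is a hypothetical helper name of my own, not part of the Win32 API or utfcpp.

```cpp
#include <cstdio>
#include <string>

// Hypothetical debugging helper: render each byte of a string as two
// lowercase hex digits, separated by single spaces.
std::string hexDump( const std::string &bytes )
{
    std::string out;
    char buf[4];
    for( std::size_t i = 0; i < bytes.size(); ++i )
    {
        // Go through unsigned char so bytes >= 0x80 are not sign-extended.
        unsigned value = static_cast<unsigned char>( bytes[i] );
        std::snprintf( buf, sizeof buf, "%02x", value );
        if( i != 0 )
            out += ' ';
        out += buf;
    }
    return out;
}
```

For a correctly converted "ñä" the dump should read `c3 b1 c3 a4`. A trailing `00` would suggest that a null terminator was converted as well, which WideCharToMultiByte does when the input length is passed as -1 instead of an explicit count.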
Comments (2)
The Win32 API already has a function to do this, WideCharToMultiByte() with CodePage = CP_UTF8. Saves you from having to rely on another library.
You cannot normally use the result with wcout. Its output goes to the console, which uses an 8-bit OEM encoding for legacy reasons. You can change the console's output code page with SetConsoleOutputCP() (SetConsoleCP() only sets the input code page); 65001 is the code page for UTF-8 (CP_UTF8).
Your next stumbling block would be the font that's used for the console. You'll have to change it, but finding a font that's fixed-pitch and has a full set of glyphs to cover Unicode is going to be difficult. You'll know you have a font problem when you get empty boxes in the output. Question marks indicate an encoding problem.
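As a rough sketch of the code page switch: the Win32 call that sets the console's output code page is SetConsoleOutputCP(). The helper names `enableUtf8ConsoleOutput` and `printUtf8` are made up for this example, and the `#ifdef` guard is only there so the snippet also compiles off Windows, where it assumes the terminal already expects UTF-8.

```cpp
#include <cstdio>
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

// Switch the console's output code page to UTF-8 (65001).
// On non-Windows platforms this is a no-op that reports success
// (assumption: the terminal there already expects UTF-8).
bool enableUtf8ConsoleOutput()
{
#ifdef _WIN32
    return SetConsoleOutputCP( CP_UTF8 ) != 0;
#else
    return true;
#endif
}

// Write UTF-8 bytes to stdout unchanged; returns the number of bytes written.
std::size_t printUtf8( const std::string &utf8 )
{
    return std::fwrite( utf8.data(), 1, utf8.size(), stdout );
}
```

Calling `enableUtf8ConsoleOutput()` once at startup and then `printUtf8( toUTF8( someWideString ) )` would be the intended use. Whether the glyphs actually show up still depends on the console font, as described above.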
Why do you want to use UTF8 internally? Are you working with so much text that using UTF16 would create unreasonable memory demands? Even if that was the case, you're probably better off using wide chars anyway, and dealing with memory issues in some other way (using a disk cache, better algorithms or data structures).
Your code will be much cleaner and easier to deal with if you use the wide chars native to the Win32 API internally, and only do UTF8 conversions when reading or writing data that requires it (e.g. XML files or REST APIs).
Your problem may also occur at the point where you print your output to the console, see: Output unicode strings in Windows console app
Finally, I haven't used the utfcpp library, but UTF8 conversions are fairly trivial to perform using Win32's WideCharToMultiByte and MultiByteToWideChar with CP_UTF8 as the code page. Personally I would do a one-time conversion and work with the text in UTF16 until it was time to output or transfer it in UTF8 if needed.
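To make concrete what that conversion at the boundary does to the bytes (and why "ñä" ends up as four bytes in UTF8), here is a platform-independent sketch. It is a hand-written illustration, not the implementation used by Win32 or utfcpp. The names `appendUtf8` and `utf16ToUtf8Sketch` are hypothetical, it takes a std::u16string rather than a std::wstring so it behaves the same everywhere, and replacing unpaired surrogates with U+FFFD is my own choice for the example.

```cpp
#include <cstdint>
#include <string>

// Append one Unicode code point to `out` as 1 to 4 UTF-8 bytes.
void appendUtf8( std::string &out, std::uint32_t cp )
{
    if( cp < 0x80 )
        out += static_cast<char>( cp );
    else if( cp < 0x800 )
    {
        out += static_cast<char>( 0xC0 | ( cp >> 6 ) );
        out += static_cast<char>( 0x80 | ( cp & 0x3F ) );
    }
    else if( cp < 0x10000 )
    {
        out += static_cast<char>( 0xE0 | ( cp >> 12 ) );
        out += static_cast<char>( 0x80 | ( ( cp >> 6 ) & 0x3F ) );
        out += static_cast<char>( 0x80 | ( cp & 0x3F ) );
    }
    else
    {
        out += static_cast<char>( 0xF0 | ( cp >> 18 ) );
        out += static_cast<char>( 0x80 | ( ( cp >> 12 ) & 0x3F ) );
        out += static_cast<char>( 0x80 | ( ( cp >> 6 ) & 0x3F ) );
        out += static_cast<char>( 0x80 | ( cp & 0x3F ) );
    }
}

// Decode UTF-16 code units (combining surrogate pairs) and re-encode as UTF-8.
std::string utf16ToUtf8Sketch( const std::u16string &input )
{
    std::string out;
    for( std::size_t i = 0; i < input.size(); ++i )
    {
        std::uint32_t cp = input[i];
        if( cp >= 0xD800 && cp <= 0xDBFF )  // high surrogate
        {
            if( i + 1 < input.size() && input[i + 1] >= 0xDC00 && input[i + 1] <= 0xDFFF )
            {
                // Combine the pair into one code point above U+FFFF.
                cp = 0x10000 + ( ( cp - 0xD800 ) << 10 ) + ( input[i + 1] - 0xDC00 );
                ++i;
            }
            else
                cp = 0xFFFD;  // unpaired high surrogate: use the replacement character
        }
        else if( cp >= 0xDC00 && cp <= 0xDFFF )
            cp = 0xFFFD;      // unpaired low surrogate
        appendUtf8( out, cp );
    }
    return out;
}
```

In real code on Windows I would still call WideCharToMultiByte as described above; the sketch is only meant to show that each of ñ and ä becomes a two-byte sequence, which is what gets mangled when those bytes are displayed in a non-UTF8 code page.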