UTF-8 text gets scrambled when loading a file in C++
I know loading Unicode is a somewhat laboured point, but I can't see how to apply the solutions presented to others to my particular problem.
I have a Win7/C++/DirectX9 GUI library which can render text to the screen. I've never had a problem before, since it has only been used with Western European languages. Now I have to use it with Hungarian, and it is giving me a headache! My particular problem is with loading the special characters found in that language.
Take this example, FELNŐTTEKNEK, meaning ADULT.
If I hard code this string into my app, it renders correctly:
guiTitle->SetText( L"FELNŐTTEKNEK" );
This stores the string as a std::wstring, rendering it with ID3DXFont::DrawTextW(). It also proves my chosen font, Futura CE, is able to render the special characters (CE = Central European).
So far so good. Next I simply want to be able to load the text from a text file. No big deal. However, the results are bad! The special Ő is replaced by another character, mainly Å, or even by two characters: Å followed by a second one that is usually unprintable.
I have ensured my input text file is encoded as UTF-8 and am naively trying to load it thus:
wifstream f("data/language.ini");
wstring w;
getline( f, w );
guiTitle->SetText( w );
Somehow I am still scrambling it. Am I loading it as UTF-8? Is there a way to ensure this? I just need to end up with a wide string containing the text as shown in the text editor.
Any assistance most gratefully received.
Si
Comments (3)
Forget about wifstream, it's just too hard to make it work. Instead, read the file as raw bytes into a std::string, pass that through a utf8_to_utf16 helper, and use MultiByteToWideChar to implement the helper.
See also https://stackoverflow.com/questions/1049947/should-utf-16-be-considered-harmful.
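For what it's worth, here is a minimal sketch of such a helper, assuming the usual two-call pattern for MultiByteToWideChar (the first call sizes the buffer, the second fills it). The name utf8_to_utf16 comes from the answer; the error handling and the MB_ERR_INVALID_CHARS flag are my own assumptions, not necessarily what the answer had in mind.

```cpp
#ifdef _WIN32   // Windows-only: MultiByteToWideChar is declared in <windows.h>
#include <windows.h>
#include <stdexcept>
#include <string>

// Converts a UTF-8 byte string to a UTF-16 wide string.
std::wstring utf8_to_utf16(const std::string& utf8)
{
    if (utf8.empty())
        return std::wstring();

    const int srcLen = static_cast<int>(utf8.size());

    // First call: no output buffer, so the return value is the required length.
    const int dstLen = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                                           utf8.data(), srcLen, nullptr, 0);
    if (dstLen == 0)
        throw std::runtime_error("input is not valid UTF-8");

    // Second call: convert into a buffer of exactly that length.
    std::wstring utf16(static_cast<std::size_t>(dstLen), L'\0');
    MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                        utf8.data(), srcLen, &utf16[0], dstLen);
    return utf16;
}

// Possible use with the names from the question:
//   std::ifstream f("data/language.ini");
//   std::string line;
//   std::getline(f, line);
//   guiTitle->SetText(utf8_to_utf16(line));
#endif
```

If the editor saved the file with a UTF-8 byte order mark, the first line will start with U+FEFF after conversion and may need trimming.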
DrawTextW is expecting UTF-16.
What you're doing is converting each UTF-8 code unit (byte) into a 16-bit value by zero-padding it. This correctly converts UTF-8 to UTF-16 only if your UTF-8 exclusively contains characters from the ASCII subset of Unicode.
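To make that concrete, here is a small sketch that mimics the faulty load. The helper name widen_bytes is made up for illustration, and it uses std::u16string so it behaves the same on any platform.

```cpp
#include <string>

// Mimics the faulty load: each UTF-8 byte becomes its own 16-bit code unit.
std::u16string widen_bytes(const std::string& utf8)
{
    std::u16string out;
    for (unsigned char byte : utf8)   // unsigned char avoids sign extension
        out.push_back(static_cast<char16_t>(byte));
    return out;
}

// "Ő" is U+0150, which UTF-8 encodes as the two bytes 0xC5 0x90.
// widen_bytes("\xC5\x90") yields U+00C5 ('Å') followed by U+0090, a control
// character with no glyph, instead of the single code unit U+0150.
```

That matches the symptom in the question: an Å followed by something unprintable.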
What you need to do is to correctly convert from UTF-8 to UTF-16. Load the string into a std::string (not a std::wstring) then convert that UTF-8 string into a UTF-16 string and pass it to the API expecting a UTF-16 string.
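If it helps to see what "correctly convert" involves, below is a hand-rolled sketch of the decoding step. The function name utf8_to_utf16_portable is hypothetical, and this is an illustration rather than a hardened decoder: it rejects truncated or malformed sequences but does not reject overlong encodings or encoded surrogates. On Windows, where wchar_t is 16 bits, the resulting code units can be copied into a std::wstring.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Decodes UTF-8 bytes into UTF-16 code units (illustrative, not hardened).
std::u16string utf8_to_utf16_portable(const std::string& in)
{
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        std::uint32_t cp = 0;      // code point being assembled
        std::size_t extra = 0;     // number of continuation bytes expected

        if      (lead < 0x80)           { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::runtime_error("invalid UTF-8 lead byte");

        if (extra > in.size() - i - 1)
            throw std::runtime_error("truncated UTF-8 sequence");

        // Each continuation byte contributes its low six bits.
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::runtime_error("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        i += extra + 1;

        if (cp > 0x10FFFF)
            throw std::runtime_error("code point out of range");

        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));   // one code unit
        } else {
            cp -= 0x10000;                              // surrogate pair
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

With this, the bytes 0xC5 0x90 come out as the single code unit U+0150, which is what DrawTextW needs for Ő.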
I never understood the idea put forward there of using UTF-8 everywhere, implementing the necessary functions yourself (which you could just as well do for UTF-16), and then converting back to UTF-16 when talking to the Windows API. I also have no idea how that is supposed to avoid problems in the Windows API: you still give it UTF-16 characters and will therefore hit all the same bugs anyhow. It seems like quite a lot of extra work for no benefit.
Anyway, instead of "use std::string and then convert it to UTF-16 with low-level methods", you could just let the API do its job. Note that this may not give the best performance; Raymond Chen had a series about that a while back, though I'd hope newer compilers have fixed it, and for a normal file that's hardly important.
Basically you can do that:
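A minimal sketch of what this could look like, assuming the answer means imbuing the file stream with the standard std::codecvt_utf8_utf16 facet from <codecvt>. That header is deprecated since C++17 but still shipped by mainstream toolchains. The helper name read_first_line_utf8 is made up for illustration, and this is not necessarily the code the answer originally showed.

```cpp
#include <codecvt>
#include <fstream>
#include <locale>
#include <string>

// Reads the first line of a UTF-8 file, letting the stream do the conversion.
std::wstring read_first_line_utf8(const char* path)
{
    std::wifstream in;
    // The locale takes ownership of the facet, so this does not leak.
    // Imbue before opening so the conversion applies from the first byte.
    in.imbue(std::locale(in.getloc(), new std::codecvt_utf8_utf16<wchar_t>));
    in.open(path);

    std::wstring line;
    std::getline(in, line);   // bytes are converted as they are read
    return line;              // empty if the file could not be opened
}

// Possible use with the names from the question:
//   guiTitle->SetText(read_first_line_utf8("data/language.ini"));
```

The facet does not strip a byte order mark by default, so a file saved with one will yield a leading U+FEFF.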
Why do all the work yourself if the library can do it for you? Every time I don't have to use MultiByteToWideChar I count myself lucky, and it also makes the intent much clearer.