UTF-8 text scrambled when loading a file in C++

Posted 2024-11-29 09:31:15

I know loading unicode is a somewhat laboured point, but I can't see how to apply the solutions presented to others to my particular problem.

I have a Win7/C++/DirectX9 GUI library which can render text to the screen. I've never had a problem before, since it has only been used with Western European languages. Now I have to use it with Hungarian, and it is giving me a headache! My particular problem is with loading the special characters found in that language.

Take this example, FELNŐTTEKNEK, meaning ADULT.

If I hard code this string into my app, it renders correctly:

guiTitle->SetText( L"FELNŐTTEKNEK" );

This stores the string as a std::wstring, rendering it with ID3DXFont::DrawTextW(). It also proves my chosen font, Futura CE, is able to render the special characters (CE = Central European).

So far so good. Next I simply want to be able to load the text from a text file. No big deal. However, the results are bad! The special Ő is replaced by another character, mainly Å, or even by two characters: Å followed by a second one that is usually unprintable.

I have ensured my input text file is encoded as UTF-8 and am naively trying to load it thus:

wifstream f("data/language.ini");
wstring w;  
getline( f, w );    
guiTitle->SetText( w );

Somehow I am still scrambling it. Am I loading it as UTF-8? Is there a way to ensure this? I just need to end up with a wide string containing the text as shown in the text editor.

Any assistance most gratefully received.

Si

Comments (3)

压抑⊿情绪 2024-12-06 09:31:15

Forget about wifstream, it's just too hard to make it work. Do:

ifstream f(L"data/language.ini");
string str;  
getline( f, str );
guiTitle->SetText( utf8_to_utf16(str).c_str() );

And use MultiByteToWideChar to implement utf8_to_utf16.
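For what it's worth, here is a minimal sketch of what such a utf8_to_utf16 helper might look like. The helper name comes from the snippet above; the std::string in, std::wstring out signature and the choice to throw on malformed input are my assumptions, not something the answer specifies. It is Windows-only, hence the guard.

```cpp
#ifdef _WIN32  // Windows-only: MultiByteToWideChar is declared in <windows.h>
#include <windows.h>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper (signature assumed): UTF-8 bytes in, UTF-16 wide string out.
std::wstring utf8_to_utf16(const std::string& utf8) {
    if (utf8.empty()) return std::wstring();
    const int srcLen = static_cast<int>(utf8.size());
    // First call: ask how many UTF-16 code units the result needs.
    const int dstLen = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                                           utf8.data(), srcLen, nullptr, 0);
    if (dstLen == 0) throw std::runtime_error("invalid UTF-8 input");
    std::wstring utf16(static_cast<std::size_t>(dstLen), L'\0');
    // Second call: convert into the correctly sized buffer.
    MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                        utf8.data(), srcLen, &utf16[0], dstLen);
    return utf16;
}
#endif
```

Calling the API twice (once with a null output buffer to get the length, once to convert) avoids guessing the buffer size. MB_ERR_INVALID_CHARS makes malformed input fail instead of being silently replaced.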

See also https://stackoverflow.com/questions/1049947/should-utf-16-be-considered-harmful.

零時差 2024-12-06 09:31:15

DrawTextW is expecting UTF-16.

What you're doing is converting each UTF-8 code unit (byte) into a 16-bit value by zero-padding it. This correctly converts UTF-8 to UTF-16 only if your UTF-8 contains nothing but characters from the ASCII subset of Unicode.
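To make the zero-padding concrete, here is a small sketch that imitates it. The function name widen_bytes is made up for illustration, and it only approximates what a wifstream in the default locale does. Ő is U+0150, which UTF-8 encodes as the two bytes C5 90.

```cpp
#include <string>

// Hypothetical illustration: widen every byte on its own, with no decoding.
std::wstring widen_bytes(const std::string& bytes) {
    std::wstring out;
    for (unsigned char b : bytes) {
        // Each byte becomes one wide character with the same numeric value.
        out.push_back(static_cast<wchar_t>(b));
    }
    return out;
}
```

Feeding it "\xC5\x90" gives two wide characters, U+00C5 (Å) and U+0090 (a control character), rather than the single U+0150. That matches the Å plus unprintable character described in the question.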

What you need to do is to correctly convert from UTF-8 to UTF-16. Load the string into a std::string (not a std::wstring) then convert that UTF-8 string into a UTF-16 string and pass it to the API expecting a UTF-16 string.
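A portable sketch of those two steps, in case it helps. The names decode_utf8 and read_utf8_line are hypothetical, and the decoder does only minimal validation (it does not reject overlong forms or encoded surrogates). Dropping a leading byte order mark is my addition, on the assumption that the file may have been saved by an editor that writes one.

```cpp
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper: decode UTF-8 bytes into UTF-16 code units.
std::u16string decode_utf8(const std::string& in) {
    std::u16string out;
    for (std::size_t i = 0; i < in.size();) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        std::uint32_t cp;
        std::size_t extra;  // number of continuation bytes that follow
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::runtime_error("invalid UTF-8 lead byte");
        if (i + extra >= in.size()) throw std::runtime_error("truncated UTF-8 sequence");
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80) throw std::runtime_error("invalid continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        i += extra + 1;
        if (cp > 0x10FFFF) throw std::runtime_error("code point out of range");
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {  // outside the BMP: emit a surrogate pair
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}

// Hypothetical helper: read one line as raw bytes, drop an optional
// UTF-8 byte order mark, then decode.
std::u16string read_utf8_line(std::istream& in) {
    std::string line;
    std::getline(in, line);
    if (line.compare(0, 3, "\xEF\xBB\xBF") == 0) line.erase(0, 3);
    return decode_utf8(line);
}
```

Any std::istream works as the source: a std::ifstream for the real file, or a std::istringstream when testing. On Windows, where wchar_t is 16 bits, the result can be copied into a std::wstring with std::wstring(s.begin(), s.end()) before it goes to the wide API.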

瑶笙 2024-12-06 09:31:15

I never understood the idea advocated there of using UTF-8 everywhere, implementing the necessary functions yourself (which you could just as well do for UTF-16), and then converting back to UTF-16 when communicating with the Windows API. I also have no idea how that is supposed to avoid problems in the Windows API: you still give it UTF-16 chars and will therefore hit all the same bugs anyhow. It seems like quite a lot of extra work for no benefit.

Anyway, instead of "use std::string and then convert it to UTF-16 using low-level methods", you could just let the API do its job. (Note that this may not result in the best performance. Ray Chen had a series about that way back, though I'd hope newer compilers have fixed it, and for a normal file it's hardly important.)

Basically you can do this:

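 std::wifstream src;
 // Imbue before open(). The locale name is platform-specific;
 // std::locale throws std::runtime_error if it is not recognised.
 src.imbue(std::locale("UTF-8")); // use correct encoding.
 src.open(file);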

Why do all the work yourself if the library can do it for you? (Every time I don't have to use MultiByteToWideChar I count myself lucky.) It also makes the intent much clearer.
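One caveat: whether the locale name "UTF-8" is accepted depends on the platform. As a hedged alternative, the sketch below swaps in a different technique, imbuing the stream with a std::codecvt_utf8_utf16 facet directly instead of using a named locale. The function name read_first_line_utf8 is hypothetical. Note that <codecvt> has been deprecated since C++17, so treat this as an option for older toolchains rather than a recommendation.

```cpp
#include <codecvt>
#include <fstream>
#include <locale>
#include <string>

// Hypothetical helper: read the first line of a UTF-8 file as a wide string.
std::wstring read_first_line_utf8(const std::string& path) {
    std::wifstream src;
    // Imbue before open() so the facet applies from the first byte read.
    // consume_header asks the facet to skip a leading byte order mark.
    // The locale takes ownership of the facet allocated with new.
    src.imbue(std::locale(src.getloc(),
        new std::codecvt_utf8_utf16<wchar_t, 0x10FFFF, std::consume_header>));
    src.open(path);
    std::wstring line;
    std::getline(src, line);  // stays empty if the file could not be opened
    return line;
}
```

With this in place, the loading code from the question would reduce to something like guiTitle->SetText( read_first_line_utf8("data/language.ini") ).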
