UTF-8 scrambled during file loading in C++

Posted 2024-11-29 09:31:15

I know loading unicode is a somewhat laboured point, but I can't see how to apply the solutions presented to others to my particular problem.

I have a Win7/C++/DirectX9 GUI library which can render text to the screen. I've never had a problem before, since it has only been used with Western European languages. Now I have to use it with Hungarian, and it is giving me a headache! My particular problem is with loading the special characters found in that language.

Take this example, FELNŐTTEKNEK, meaning ADULT.

If I hard code this string into my app, it renders correctly:

guiTitle->SetText( L"FELNŐTTEKNEK" );

This stores the string as a std::wstring, rendering it with ID3DXFont::DrawTextW(). It also proves my chosen font, Futura CE, is able to render the special characters (CE = Central European).

So far so good. Next I simply want to be able to load the text from a text file. No big deal. However, the results are bad! The special Ő is replaced by another character, mainly Å, or even by two characters such as Å followed by a second one that is usually unprintable.

I have ensured my input text file is encoded as UTF-8 and am naively trying to load it thus:

wifstream f("data/language.ini");
wstring w;  
getline( f, w );    
guiTitle->SetText( w );

Somehow I am still scrambling it. Am I loading it as UTF-8? Is there a way to ensure this? I just need to end up with a wide string containing the text as shown in the text editor.

Any assistance most gratefully received.

压抑⊿情绪 2024-12-06 09:31:15

Forget about wifstream; it is just too hard to make it work. Instead, do this:

ifstream f(L"data/language.ini");
string str;
getline( f, str );   // raw UTF-8 bytes, no conversion yet
guiTitle->SetText( utf8_to_utf16(str).c_str() );

and use MultiByteToWideChar to implement utf8_to_utf16.

See also https://stackoverflow.com/questions/1049947/should-utf-16-be-considered-harmful.
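The answer leaves utf8_to_utf16 unimplemented. What follows is only a minimal sketch of one way it might look, assuming a Windows build and the usual two-call pattern for MultiByteToWideChar (first ask for the required length, then convert). The error handling and the choice of MB_ERR_INVALID_CHARS are my assumptions, not something the answer specifies.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

#ifdef _WIN32            // MultiByteToWideChar is a Windows-only API
#include <windows.h>

// Sketch of the utf8_to_utf16 helper named in the answer (body is an assumption).
std::wstring utf8_to_utf16(const std::string& utf8)
{
    if (utf8.empty()) return std::wstring();
    if (utf8.size() > 0x7FFFFFFF) throw std::length_error("input too long");
    const int bytes = static_cast<int>(utf8.size());

    // First call: ask how many UTF-16 code units the result needs.
    const int units = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                                          utf8.data(), bytes, nullptr, 0);
    if (units == 0) throw std::runtime_error("input is not valid UTF-8");

    // Second call: convert into a buffer of exactly that size.
    std::wstring utf16(static_cast<std::size_t>(units), L'\0');
    const int written = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                                            utf8.data(), bytes, &utf16[0], units);
    if (written == 0) throw std::runtime_error("UTF-8 conversion failed");
    assert(written == units);
    return utf16;
}
#endif
```

Because the length is passed explicitly, the result carries no extra terminating null, so it can be handed to SetText as in the snippet above.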

零時差 2024-12-06 09:31:15

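DrawTextW expects UTF-16.

What you are doing is converting each UTF-8 code unit (byte) into a 16-bit value by zero-padding it. That only converts UTF-8 to UTF-16 correctly if your UTF-8 contains nothing but characters from the ASCII subset of Unicode.

What you need to do is convert from UTF-8 to UTF-16 properly. Load the string into a std::string (not a std::wstring), then convert that UTF-8 string to a UTF-16 string and pass it to the API that expects UTF-16.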
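To make the difference concrete, here is a small portable sketch that contrasts the two behaviours. The function names (widen_bytes, utf8_to_utf16_portable, append_utf16) are hypothetical and not part of any library. The decoder is deliberately minimal: it rejects malformed lead and continuation bytes but does not reject overlong encodings, so treat it as an illustration rather than a validating converter.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// The zero-padding described above: one 16-bit unit per input byte.
std::u16string widen_bytes(const std::string& bytes)
{
    std::u16string out;
    for (unsigned char b : bytes) out.push_back(static_cast<char16_t>(b));
    return out;
}

// Append one code point as UTF-16, using a surrogate pair above U+FFFF.
void append_utf16(std::u16string& out, char32_t cp)
{
    assert(cp <= 0x10FFFF);
    if (cp >= 0x10000) {
        cp -= 0x10000;
        out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
        out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
    } else {
        out.push_back(static_cast<char16_t>(cp));
    }
}

// A proper conversion: decode each UTF-8 sequence, then re-encode it.
std::u16string utf8_to_utf16_portable(const std::string& in)
{
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp;
        std::size_t extra;  // number of continuation bytes that follow
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::runtime_error("invalid UTF-8 lead byte");
        if (extra > in.size() - i - 1) throw std::runtime_error("truncated UTF-8");
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80) throw std::runtime_error("invalid continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        if (cp > 0x10FFFF) throw std::runtime_error("code point out of range");
        append_utf16(out, cp);
        i += extra + 1;
    }
    return out;
}
```

Ő is U+0150, which UTF-8 encodes as the two bytes 0xC5 0x90. Zero-padding turns those into U+00C5 (Å) and U+0090 (an unprintable control character), which matches the symptom in the question; the proper conversion yields the single unit 0x0150.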

瑶笙 2024-12-06 09:31:15

I never understood the idea advocated there: use UTF-8 everywhere, implement the necessary functions yourself (which you could just as well do for UTF-16), and then convert back to UTF-16 when talking to the Windows API. I also have no idea how that is supposed to avoid problems in the Windows API; after all, you still give it UTF-16 characters and will therefore hit all the same bugs anyhow. It seems like quite a lot of extra work for no benefit.

Anyway, instead of "use std::string and then convert it to UTF-16 with low-level methods", you could just let the library do its job. (Note that this may not give the best performance; Ray Chen had a series about that way back, though I'd hope newer compilers have fixed it, and for a normal file it hardly matters.)

Basically, you can do this:

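std::wifstream src;
// The locale name is platform-specific; std::locale throws std::runtime_error if it is not recognized.
src.imbue(std::locale("UTF-8")); // use correct encoding.
src.open(file);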

Why do all the work yourself if the library can do it for you? (Every time I don't have to use MultiByteToWideChar, I count myself lucky.) It also makes the intent much clearer.
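Whether a locale named "UTF-8" exists depends on the platform, so here is a hedged variation on the same imbue idea that does not rely on a locale name. It swaps in a different technique from the one shown above: a locale built around the standard std::codecvt_utf8_utf16 facet from <codecvt>. Note that this header has been deprecated since C++17, and that the helper name read_first_line_utf8 is hypothetical.

```cpp
#include <cassert>
#include <codecvt>   // std::codecvt_utf8_utf16 (deprecated since C++17)
#include <fstream>
#include <locale>
#include <stdexcept>
#include <string>

// Read the first line of a UTF-8 file into a wide string.
std::wstring read_first_line_utf8(const std::string& path)
{
    assert(!path.empty());

    std::wifstream src;
    // Imbue before opening so the facet applies from the first byte read.
    // The locale takes ownership of the facet and deletes it.
    src.imbue(std::locale(std::locale::classic(),
                          new std::codecvt_utf8_utf16<wchar_t>));
    src.open(path);
    if (!src) throw std::runtime_error("cannot open " + path);

    std::wstring line;
    std::getline(src, line);
    return line;
}
```

With something like this in place, the loading code from the question would shrink to a single call, for example guiTitle->SetText( read_first_line_utf8("data/language.ini") ), assuming SetText accepts a std::wstring as it does in the question.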
