Extracting the correct text from a wifstream regardless of encoding
Here is the program: http://codepad.org/eyxunHot
The encoding of the file is UTF-8.
I have a text file named "config.ini" with the following word in it:
➑ball
If I use notepad to save the file with "UTF-8" encoding, then run the program, according to the debugger the value of eight_ball is:
âball
If I use notepad to save the file with "Unicode" encoding, then run the program, according to the debugger the value of eight_ball is:
ÿþ'b
If I use notepad to save the file with "Unicode big endian" encoding, then run the program, according to the debugger the value of eight_ball is:
þÿ'
In all these cases the result is incorrect. Also, the ANSI encoding doesn't support the ➑ symbol. How do I make sure that the word ➑ball is extracted from the file when I execute config_file >> eight_ball, regardless of encoding? I want the output of this program to be "Program is correct" regardless of the encoding of config.ini.
Comments (3)
If you're on Windows and you want to use INI files, keep in mind that the INI APIs support Unicode (UTF-16 little endian) INI files without problems; you just have to provide an empty file with the BOM at the beginning.
By the way, if you want to work with C++ streams and Unicode, you may want to look at this article. Besides the UTF-8 part, you'll also learn how character conversion works under the hood in C++ streams.
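As a rough illustration of that conversion machinery (not taken from the article), the sketch below imbues a wifstream with a std::codecvt_utf8 facet before opening the file, so the stream converts UTF-8 bytes to wchar_t while reading. The helper name read_first_word_utf8 is hypothetical. Note that <codecvt> is deprecated since C++17, and that this only handles UTF-8 input, not the UTF-16 variants from the question.

```cpp
#include <codecvt>   // std::codecvt_utf8 (deprecated since C++17, still shipped by major libraries)
#include <fstream>
#include <locale>
#include <string>

// Hypothetical helper: returns the first whitespace-delimited word of a UTF-8 file.
std::wstring read_first_word_utf8(const std::string& path) {
    std::wifstream in;
    // Install the conversion facet before opening the file.
    // consume_header makes the facet skip a leading UTF-8 BOM if present.
    in.imbue(std::locale(in.getloc(),
        new std::codecvt_utf8<wchar_t, 0x10ffff, std::consume_header>));
    in.open(path, std::ios::binary);

    std::wstring word;
    in >> word;   // the facet turns UTF-8 bytes into wide characters here
    return word;  // empty if the file is missing or the bytes are not valid UTF-8
}
```

With a config.ini saved as UTF-8, read_first_word_utf8("config.ini") should then compare equal to L"\u2791ball".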
Maybe you can use the ICU library.
Windows has many problems with UTF support. My Ubuntu uses UTF-8 as its default encoding, which solves this problem, although Unix-like systems have a somewhat strange implementation of the C++ standard library: UTF-8 text is held in a char*, so a single letter can occupy two array elements. With the string class it is cleaner.
You need to set the locale before wstreams will work correctly. I would instead suggest using regular streams and some library for character conversion, as your input encoding typically will differ anyway. The best algorithm these days is to try reading as UTF-8 first and if that fails, try reading as CP1252 or some other user-configurable legacy charset.
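The "try UTF-8 first, then fall back" idea can be sketched with regular byte streams and no external library. The code below is only an illustration under a few assumptions: the function names (read_file, decode_text and the decode_* helpers) are made up for this example, the UTF-16 cases are recognised purely by their BOM (which is how Notepad's "Unicode" and "Unicode big endian" options mark files), and CP1252 is hard-coded as the legacy charset rather than being user-configurable.

```cpp
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>

// Read the whole file as raw bytes, with no character conversion.
std::string read_file(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in), std::istreambuf_iterator<char>());
}

// Decode UTF-16 starting at byte offset pos (just past the BOM).
static std::u32string decode_utf16(const std::string& b, std::size_t pos, bool big_endian) {
    std::u32string out;
    auto unit = [&](std::size_t i) -> char32_t {
        unsigned char x = b[i], y = b[i + 1];
        return big_endian ? (x << 8 | y) : (y << 8 | x);
    };
    for (std::size_t i = pos; i + 1 < b.size(); i += 2) {
        char32_t u = unit(i);
        if (u >= 0xD800 && u <= 0xDBFF && i + 3 < b.size()) {  // possible surrogate pair
            char32_t lo = unit(i + 2);
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                u = 0x10000 + ((u - 0xD800) << 10) + (lo - 0xDC00);
                i += 2;
            }
        }
        out.push_back(u);
    }
    return out;
}

// Strict UTF-8 decode; returns false on any malformed, overlong or surrogate sequence.
static bool decode_utf8(const std::string& b, std::size_t pos, std::u32string& out) {
    static const char32_t min_cp[] = {0, 0x80, 0x800, 0x10000};
    for (std::size_t i = pos; i < b.size();) {
        unsigned char c = b[i];
        int extra = c < 0x80 ? 0 : (c >> 5) == 0x6 ? 1 : (c >> 4) == 0xE ? 2
                  : (c >> 3) == 0x1E ? 3 : -1;
        if (extra < 0 || i + extra >= b.size()) return false;
        char32_t cp = extra == 0 ? c : c & (0x3F >> extra);
        for (int k = 1; k <= extra; ++k) {
            unsigned char cc = b[i + k];
            if ((cc & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (cc & 0x3F);
        }
        if (cp < min_cp[extra] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return false;
        out.push_back(cp);
        i += extra + 1;
    }
    return true;
}

// CP1252: bytes 0x80..0x9F use a table, everything else matches Latin-1.
static std::u32string decode_cp1252(const std::string& b) {
    static const char32_t high[32] = {
        0x20AC, 0x0081, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
        0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0x008D, 0x017D, 0x008F,
        0x0090, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
        0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0x009D, 0x017E, 0x0178};
    std::u32string out;
    for (unsigned char c : b)
        out.push_back(c >= 0x80 && c <= 0x9F ? high[c - 0x80] : c);
    return out;
}

// BOM first, then strict UTF-8, then the legacy fallback.
std::u32string decode_text(const std::string& b) {
    auto starts = [&](const char* bom, std::size_t n) { return b.compare(0, n, bom, n) == 0; };
    if (starts("\xFF\xFE", 2)) return decode_utf16(b, 2, false);
    if (starts("\xFE\xFF", 2)) return decode_utf16(b, 2, true);
    std::u32string out;
    if (decode_utf8(b, starts("\xEF\xBB\xBF", 3) ? 3 : 0, out)) return out;
    return decode_cp1252(b);
}
```

Calling decode_text(read_file("config.ini")) then yields the same code points for all three Notepad encodings from the question; the word can be extracted from the decoded string afterwards. A library such as ICU would cover far more charsets than this sketch does.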