Creating a UTF-8 file in Qt
I'm trying to create a UTF-8 encoded file in Qt.
    #include <QtCore>

    int main()
    {
        QString unicodeString = "Some Unicode string";

        // Open the output file for writing in text mode.
        QFile fileOut("D:\\Temp\\qt_unicode.txt");
        if (!fileOut.open(QIODevice::WriteOnly | QIODevice::Text))
        {
            return -1;
        }

        // Write the string through a text stream that encodes as UTF-8.
        QTextStream streamFileOut(&fileOut);
        streamFileOut.setCodec("UTF-8");
        streamFileOut << unicodeString;
        streamFileOut.flush();

        fileOut.close();
        return 0;
    }
I thought that, since QString is Unicode by default and I set the codec of the output stream to UTF-8, my file would be UTF-8. But it isn't; it's ANSI.
What am I doing wrong? Is something wrong with my strings? Can you correct my code so that it creates a UTF-8 file?
My next step will be to read an ANSI file and save it as a UTF-8 file, so I'll have to convert each string I read, but for now I want to start with a single file.
Thank you.
Comments (3)
2022 edit: what follows was true for Qt 4. Qt 5 and later use UTF-8 by default, so this answer doesn’t apply to the latest Qt versions.
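Your code is absolutely correct. The only part that looks suspicious to me is the line that constructs unicodeString directly from a string literal: QString unicodeString = "Some Unicode string";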
The reason it looks suspicious is that QString uses the Latin1 encoding by default when constructing from a C-style string literal. If you only intend to use accented Latin characters, you're probably fine, but with anything else (Cyrillic, Chinese, Japanese, Hebrew...) it no longer works correctly. The best way to deal with this issue is to have your source encoded in UTF-8 and convert the literal explicitly instead, for example with QString::fromUtf8("Some Unicode string").
This will work for any imaginable language. Using QObject::trUtf8() is even better as it gives you a lot of i18n capabilities.
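To make the Latin1 pitfall concrete, here is a minimal sketch in plain standard C++ (no Qt) of what happens when bytes are treated as Latin-1 characters and then written out as UTF-8. The helper name latin1ToUtf8 is hypothetical and only mimics the effect described above; it is not Qt's implementation.

```cpp
#include <string>

// Hypothetical helper: treat every input byte as a Latin-1 character
// (code points U+0000..U+00FF) and re-encode it as UTF-8.
std::string latin1ToUtf8(const std::string& latin1)
{
    std::string utf8;
    utf8.reserve(latin1.size() * 2);
    for (unsigned char byte : latin1) {
        if (byte < 0x80) {
            // ASCII range: one byte, unchanged.
            utf8.push_back(static_cast<char>(byte));
        } else {
            // U+0080..U+00FF: two-byte UTF-8 sequence.
            utf8.push_back(static_cast<char>(0xC0 | (byte >> 6)));
            utf8.push_back(static_cast<char>(0x80 | (byte & 0x3F)));
        }
    }
    return utf8;
}
```

If the input is already UTF-8, such as the two bytes C3 A9 for "é", each byte is mistaken for a separate Latin-1 character and the result is the four bytes C3 83 C2 A9, which a UTF-8 viewer shows as "Ã©". Pure ASCII input passes through unchanged, which is why the problem stays hidden for a string like "Some Unicode string".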
Edit
While it's true that you generate a correct UTF-8 file, getting Notepad to recognize your file as UTF-8 is a different story. You need to put a BOM in there. It can be done either as suggested in another answer, or by asking the stream to generate one, for example by calling setGenerateByteOrderMark(true) on the QTextStream before writing.
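For reference, the UTF-8 BOM is just the three bytes EF BB BF at the very start of the file. The following is a small sketch in plain standard C++ (not the Qt API) of checking for and prepending those bytes; the names kUtf8Bom, hasUtf8Bom, and withUtf8Bom are hypothetical.

```cpp
#include <string>

// UTF-8 encoding of U+FEFF, the byte order mark.
const std::string kUtf8Bom = "\xEF\xBB\xBF";

// Hypothetical helper: true if the byte sequence starts with the UTF-8 BOM.
bool hasUtf8Bom(const std::string& bytes)
{
    return bytes.compare(0, kUtf8Bom.size(), kUtf8Bom) == 0;
}

// Hypothetical helper: prepend the BOM unless it is already present.
std::string withUtf8Bom(const std::string& bytes)
{
    return hasUtf8Bom(bytes) ? bytes : kUtf8Bom + bytes;
}
```

Running hasUtf8Bom over the first bytes of the generated file is a quick way to confirm whether the BOM was actually written.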
My experience creating a text file encoded as UTF-8 without a BOM in Qt is as follows:
The resulting file is encoded as UTF-8 without a BOM.
Don't forget that UTF-8 encodes ASCII characters as one byte each. Only special or accented characters are encoded with more bytes (2 to 4 bytes in standard UTF-8).
This means that as long as you have only ASCII characters (which is the case for your unicodeString), the file will contain only single-byte characters. Thus you get backward compatibility with ASCII: the bytes are the same ones an ANSI encoding would produce, which is why the file looks like ANSI. To check that your code is working, put some accented characters in your unicodeString.
I tested your code with accented characters, and it works fine.
If you want to have a BOM at the beginning of your file, you could start by adding the BOM character (QChar(QChar::ByteOrderMark)).
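The byte counts mentioned above can be checked with a small encoder. This is a sketch in plain standard C++ (not Qt code), and encodeUtf8 is a hypothetical helper name; it encodes one Unicode code point following the standard UTF-8 bit layout.

```cpp
#include <string>

// Hypothetical helper: encode one Unicode code point as UTF-8.
// Returns an empty string for surrogates and values above U+10FFFF.
std::string encodeUtf8(char32_t cp)
{
    std::string out;
    if (cp < 0x80) {
        // ASCII: a single byte, identical to the ASCII value.
        out.push_back(static_cast<char>(cp));
    } else if (cp < 0x800) {
        // Two bytes, e.g. accented Latin letters.
        out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF) {
            return out; // surrogates are not valid code points
        }
        // Three bytes, e.g. U+FEFF (the BOM character).
        out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else if (cp <= 0x10FFFF) {
        // Four bytes, characters outside the Basic Multilingual Plane.
        out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    }
    return out;
}
```

With this helper, 'A' stays one byte, U+00E9 ("é") becomes the two bytes C3 A9, and U+FEFF becomes the three bytes EF BB BF, which is the sequence editors such as Notepad look for at the start of a file.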