STL and UTF-8 file input/output: how to do it?
I use wchar_t for internal strings and UTF-8 for storage in files. I need to use the STL to input/output text to the screen, and to do so with the full Lithuanian charset.
It's all fine because I'm not forced to do the same for files, so the following example does the job just fine:
    #include <io.h>
    #include <fcntl.h>
    #include <iostream>

    _setmode(_fileno(stdout), _O_U16TEXT);
    wcout << L"AaĄąfl" << endl;
But I became curious and attempted to do the same with files with no success. Of course I could use formatted input/output, but that is... discouraged.
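    FILE* fp;

    // Write: open the file, then switch the stream to UTF-8 translation.
    _wfopen_s(&fp, L"utf-8_out_test.txt", L"w");
    _setmode(_fileno(fp), _O_U8TEXT);
    _fwprintf_p(fp, L"AaĄą\nfl");
    fclose(fp);

    // Read: two whitespace-delimited words, decoded into wide characters.
    _wfopen_s(&fp, L"utf-8_in_test.txt", L"r");
    _setmode(_fileno(fp), _O_U8TEXT);
    wchar_t text[256];
    fseek(fp, 0, SEEK_SET);        // the offset is a long, so 0 rather than NULL
    fwscanf(fp, L"%255ls", text);  // width limit keeps the read inside text[]
    wcout << text << endl;
    fwscanf(fp, L"%255ls", text);
    wcout << text << endl;
    fclose(fp);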
This snippet works perfectly (although I am not sure how it handles malformed chars). So, is there any way to:
- get a FILE* or integer file handle from a std::basic_*fstream?
- simulate _setmode() on it?
- extend std::basic_*fstream so it handles UTF-8 I/O?
Yes, I am studying at a university and this is somewhat related to my assignments, but I am trying to figure this out for myself. It won't influence my grade or anything like that.
Well, after some testing I figured out that FILE is accepted for _iobuf (in the w*fstream constructor). So, the following code does what I need. This sample reads and writes legit UTF-8 files (without BOM) in Windows compiled with Visual Studio 2k8.
Can someone give any comments about portability? Improvements?
The easiest way would be to do the conversion to UTF-8 yourself before trying to output. You might get some inspiration from this question: UTF8 to/from wide char conversion in STL
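As a rough illustration of converting by hand, here is a minimal sketch that encodes a std::wstring as UTF-8 bytes, which can then be written through an ordinary narrow stream opened in binary mode. The names append_utf8 and to_utf8 are hypothetical helpers, not part of any library. The sketch assumes wchar_t holds either UTF-16 (as on Windows) or UTF-32 code units, and it does not validate lone surrogates or out-of-range values.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: append one Unicode code point to `out` as UTF-8.
void append_utf8(std::string& out, std::uint32_t cp) {
    if (cp < 0x80) {                       // 1 byte: 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {               // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {             // 3 bytes
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                               // 4 bytes
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical helper: convert a wide string to a UTF-8 byte string.
// No validation: lone surrogates are encoded as-is.
std::string to_utf8(const std::wstring& ws) {
    std::string out;
    for (std::size_t i = 0; i < ws.size(); ++i) {
        std::uint32_t cp = static_cast<std::uint32_t>(ws[i]);
        // With 16-bit wchar_t, combine a surrogate pair into one code point.
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < ws.size()) {
            std::uint32_t low = static_cast<std::uint32_t>(ws[i + 1]);
            if (low >= 0xDC00 && low <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
                ++i;
            }
        }
        append_utf8(out, cp);
    }
    return out;
}
```

The resulting std::string can be written with a std::ofstream opened with std::ios::binary, so the stream itself never needs to know about the encoding.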
Answered elsewhere.
You can't make the STL work directly with UTF-8. The basic reason is that the STL indirectly forbids multi-char characters: each character has to be exactly one char/wchar_t.
Microsoft actually breaks the standard with their UTF-16 encoding, so maybe you can get some inspiration there.
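To make the "one character per char" mismatch concrete, here is a small sketch that counts code points in a UTF-8 byte string and can be compared with what std::string::size() reports. The function name utf8_code_points is hypothetical, and the sketch assumes well-formed UTF-8 (it does no validation).

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: count code points in UTF-8 text by skipping
// continuation bytes (those of the form 10xxxxxx).
// Assumes well-formed input; malformed sequences are not detected.
std::size_t utf8_code_points(const std::string& s) {
    std::size_t count = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80) {
            ++count;  // a lead byte or an ASCII byte starts a new code point
        }
    }
    return count;
}
```

For the bytes of "AaĄą" in UTF-8, size() counts six chars while this function counts four code points, which is why string operations that index by char cannot be applied naively to UTF-8 text.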
Use a std::codecvt facet to perform the conversion.
You may use the standard std::codecvt_byname, or a non-standard codecvt facet implementation.
Beware that on some platforms codecvt_byname can only perform conversions for locales that are installed on the system.
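One way this might look, as a sketch rather than a definitive answer: imbue a wide file stream with a locale carrying std::codecvt_utf8<wchar_t> from <codecvt>. That facet was added in C++11, deprecated in C++17 and removed in C++26, so this assumes a toolchain that still ships it. The function names write_utf8_file and read_utf8_file are hypothetical. Note that with a 16-bit wchar_t this facet treats the data as UCS-2, so characters outside the Basic Multilingual Plane would need std::codecvt_utf8_utf16 instead.

```cpp
#include <codecvt>
#include <fstream>
#include <iterator>
#include <locale>
#include <string>

// Hypothetical helper: write wide text to `path` encoded as UTF-8 (no BOM).
bool write_utf8_file(const std::string& path, const std::wstring& text) {
    std::wofstream out;
    // Imbue before opening so the conversion applies from the first byte.
    // The locale takes ownership of the facet allocated here.
    out.imbue(std::locale(std::locale::classic(),
                          new std::codecvt_utf8<wchar_t>));
    out.open(path, std::ios::binary);
    if (!out) {
        return false;
    }
    out << text;
    out.flush();
    return static_cast<bool>(out);
}

// Hypothetical helper: read a whole UTF-8 file into a wide string.
// Returns an empty string if the file cannot be opened.
std::wstring read_utf8_file(const std::string& path) {
    std::wifstream in;
    in.imbue(std::locale(std::locale::classic(),
                         new std::codecvt_utf8<wchar_t>));
    in.open(path, std::ios::binary);
    if (!in) {
        return std::wstring();
    }
    std::istreambuf_iterator<wchar_t> first(in), last;
    return std::wstring(first, last);
}
```

Opening in binary mode keeps the runtime from translating line endings underneath the facet, so the bytes on disk are exactly the UTF-8 encoding of the wide text.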