STL and UTF-8 file input/output. How to do it?

Posted on 2024-09-28 16:17:49

I use wchar_t for internal strings and UTF-8 for storage in files. I need to use the STL to input/output text on the screen, and to do so with the full Lithuanian character set.
That part is fine as long as I'm not forced to do the same for files, so the following example does the job:

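    #include <stdio.h>
    #include <io.h>
    #include <fcntl.h>
    #include <iostream>
    using namespace std;

    // Microsoft CRT only: switch stdout to UTF-16 text mode before any wide output
    _setmode (_fileno (stdout), _O_U16TEXT);
    wcout << L"AaĄąfl" << endl;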

But I became curious and attempted to do the same with files with no success. Of course I could use formatted input/output, but that is... discouraged.

    // Write: open the file, then switch the stream to UTF-8 text mode
    FILE* fp;
    _wfopen_s (&fp, L"utf-8_out_test.txt", L"w");
    _setmode (_fileno (fp), _O_U8TEXT);
    _fwprintf_p (fp, L"AaĄą\nfl");
    fclose (fp);

    // Read: same mode switch, then read two whitespace-delimited words
    _wfopen_s (&fp, L"utf-8_in_test.txt", L"r");
    _setmode (_fileno (fp), _O_U8TEXT);
    wchar_t text[256];
    fseek (fp, 0, SEEK_SET);
    fwscanf (fp, L"%255ls", text);   // width limit keeps the read inside text[]
    wcout << text << endl;
    fwscanf (fp, L"%255ls", text);
    wcout << text << endl;
    fclose (fp);

This snippet works perfectly (although I am not sure how it handles malformed chars). So, is there any way to:

  • get FILE* or integer file handle from a std::basic_*fstream?
  • simulate _setmode () on it?
  • extend std::basic_*fstream so it handles UTF-8 I/O?

Yes, I am studying at a university and this is somewhat related to my assignments, but I am trying to figure this out for myself. It won't influence my grade or anything like that.

蝶舞 2024-10-05 16:17:50

Well, after some testing I figured out that the w*fstream constructors accept a FILE* (declared as _iobuf* in the Microsoft headers; this overload is a non-standard extension). So, the following code does what I need.

    #include <stdio.h>
    #include <io.h>
    #include <fcntl.h>
    #include <iostream>
    #include <fstream>
    using namespace std;

    // For writing
    FILE* fp;
    _wfopen_s (&fp, L"utf-8_out_test.txt", L"w");
    _setmode (_fileno (fp), _O_U8TEXT);
    wofstream fs (fp);   // FILE* constructor: Microsoft extension
    fs << L"ąfl";
    fs.flush ();         // push any buffered output to the FILE before closing it
    fclose (fp);

    // And reading (a separate snippet, so fp and fs are declared again)
    FILE* fp;
    _wfopen_s (&fp, L"utf-8_in_test.txt", L"r");
    _setmode (_fileno (fp), _O_U8TEXT);
    wifstream fs (fp);
    wchar_t array[6];
    fs.getline (array, 5);
    wcout << array << endl; // For debug
    fclose (fp);

This sample reads and writes legitimate UTF-8 files (without a BOM) on Windows when compiled with Visual Studio 2008.

Can someone comment on portability or suggest improvements?

生活了然无味 2024-10-05 16:17:50

The easiest way would be to do the conversion to UTF-8 yourself before trying to output. You might get some inspiration from this question: UTF8 to/from wide char conversion in STL
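
As a rough illustration of doing the conversion by hand, here is a minimal sketch of a wide-to-UTF-8 encoder. The function name to_utf8 is hypothetical (it is not taken from the linked question), and the code assumes wchar_t holds either UTF-16 code units (as on Windows) or UTF-32 code points (as on most Unix-like systems).

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: encodes a wide string as UTF-8 bytes.
// Assumes wchar_t carries UTF-16 code units or UTF-32 code points.
std::string to_utf8(const std::wstring& wide) {
    std::string out;
    for (std::size_t i = 0; i < wide.size(); ++i) {
        std::uint32_t cp = static_cast<std::uint32_t>(wide[i]);

        // Combine a UTF-16 surrogate pair into a single code point.
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < wide.size()) {
            std::uint32_t low = static_cast<std::uint32_t>(wide[i + 1]);
            if (low >= 0xDC00 && low <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
                ++i;
            }
        }

        // Unpaired surrogates and out-of-range values become U+FFFD.
        if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF) {
            cp = 0xFFFD;
        }

        if (cp < 0x80) {                       // 1 byte: 0xxxxxxx
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {               // 2 bytes: 110xxxxx 10xxxxxx
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {             // 3 bytes
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                               // 4 bytes
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

The resulting std::string can then be written through an ordinary narrow std::ofstream opened with std::ios::binary, so that no newline translation alters the bytes.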

情泪▽动烟 2024-10-05 16:17:50

get FILE* or integer file handle from a std::basic_*fstream?

Answered elsewhere.

紫南 2024-10-05 16:17:50

You can't make the STL work directly with UTF-8. The basic reason is that the STL indirectly forbids multi-char characters: each character has to be exactly one char or wchar_t.

Microsoft actually breaks the standard with their UTF-16 encoding, so maybe you can get some inspiration there.
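
To make the one-element-per-character point concrete, here is a small sketch showing that a std::string holding UTF-8 counts bytes, not characters. The helper count_code_points is hypothetical and assumes its input is already valid UTF-8.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts code points in a string assumed to be valid
// UTF-8 by skipping continuation bytes, which all have the form 10xxxxxx.
std::size_t count_code_points(const std::string& utf8) {
    std::size_t count = 0;
    for (unsigned char byte : utf8) {
        if ((byte & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```

For the two-character text "Aą" stored as the bytes 41 C4 85, size() reports 3 while the helper reports 2, which is why element-wise operations such as indexing or substr can split a character in the middle.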

江南烟雨〆相思醉 2024-10-05 16:17:49

Use a std::codecvt facet to perform the conversion.

You may use the standard std::codecvt_byname, or a non-standard codecvt facet implementation.

    #include <iostream>
    #include <locale>
    using namespace std;

    // Copy the current locale, replacing its wchar_t <-> char conversion
    // with the one from the named UTF-8 locale
    locale utf8locale(locale(), new codecvt_byname<wchar_t, char, mbstate_t>("en_US.UTF-8"));
    wcout.imbue(utf8locale);
    wcout << L"Hello, wide to multibyte world!" << endl;

Beware that on some platforms codecvt_byname can only perform conversions for locales that are installed on the system.
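
For file streams, one way to apply the same facet idea without depending on an installed locale is std::codecvt_utf8 from <codecvt>. The sketch below is only an illustration under stated assumptions: the helper names write_utf8_file and read_utf8_file are hypothetical, and <codecvt> has been deprecated since C++17, so newer compilers may warn about it.

```cpp
#include <codecvt>
#include <fstream>
#include <iterator>
#include <locale>
#include <string>

// Hypothetical helper: writes `text` to `path` as UTF-8 without a BOM.
bool write_utf8_file(const char* path, const std::wstring& text) {
    std::wofstream out;
    // Imbue before opening so the facet is in place from the first character.
    out.imbue(std::locale(std::locale::classic(),
                          new std::codecvt_utf8<wchar_t>));
    out.open(path, std::ios::binary);
    if (!out) {
        return false;
    }
    out << text;
    out.flush();
    return out.good();
}

// Hypothetical helper: reads a whole UTF-8 file into a wide string.
// Returns an empty string if the file cannot be opened.
std::wstring read_utf8_file(const char* path) {
    std::wifstream in;
    in.imbue(std::locale(std::locale::classic(),
                         new std::codecvt_utf8<wchar_t>));
    in.open(path, std::ios::binary);
    return std::wstring(std::istreambuf_iterator<wchar_t>(in),
                        std::istreambuf_iterator<wchar_t>());
}
```

Where wchar_t is 16 bits wide (as on Windows), std::codecvt_utf8<wchar_t> covers only the Basic Multilingual Plane; std::codecvt_utf8_utf16<wchar_t> is the variant that handles surrogate pairs. The Lithuanian letters all lie inside the BMP, so either should do for that alphabet.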
