Reading and writing Cyrillic files in C++
I have to first read a file in Cyrillic, then pick a random number of random lines and write the modified text to a different file. Latin letters cause no problems, but with Cyrillic text I get garbage. This is how I tried to do it.
Say the file input.txt contains:
ааааааа
ббббббб
ввввввв
I have to read it, and put every line into a vector:
vector<wstring> inputVector;
wstring inputString, result;
wifstream inputStream;
inputStream.open("input.txt");
while(!inputStream.eof())
{
getline(inputStream, inputString);
inputVector.push_back(inputString);
}
inputStream.close();
srand(time(NULL));
int numLines = rand() % inputVector.size();
for(int i = 0; i < numLines; i++)
{
int randomLine = rand() % inputVector.size();
result += inputVector[randomLine];
}
wofstream resultStream;
resultStream.open("result.txt");
resultStream << result;
resultStream.close();
So how can I work with Cyrillic so that the program produces readable text, not just symbols?
Comments (2)
Because you saw something like ■a a a a a a a 1♦1♦1♦1♦1♦1♦1♦ 2♦2♦2♦2♦2♦2♦2♦ printed to the console, it appears that input.txt is encoded in UTF-16, probably UTF-16 LE with a BOM. You can use your original code if you change the encoding of the file to UTF-8.
The reason for using UTF-8 is that, regardless of the char type of the file stream, basic_fstream's underlying basic_filebuf uses a codecvt object to convert between the stream of char objects stored in the file and a stream of objects of the stream's own char type; i.e. when reading, the char stream that is read from the file is converted to a wchar_t stream, and when writing, a wchar_t stream is converted to a char stream that is then written to the file. In the case of std::wifstream, the codecvt object is an instance of the standard std::codecvt<wchar_t, char, mbstate_t>, which generally converts a narrow multibyte encoding (UTF-8 on many systems, depending on the locale) to the platform's wide-character encoding, such as UTF-16 or UTF-32. It does not expect the file itself to contain UTF-16.
As explained on the MSDN documentation page for basic_filebuf:
Similarly, when reading a Unicode string (containing wchar_t characters), the basic_filebuf converts the ANSI string read from the file to the wchar_t string returned to getline and other read operations.
If you change the encoding of input.txt to UTF-8, your original program should work correctly.
For reference, this works for me:
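The listing that followed this sentence is not preserved in this copy. As a rough sketch only, and not the answerer's code: one way to make the UTF-8 assumption explicit, rather than relying on the default locale, is to imbue the wide streams with std::codecvt_utf8 (deprecated since C++17 but still shipped by common standard libraries). The helper names read_utf8_lines, pick_random_lines and write_utf8_lines are hypothetical, and the choice of 1 to N lines with repetition is an assumption about what "random number of lines" means.

```cpp
#include <codecvt>   // std::codecvt_utf8 (deprecated since C++17)
#include <cstddef>
#include <fstream>
#include <locale>
#include <random>
#include <string>
#include <vector>

// Hypothetical helper: read a UTF-8 file line by line into wide strings.
std::vector<std::wstring> read_utf8_lines(const std::string& path) {
    std::wifstream in;
    // Imbue before opening so the UTF-8 facet is in place for the first read.
    in.imbue(std::locale(in.getloc(), new std::codecvt_utf8<wchar_t>));
    in.open(path);
    std::vector<std::wstring> lines;
    std::wstring line;
    while (std::getline(in, line)) {  // loop on the read itself, not on eof()
        lines.push_back(line);
    }
    return lines;
}

// Hypothetical helper: pick between 1 and lines.size() lines at random,
// with repetition. Returns an empty vector for empty input.
std::vector<std::wstring> pick_random_lines(const std::vector<std::wstring>& lines,
                                            std::mt19937& rng) {
    std::vector<std::wstring> picked;
    if (lines.empty()) return picked;  // avoids the modulo-by-zero case
    std::uniform_int_distribution<std::size_t> count(1, lines.size());
    std::uniform_int_distribution<std::size_t> index(0, lines.size() - 1);
    const std::size_t n = count(rng);
    for (std::size_t i = 0; i < n; ++i) {
        picked.push_back(lines[index(rng)]);
    }
    return picked;
}

// Hypothetical helper: write wide strings to a file as UTF-8, one per line.
void write_utf8_lines(const std::string& path,
                      const std::vector<std::wstring>& lines) {
    std::wofstream out;
    out.imbue(std::locale(out.getloc(), new std::codecvt_utf8<wchar_t>));
    out.open(path);
    for (const std::wstring& line : lines) {
        out << line << L'\n';
    }
}
```

A caller would read input.txt with read_utf8_lines, seed a std::mt19937 from std::random_device, and pass the result of pick_random_lines to write_utf8_lines for result.txt. Unlike the question's code, this writes a newline after each picked line instead of concatenating them.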
Note that the encoding of result.txt will also be UTF-8 (generally).
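To check the first answer's guess that input.txt is UTF-16 LE with a BOM, it can help to look at the first bytes of the file before choosing how to read it. The following is a minimal sketch under that assumption; detect_bom is a hypothetical helper name, and it only recognises the three byte-order marks listed in the comments (a UTF-32 LE BOM, which also begins with FF FE, would be reported as UTF-16 LE).

```cpp
#include <fstream>
#include <ios>
#include <string>

// Hypothetical helper: report which byte-order mark, if any, starts the file.
// Returns "UTF-8", "UTF-16 LE", "UTF-16 BE", or "none" (also for unreadable files).
std::string detect_bom(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    unsigned char b[3] = {0, 0, 0};
    in.read(reinterpret_cast<char*>(b), 3);
    const std::streamsize n = in.gcount();  // bytes actually read (0 to 3)

    if (n >= 3 && b[0] == 0xEF && b[1] == 0xBB && b[2] == 0xBF) return "UTF-8";
    if (n >= 2 && b[0] == 0xFF && b[1] == 0xFE) return "UTF-16 LE";
    if (n >= 2 && b[0] == 0xFE && b[1] == 0xFF) return "UTF-16 BE";
    return "none";
}
```

A file without a BOM can still be UTF-8 or UTF-16, so a result of "none" does not identify the encoding by itself.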
Why would you use wifstream? Are you confident that your file consists of a sequence of (system-dependent) wide characters? Almost certainly that is not the case, most notably because the system's wide character set isn't actually well defined outside the scope of a C++ program.
Instead, just read the input byte stream as it is and echo it accordingly:
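The listing that accompanied this answer is also missing from this copy. A minimal sketch of the byte-oriented approach it describes, not the answerer's code, might look like the following. It assumes the input is UTF-8 (or another encoding in which the byte 0x0A only ever means a line feed); splitting UTF-16 on single bytes would not be safe. The name copy_random_lines is hypothetical.

```cpp
#include <cstddef>
#include <fstream>
#include <ios>
#include <random>
#include <string>
#include <vector>

// Hypothetical helper: copy a random selection of lines (1 to N, with
// repetition) from in_path to out_path without decoding them.
// Returns the number of lines written.
std::size_t copy_random_lines(const std::string& in_path,
                              const std::string& out_path,
                              std::mt19937& rng) {
    // Narrow streams: bytes are passed through untouched, so multibyte
    // UTF-8 sequences arrive in the output exactly as they were read.
    std::ifstream in(in_path, std::ios::binary);
    std::vector<std::string> lines;
    std::string line;
    while (std::getline(in, line)) {
        lines.push_back(line);
    }

    std::ofstream out(out_path, std::ios::binary);
    if (lines.empty()) return 0;

    std::uniform_int_distribution<std::size_t> count(1, lines.size());
    std::uniform_int_distribution<std::size_t> index(0, lines.size() - 1);
    const std::size_t n = count(rng);
    for (std::size_t i = 0; i < n; ++i) {
        out << lines[index(rng)] << '\n';
    }
    return n;
}
```

Because nothing is converted, the output has the same encoding as the input, and no locale or codecvt facet is involved.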