What is the most elegant way to write a text/binary file with C++?
I found some good results on reading, e.g. reading a text file with iterators and preallocation, or reading into containers. So I wondered: what is the most elegant way to write a std::string to a file?
Edit: When reading, I can preallocate space for the string via seekg and tellg. When writing, I already know the size of the string, so how can I tell the filesystem how much I want to write?
Here's a tiny example on how to output a std::string, but you should really read up on fstream.
You can use the appropriate overload of the operator<< function to write a string to a std::ofstream object. Here's an example:
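A minimal sketch of the operator<< approach. The helper name `write_string` and any file name passed to it are made up for illustration:

```cpp
#include <fstream>
#include <string>

// Writes `text` to the file at `path`, replacing any previous content.
// Returns true if the stream is still in a good state after flushing.
bool write_string(const std::string& path, const std::string& text) {
    std::ofstream out(path);   // text mode; an existing file is truncated
    if (!out) return false;    // the file could not be opened
    out << text;               // operator<< overload for std::string
    out.flush();
    return static_cast<bool>(out);
}
```

A call such as `write_string("out.txt", "hello world")` would create or overwrite `out.txt` in the current directory.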
The main issue is that the process doesn't reverse perfectly.
You might think that writing with operator<< and reading back with operator>> will just work, but it probably won't. There are no formal delimiters when writing to a text stream; you have to insert them manually to know when one token ends and another begins.
With strings, it is usually assumed that they contain no newline or tab characters, so those characters can be used as delimiters when reading back.
The most "perfect" way to write a string so you can read it back is to write its size then its content. Even then if you use iostream:
os << str.size() << str;
will not put any space between the size and the content, so if the content begins with a digit you are in trouble when reading it back later.
os << str.size() << '\t' << str;
will work.
With regard to reading big collections, your best bet with strings is to keep them tab-separated or line-separated and use std::getline in a loop. istream_iterator splits on whitespace, so it will simply not work if any of your strings contain spaces.
Your alternative is to first read in a header section with:
- number of strings
- size of each string.
Then read the data in as one big buffer; since you know how many strings you are about to read and their sizes, you can pre-allocate your buffers.
Writing binary means writing raw bytes to the file. With iostreams this is done with the stream's write member function, which is similar to the fwrite function in C, except that you specify just one size, the number of bytes to write, rather than two.
You need to address the following issues:
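A sketch of a raw byte write with std::ostream::write on a stream opened in binary mode. The helper name `write_bytes` is made up for this example:

```cpp
#include <fstream>
#include <string>

// Writes the bytes of `data` unchanged to the file at `path`.
// Returns true if the stream is still in a good state after flushing.
bool write_bytes(const std::string& path, const std::string& data) {
    std::ofstream out(path, std::ios::binary);   // no newline translation
    if (!out) return false;
    // One size argument: the number of bytes to write.
    out.write(data.data(), static_cast<std::streamsize>(data.size()));
    out.flush();
    return static_cast<bool>(out);
}
```

Since the length is passed explicitly, `data` may contain embedded null bytes and newlines.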
- Windows will insert an ASCII 13 character for you in front of every ASCII 10 character you write if you do not open the stream for binary.
- If you are writing numbers byte-wise, beware of endianness and size issues if they will be read back. The best way to address this is to record the endianness in the header section of your output and then write in native format. The assumption is that most of the time the file will be read back on the same platform, so no conversion is needed and it is more efficient.
The big plus-side of writing numbers this way is not only is it more efficient in time but there is also no need to insert any kind of delimiter so reading them back becomes relatively simple.
The downside is that the file is not human-readable: if there are any errors in it, you will need a special tool to read it back and inspect it.
Anyway, these are the issues.
The elegant solution to all this is provided by the Boost serialization library, with its archive classes and serialize functions.
You can write in text or binary mode and it will restore the way you stored it. It will even write pointers "deeply" for you.