Performance of ostream_iterator for writing numeric data to a file?
I've got various std::vector instances with numeric data in them, primarily int16_t, int32_t, etc. I'd like to dump this data to a file in as fast a manner as possible. If I use an ostream_iterator, will it write the entire block of memory in a single operation, or will it iterate over the elements of the vector, issuing a write operation for each one?
Comments (7)
A stream iterator and a vector will definitely not use a block copy in any implementation I'm familiar with. If the vector's element type were a class rather than a POD, for example, a direct copy would be a bad thing. I suspect the ostream will also format the output rather than writing the values directly (i.e., ASCII instead of binary output).
You might have better luck with boost::copy, as it's specifically optimized to do block writes when possible, but the most practical solution is to operate on the vector's memory directly using &v[0].
Most ofstream implementations I know of do buffer data, so you probably will not end up doing an excessive number of writes. The buffer in the ofstream has to fill up before an actual write is done, and most operating systems buffer file data underneath this, too. The interplay of these is not at all transparent from the C++ application level; the selection of buffer sizes, etc. is left up to the implementation.
C++ does provide a way to supply your own buffer to an ostream's streambuf: you can try calling pubsetbuf on it. The downside is that this doesn't necessarily do anything. Some implementations just ignore it.
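A minimal sketch of the pubsetbuf idea, assuming int32_t elements written as text. The helper name `write_with_custom_buffer` and the 64 KiB default are illustrative assumptions, not part of the original answer, and whether the buffer is actually used depends on the implementation:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper: writes the values as text, one per line, through an
// ofstream that has been offered a caller-supplied buffer.
bool write_with_custom_buffer(const std::string& path,
                              const std::vector<std::int32_t>& values,
                              std::size_t buffer_size = 64 * 1024)
{
    assert(buffer_size > 0);
    std::vector<char> buffer(buffer_size);  // declared first so it outlives `out`

    std::ofstream out;
    // Offer the buffer before the file is opened. An implementation is free
    // to ignore this request, so treat it as a hint only.
    out.rdbuf()->pubsetbuf(buffer.data(),
                           static_cast<std::streamsize>(buffer.size()));
    out.open(path.c_str());
    if (!out) return false;

    std::copy(values.begin(), values.end(),
              std::ostream_iterator<std::int32_t>(out, "\n"));
    out.close();  // flush while `buffer` is still alive
    return !out.fail();
}
```

The buffer is declared before the stream so that it is destroyed after it; measuring with and without the call is the only way to tell whether it helps on a given platform.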
The other option you have if you want to buffer things and still use ostream_iterator is to write into an ostringstream first. Then, once all your data is buffered, you can write the entire buffer using one big ostream::write(), POSIX I/O, etc.
This can still be slow, though, since you're doing formatted output, and you have to have two copies of your data in memory at once: the raw data and the formatted, buffered data. If your application already pushes the limits of memory, this isn't the greatest way to go, and you're probably better off using the built-in buffering that ofstream gives you.
Finally, if you really want performance, the fastest way to do this is to dump the raw memory to disk using ostream::write() as Neil suggests, or to use your OS's I/O functions. The disadvantage here is that your data isn't formatted, your file probably isn't human-readable, and it isn't easily readable on architectures with a different endianness than the one you wrote from. But it will get your data to disk fast and without adding memory requirements to your application.
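The ostringstream staging described in this answer could be sketched roughly as follows. The function names `format_values` and `write_text_at_once` are hypothetical, and the sketch assumes int32_t elements separated by newlines:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <fstream>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: formats every value into an in-memory stream first.
std::string format_values(const std::vector<std::int32_t>& values)
{
    std::ostringstream staged;
    std::copy(values.begin(), values.end(),
              std::ostream_iterator<std::int32_t>(staged, "\n"));
    assert(!staged.fail());  // formatting into memory is not expected to fail
    return staged.str();     // a second, formatted copy of the data
}

// Hypothetical helper: hands the finished text to the file in one write().
bool write_text_at_once(const std::string& path, const std::string& text)
{
    // Binary mode so the bytes reach the file exactly as formatted.
    std::ofstream out(path.c_str(), std::ios::out | std::ios::binary);
    if (!out) return false;
    out.write(text.data(), static_cast<std::streamsize>(text.size()));
    out.close();
    return !out.fail();
}
```

The string returned by `format_values` is the extra in-memory copy the answer warns about, so this trades memory for fewer calls into the file stream.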
The quickest (but most horrible) way to dump a vector will be to write it in one operation with ostream::write:
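A minimal sketch of that single write, assuming int32_t elements (the helper names `dump_raw` and `example_dump` are hypothetical, and this is not necessarily the code the answer originally showed):

```cpp
#include <cassert>
#include <cstdint>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: one ostream::write() call over the vector's storage.
void dump_raw(std::ostream& os, const std::vector<std::int32_t>& v)
{
    if (v.empty()) return;  // &v[0] must not be evaluated for an empty vector
    os.write(reinterpret_cast<const char*>(&v[0]),
             static_cast<std::streamsize>(v.size() * sizeof(std::int32_t)));
}

// Example use: dump three values into an in-memory stream and return the bytes.
std::string example_dump()
{
    std::vector<std::int32_t> v;
    v.push_back(1); v.push_back(2); v.push_back(3);
    std::ostringstream os;
    dump_raw(os, v);
    assert(os.str().size() == v.size() * sizeof(std::int32_t));
    return os.str();
}
```

For a real file, the stream would be a std::ofstream opened with std::ios::binary.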
You can make this a bit nicer with a template function:
which allows you to say things like:
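One way such a template and its call sites might look. The name `write_vector` and the static_assert guard are assumptions added for this sketch, not taken from the answer:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <ostream>
#include <sstream>
#include <type_traits>
#include <vector>

// Hypothetical template: dumps the elements of a vector of any trivially
// copyable type T as raw bytes with a single write() call.
template <typename T>
std::ostream& write_vector(std::ostream& os, const std::vector<T>& v)
{
    static_assert(std::is_trivially_copyable<T>::value,
                  "raw byte dumps only make sense for trivially copyable types");
    if (!v.empty()) {
        os.write(reinterpret_cast<const char*>(v.data()),
                 static_cast<std::streamsize>(v.size() * sizeof(T)));
    }
    return os;
}

// Example use with two element types; returns the total bytes written.
std::size_t example_sizes()
{
    std::vector<std::int16_t> a = {1, 2, 3};
    std::vector<std::int32_t> b = {4, 5};
    std::ostringstream os;
    write_vector(os, a);
    write_vector(os, b);
    assert(os.good());
    return os.str().size();  // 3 * sizeof(int16_t) + 2 * sizeof(int32_t)
}
```

The static_assert turns an accidental dump of, say, a vector of std::string into a compile error instead of a corrupt file.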
Reading it back in will be a bit problematic unless you prefix the write with the size of the vector somehow (or the vectors are fixed-size), and even then you will have problems on architectures with a different endianness and/or a 32- vs 64-bit word size, as several people have pointed out.
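A sketch of the size-prefix idea, assuming a made-up layout of a 64-bit element count followed by the raw int32_t elements. The names `write_sized` and `read_sized` are hypothetical. A fixed-width count sidesteps the 32- vs 64-bit issue, but the file is still tied to the writer's byte order:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <vector>

// Hypothetical layout: a 64-bit element count, then the raw elements, both
// in the writing machine's byte order.
void write_sized(std::ostream& os, const std::vector<std::int32_t>& v)
{
    const std::uint64_t count = v.size();
    os.write(reinterpret_cast<const char*>(&count), sizeof count);
    if (count != 0) {
        os.write(reinterpret_cast<const char*>(v.data()),
                 static_cast<std::streamsize>(count * sizeof(std::int32_t)));
    }
}

// Reads back what write_sized produced on a machine with the same byte
// order. Returns false if the stream ends early. The count is trusted as-is,
// so this is only suitable for files the program wrote itself.
bool read_sized(std::istream& is, std::vector<std::int32_t>& v)
{
    std::uint64_t count = 0;
    if (!is.read(reinterpret_cast<char*>(&count), sizeof count)) return false;
    v.resize(static_cast<std::size_t>(count));
    if (count == 0) return true;
    is.read(reinterpret_cast<char*>(v.data()),
            static_cast<std::streamsize>(count * sizeof(std::int32_t)));
    return static_cast<bool>(is);
}

// Example use: round-trip through an in-memory stream.
bool example_round_trip()
{
    const std::vector<std::int32_t> original = {10, -20, 30};
    std::stringstream ss(std::ios::in | std::ios::out | std::ios::binary);
    write_sized(ss, original);
    std::vector<std::int32_t> restored;
    const bool ok = read_sized(ss, restored);
    assert(!ok || restored.size() == original.size());
    return ok && restored == original;
}
```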
I guess that's implementation-dependent. If you don't get the performance you want, you can always memory-map the result file (e.g., with mmap) and memcpy the std::vector data into the mapped file.
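A rough POSIX-only sketch of the memory-mapping route, assuming int32_t elements. The helper name `write_via_mmap` is hypothetical, and other platforms need their own mapping API:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical POSIX-only helper: sizes the file, maps it into memory, and
// copies the vector's bytes into the mapping.
bool write_via_mmap(const char* path, const std::vector<std::int32_t>& v)
{
    assert(path != nullptr);
    const std::size_t bytes = v.size() * sizeof(std::int32_t);

    const int fd = ::open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd == -1) return false;
    if (bytes == 0) {  // zero-length mappings are rejected, nothing to copy
        ::close(fd);
        return true;
    }

    bool ok = false;
    // The file must already have its final size before it is mapped.
    if (::ftruncate(fd, static_cast<off_t>(bytes)) == 0) {
        void* p = ::mmap(nullptr, bytes, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (p != MAP_FAILED) {
            std::memcpy(p, v.data(), bytes);
            ok = (::munmap(p, bytes) == 0);
        }
    }
    ::close(fd);
    return ok;
}
```

Whether this beats a plain buffered write depends on the system, so it is worth measuring before adopting it.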
If you construct the ostream_iterator with an ofstream, that will make sure the output is buffered:
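A minimal sketch of that setup, assuming int16_t elements separated by spaces (the helper name `copy_to_file` is hypothetical):

```cpp
#include <algorithm>
#include <cstdint>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper: std::copy through an ostream_iterator bound to an ofstream.
bool copy_to_file(const std::string& path, const std::vector<std::int16_t>& values)
{
    std::ofstream out(path.c_str());
    if (!out) return false;
    // Each assignment through the iterator performs `out << value << " "`,
    // which goes through the stream's buffer instead of straight to the OS.
    std::copy(values.begin(), values.end(),
              std::ostream_iterator<std::int16_t>(out, " "));
    out.close();
    return !out.fail();
}
```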
The ofstream object is buffered, so anything written to the stream will get buffered before being written to disk.
You haven't written how you want to use the iterators (I'll presume std::copy) and whether you want to write the data as binary or as strings.
I would expect a decent implementation of std::copy to dispatch to std::memcpy for PODs when the iterators are plain pointers (Dinkumware, for example, does so). However, with ostream iterators, I don't think any implementation of std::copy will do this, as it doesn't have direct access to the ostream's buffer to write into.
The streams themselves, though, buffer, too.
In the end, I would write the simplest possible code first, and measure this. If it's fast enough, move on to the next problem. If this is code of the sort that cannot be fast enough, you'll have to resort to OS-specific tricks anyway.
It will iterate over the elements. Iterators don't let you mess with more than one item at a time. Also, IIRC, it will convert your integers to their ASCII representations.
If you want to write everything in the vector, in binary, to the file in one step via an ostream, you want something like:
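A sketch that puts the two behaviours side by side, assuming int16_t elements. The helper names `as_text` and `as_binary` are hypothetical, and an in-memory stream stands in for a std::ofstream opened with std::ios::binary:

```cpp
#include <algorithm>
#include <cstdint>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Formatted path: one operator<< per element, producing decimal text.
std::string as_text(const std::vector<std::int16_t>& v)
{
    std::ostringstream os;
    std::copy(v.begin(), v.end(), std::ostream_iterator<std::int16_t>(os, "\n"));
    return os.str();
}

// Binary path: one write() covering the vector's whole storage.
std::string as_binary(const std::vector<std::int16_t>& v)
{
    std::ostringstream os(std::ios::out | std::ios::binary);
    if (!v.empty()) {
        os.write(reinterpret_cast<const char*>(&v[0]),
                 static_cast<std::streamsize>(v.size() * sizeof(std::int16_t)));
    }
    return os.str();
}
```

For the values 1, 256 and -1, `as_text` yields the nine characters "1\n256\n-1\n", while `as_binary` yields six bytes whose order depends on the machine's endianness.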