Performance of ostream_iterator for writing numeric data to a file?

Posted 2024-08-16 10:03:58

I've got various std::vector instances with numeric data in them, primarily int16_t, int32_t, etc. I'd like to dump this data to a file in as fast a manner as possible. If I use an ostream_iterator, will it write the entire block of memory in a single operation, or will it iterate over the elements of the vector, issuing a write operation for each one?

Comments (7)

一紙繁鸢 2024-08-23 10:03:58

A stream iterator and a vector will definitely not use a block copy in any implementation I'm familiar with. If the vector's item type were a class rather than a POD, for example, a direct copy would be a bad thing. I suspect the ostream will format the output as well, rather than writing the values directly (i.e., ASCII rather than binary output).

You might have better luck with boost::copy, as it's specifically optimized to do block writes when possible, but the most practical solution is to operate on the vector memory directly using &v[0].
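To illustrate the formatting point, here's a minimal sketch (the file name and data are made up): each element goes through the stream's formatted operator<<, so the file ends up containing ASCII text such as "1 2 3 ", not six raw bytes.

#include <algorithm>
#include <cstdint>
#include <fstream>
#include <iterator>
#include <vector>

int main() {
    std::vector<int16_t> v = { 1, 2, 3 };   // sample data (hypothetical)
    std::ofstream os("out.txt");
    // One formatted write per element, each converted to its ASCII form.
    std::copy(v.begin(), v.end(), std::ostream_iterator<int16_t>(os, " "));
}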

很糊涂小朋友 2024-08-23 10:03:58

Most ofstream implementations I know of do buffer data, so you probably will not end up doing an excessive number of writes. The buffer in the ofstream has to fill up before an actual write is done, and most OSes buffer file data underneath this, too. The interplay of these is not at all transparent at the C++ application level; the selection of buffer sizes, etc. is left up to the implementation.

C++ does provide a way to supply your own buffer to an ostream's streambuf. You can try calling pubsetbuf like this:

char *mybuffer = new char[bufsize];
os.rdbuf()->pubsetbuf(mybuffer, bufsize);

The downside is that this doesn't necessarily do anything. Some implementations just ignore it.

The other option you have if you want to buffer things and still use ostream_iterator is to use an ostringstream, e.g.:

ostringstream buffered_chars;
copy(data.begin(), data.end(), ostream_iterator<char>(buffered_chars, " "));
string buffer(buffered_chars.str());

Then once all your data is buffered, you can write the entire buffer using one big ostream::write(), POSIX I/O, etc.
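As a sketch of that last step (assuming the buffered_chars approach above; the file name is made up), the whole formatted buffer can go out in a single call:

#include <algorithm>
#include <fstream>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

int main() {
    std::vector<char> data = { 'a', 'b', 'c' };   // sample data (hypothetical)
    std::ostringstream buffered_chars;
    std::copy(data.begin(), data.end(),
              std::ostream_iterator<char>(buffered_chars, " "));
    std::string buffer(buffered_chars.str());
    std::ofstream file("out.txt", std::ios::binary);
    file.write(buffer.data(), buffer.size());     // one big write of the whole buffer
}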

This can still be slow, though, since you're doing formatted output, and you have to have two copies of your data in memory at once: the raw data and the formatted, buffered data. If your application pushes the limits of memory already, this isn't the greatest way to go, and you're probably better off using the built-in buffering that ofstream gives you.

Finally, if you really want performance, the fastest way to do this is to dump the raw memory to disk using ostream::write() as Neil suggests, or to use your OS's I/O functions. The disadvantage here is that your data isn't formatted, your file probably isn't human-readable, and it isn't easily readable on architectures with a different endianness than the one you wrote from. But it will get your data to disk fast and without adding memory requirements to your application.
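For the OS-level route, a POSIX-specific sketch (error handling omitted; the file name is made up):

#include <cstdint>
#include <fcntl.h>     // open
#include <unistd.h>    // write, close
#include <vector>

int main() {
    std::vector<int32_t> v(1024, 42);   // sample data (hypothetical)
    int fd = open("data.bin", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    // A single unformatted write of the vector's contiguous storage.
    write(fd, v.data(), v.size() * sizeof(int32_t));
    close(fd);
}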

野却迷人 2024-08-23 10:03:58

The quickest (but most horrible) way to dump a vector will be to write it in one operation with ostream::write:

   os.write( (char *) &v[0], v.size() * sizeof( v[0] ) );

You can make this a bit nicer with a template function:

template <typename T> 
std::ostream & DumpVec( std::ostream & os, const std::vector <T> & v ) {
    // cast needed: write() takes a const char*
    return os.write( reinterpret_cast<const char *>( &v[0] ), v.size() * sizeof( T ) );
}

which allows you to say things like:

vector <unsigned int> v;
ofstream f( "file.dat" );
...
DumpVec( f, v );

Reading it back in will be a bit problematic, unless you prefix the write with the size of the vector somehow (or the vectors are fixed-size), and even then you will have problems across different endianness and/or 32- vs. 64-bit architectures, as several people have pointed out.
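A sketch of one way to handle the size-prefix idea (the function names are made up, and this still isn't portable across endianness):

#include <cstdint>
#include <istream>
#include <ostream>
#include <vector>

template <typename T>
std::ostream & DumpVecWithSize( std::ostream & os, const std::vector<T> & v ) {
    std::uint64_t n = v.size();   // fixed-width size prefix
    os.write( reinterpret_cast<const char *>( &n ), sizeof( n ) );
    // v.data() is valid even for an empty vector, unlike &v[0]
    return os.write( reinterpret_cast<const char *>( v.data() ), n * sizeof( T ) );
}

template <typename T>
std::istream & LoadVec( std::istream & is, std::vector<T> & v ) {
    std::uint64_t n = 0;
    is.read( reinterpret_cast<char *>( &n ), sizeof( n ) );
    v.resize( n );
    return is.read( reinterpret_cast<char *>( v.data() ), n * sizeof( T ) );
}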

那一片橙海, 2024-08-23 10:03:58

I guess that's implementation-dependent. If you don't get the performance you want, you can always memory-map the result file and memcpy the std::vector data into the mapped file.
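A minimal POSIX sketch of that idea (mmap and friends; error handling omitted, file name and sizes made up):

#include <cstdint>
#include <cstring>      // memcpy
#include <fcntl.h>      // open
#include <sys/mman.h>   // mmap, munmap
#include <unistd.h>     // ftruncate, close
#include <vector>

int main() {
    std::vector<int32_t> v(1024, 7);   // sample data (hypothetical)
    std::size_t bytes = v.size() * sizeof(int32_t);
    int fd = open("data.bin", O_RDWR | O_CREAT | O_TRUNC, 0644);
    ftruncate(fd, bytes);              // size the file before mapping it
    void *p = mmap(nullptr, bytes, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    std::memcpy(p, v.data(), bytes);   // copy the vector into the mapping
    munmap(p, bytes);
    close(fd);
}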

澉约 2024-08-23 10:03:58

If you construct the ostream_iterator with an ofstream, that will make sure the output is buffered:

ofstream ofs("file.txt");
ostream_iterator<int> osi(ofs, ", ");
copy(v.begin(), v.end(), osi);

The ofstream object is buffered, so anything written to the stream will get buffered before being written to disk.

尘世孤行 2024-08-23 10:03:58

You haven't written how you want to use the iterators (I'll presume std::copy) and whether you want to write the data in binary or as strings.

I would expect a decent implementation of std::copy to dispatch to std::memcpy for PODs when plain pointers are used as iterators (Dinkumware, for example, does so). With ostream iterators, however, I don't think any implementation of std::copy will do this, as std::copy has no direct access to the ostream's buffer to write into.

The streams themselves do buffer, though.

In the end, I would write the simplest possible code first and measure it. If it's fast enough, move on to the next problem. If it's the kind of code that cannot be fast enough that way, you'll have to resort to OS-specific tricks anyway.
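A sketch of what that measurement might look like (std::chrono timing around whichever variant you're testing; the file name and sizes are made up):

#include <chrono>
#include <cstdint>
#include <fstream>
#include <iostream>
#include <vector>

int main() {
    std::vector<int32_t> v(1 << 20, 1);   // sample data (hypothetical)
    auto t0 = std::chrono::steady_clock::now();
    {
        std::ofstream os("out.bin", std::ios::binary);
        os.write(reinterpret_cast<const char *>(v.data()),
                 v.size() * sizeof(int32_t));
    }   // stream destroyed here, so the flush is inside the timed region
    auto t1 = std::chrono::steady_clock::now();
    std::cout << std::chrono::duration_cast<std::chrono::milliseconds>(t1 - t0).count()
              << " ms\n";
}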

灰色世界里的红玫瑰 2024-08-23 10:03:58

It will iterate over the elements. Iterators don't let you mess with more than one item at a time. Also, IIRC, it will convert your integers to their ASCII representations.

If you want to write everything in the vector, in binary, to the file in one step via an ostream, you want something like:

template<class T>
void WriteArray(std::ostream& os, const std::vector<T>& v)
{
    // reinterpret_cast, not static_cast: there is no implicit path from const T* to const char*
    os.write(reinterpret_cast<const char*>(&v[0]), v.size() * sizeof(T));
}