A question about STL internals
I am currently writing some IO abstractions for binary data, and I am not sure how well the STL performs on some of these tasks. For example, I have a lot of things that I can encode as binary into either a char* or a std::vector. For now, whenever I have an object of this byte-like type, I either write it with ostream::write() or do a std::copy from the array to an ostream_iterator on the stream. Now I am wondering what the copy does internally.
From what I have heard, the STL is allowed to optimize anything. For example, in theory, copying between two vectors of chars with std::copy should not copy the chars slowly, byte by byte, but should use system primitives for copying chunks of data where available. How is this done internally?
The reason I am asking is that I am now trying to switch the file over to mmapped memory instead of std::ostreams. This means that writing the char* data will be really simple, but writing vectors will be byte by byte. What would I have to provide in my class for the STL to optimize the copying (probably using memcpy)? I am guessing I need the right kind of iterators, but what do they need so that the STL knows it can just memcpy instead of walking them?
I know this is asking about a lot of things I should not normally care about (the principle of encapsulation is usually a great thing). And of course I know Knuth's rule about optimization, which is why I care about the automatic optimization facilities of the STL.
Comments (3)
iostream is for formatted (i.e. text) IO only. If you want binary IO, you have to use the streambuf classes. Also, iostreams have a reputation for being slow (for various reasons, and your mileage will vary).
Iostreams use a streambuf internally, which adds a layer of indirection and provides automatic buffering. If you need reasonable binary IO throughput, you may want to use a streambuf-derived class directly (such as std::filebuf), disable synchronization with stdio, and benchmark it.
Or you can use mmap or write directly. Those functions are quite simple to use, and it should be easy to write your own classes around them.
Oh, and don't assume anything about what the standard library does. If you want to know more about how it does things internally, check the sources of, e.g., the GNU implementation.
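To make the streambuf suggestion concrete, here is a minimal sketch of writing a byte vector through std::filebuf without going through the formatted ostream layer. The helper names write_bytes and read_bytes are made up for illustration, and error handling is kept to the bare minimum; treat it as a starting point for your own benchmark rather than a recommended design.

```cpp
#include <cassert>
#include <fstream>
#include <ios>
#include <iterator>
#include <vector>

// Hypothetical helper: writes raw bytes to `path` through std::filebuf,
// skipping the formatted layer of std::ostream.
// Returns the number of bytes written, or -1 if the file could not be opened.
std::streamsize write_bytes(const char* path, const std::vector<char>& bytes) {
    assert(path != nullptr);
    std::filebuf buf;
    if (!buf.open(path, std::ios::out | std::ios::binary | std::ios::trunc)) {
        return -1;
    }
    // sputn hands the whole block to the buffer in one call.
    const std::streamsize written =
        buf.sputn(bytes.data(), static_cast<std::streamsize>(bytes.size()));
    buf.close();  // flushes any buffered data to the file
    return written;
}

// Hypothetical helper: reads the whole file back, used here only to check the result.
std::vector<char> read_bytes(const char* path) {
    assert(path != nullptr);
    std::ifstream in(path, std::ios::binary);
    return std::vector<char>(std::istreambuf_iterator<char>(in),
                             std::istreambuf_iterator<char>());
}
```

Calling write_bytes("out.bin", data) and then read_bytes("out.bin") should give back the same bytes, including embedded zero and newline characters, since the file is opened in binary mode.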
If you aren't sure how well the STL performs, there is no substitute for testing. Time how long it takes to std::copy a chunk of data lots of times, and how long it takes to copy the same amount of data using memcpy, and compare.
Doing these tests yourself will be far more instructive than worrying about STL optimisation.
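A rough sketch of such a timing test might look like the following. The names time_seconds, CopyTimings and compare_copies are invented for this example, and the buffer size and repetition count are whatever you pass in. It is a crude micro-benchmark: results depend heavily on compiler flags, and an optimizer may still simplify parts of the loop, so take the numbers as a hint only.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper: runs `fn` `reps` times and returns the total elapsed seconds.
template <typename Fn>
double time_seconds(Fn fn, int reps) {
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < reps; ++i) {
        fn();
    }
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}

// Hypothetical result type holding both measurements.
struct CopyTimings {
    double std_copy_s;
    double memcpy_s;
};

// Copies `bytes` chars `reps` times with std::copy and with memcpy, timing each.
CopyTimings compare_copies(std::size_t bytes, int reps) {
    assert(bytes > 0 && reps > 0);
    std::vector<char> src(bytes, 'x');
    std::vector<char> dst(bytes);
    // Reading the destination through a volatile discourages the
    // optimizer from discarding the copies entirely.
    volatile char sink = 0;
    CopyTimings t{};
    t.std_copy_s = time_seconds([&] {
        std::copy(src.begin(), src.end(), dst.begin());
        sink = dst[bytes / 2];
    }, reps);
    t.memcpy_s = time_seconds([&] {
        std::memcpy(dst.data(), src.data(), bytes);
        sink = dst[bytes / 2];
    }, reps);
    (void)sink;
    return t;
}
```

Run it with optimizations enabled and with buffer sizes close to what your real code handles, for example compare_copies(1 << 20, 1000), and print the two fields to compare them.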
It's not really clear what you're asking. You mention vectors, std::copy, char* and memory-mapped files, but there's no obvious connection between them. Show us some code, or describe what you're trying to do and with what kind of data types.
But a common optimization in STL implementations is to use memcpy or a similar raw memory copying mechanism as long as the object type you're copying is POD. So assuming this optimization exists in your STL implementation, all you have to do is make sure the objects you're copying are POD types.
But as previously mentioned, the only way to get reliable information about performance is to profile/measure/benchmark it yourself.
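As a sketch of what that could look like when the destination is a raw char* (for instance a pointer into a memory-mapped region), the helpers below copy a vector's contents into such a buffer. Record, write_pod_vector and write_chars are hypothetical names for this example. The static_assert uses std::is_trivially_copyable, which is the property that actually makes a raw byte copy valid. Whether std::copy itself gets lowered to memmove or memcpy is up to your implementation, so measure rather than assume.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical POD record used only for illustration.
struct Record {
    int id;
    float value;
};

// Copies the raw bytes of `items` to `dest` and returns the position just past them.
// The caller must make sure `dest` has room for items.size() * sizeof(T) bytes.
template <typename T>
char* write_pod_vector(char* dest, const std::vector<T>& items) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "raw byte copies are only valid for trivially copyable types");
    assert(dest != nullptr);
    const std::size_t n = items.size() * sizeof(T);
    if (n != 0) {
        std::memcpy(dest, items.data(), n);
    }
    return dest + n;
}

// For plain chars, std::copy with vector iterators and a pointer destination
// is the case that implementations commonly turn into a block copy.
inline char* write_chars(char* dest, const std::vector<char>& bytes) {
    assert(dest != nullptr);
    return std::copy(bytes.begin(), bytes.end(), dest);
}
```

Both helpers return the advanced write position, so calls can be chained when laying several objects out one after another in the mapped region. Note that the bytes written by write_pod_vector include any padding and use the host's byte order, which matters if the file is read on a different platform.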