Why is pushing a struct containing char[256] onto a std::queue slower than pushing a struct containing std::string?

Asked 2024-09-29 01:03:49

I came across a behavior in STL queue push that I don't quite understand.

Basically, I have two structs:

#include <queue>
#include <string>
using namespace std;

struct A {
  string a;
};

struct B {
  char b[256];
};

int main() {
  struct A st1;
  struct B st2;

  /* ...assign a 256-character string to both st1 and st2... */

  queue<struct A> q1;
  queue<struct B> q2;

  for (int i = 0; i < 10000; i++) {
    q1.push(st1);
  }

  for (int i = 0; i < 10000; i++) {
    q2.push(st2);
  }
}

What I found is that the queue holding the char struct takes much longer (about 5x) to push the struct than the queue holding the string struct. Examining the individual pushes, I noticed that the char struct's push times show quite a number of spikes (ranging from 2x to 10x) here and there. What is the reason for that?
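
For reference, a minimal sketch of the kind of per-push timing described above could look like the following; it is illustrative rather than the exact harness used, and the steady_clock timer, the 'x' fill and the max-only summary are placeholders:

#include <algorithm>
#include <chrono>
#include <cstring>
#include <iostream>
#include <queue>
#include <string>
#include <vector>
using namespace std;

struct A { string a; };
struct B { char b[256]; };

// Record the duration of every single push so that spikes become visible.
template <typename T>
vector<long long> time_each_push(const T& value, int n) {
  queue<T> q;
  vector<long long> ns(n);
  for (int i = 0; i < n; i++) {
    auto t0 = chrono::steady_clock::now();
    q.push(value);
    auto t1 = chrono::steady_clock::now();
    ns[i] = chrono::duration_cast<chrono::nanoseconds>(t1 - t0).count();
  }
  return ns;
}

int main() {
  A st1; st1.a.assign(256, 'x');            // 256-character string
  B st2; memset(st2.b, 'x', sizeof st2.b);  // 256-byte char array
  auto ta = time_each_push(st1, 10000);
  auto tb = time_each_push(st2, 10000);
  cout << "worst single push, struct A: " << *max_element(ta.begin(), ta.end()) << " ns\n";
  cout << "worst single push, struct B: " << *max_element(tb.begin(), tb.end()) << " ns\n";
}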

Comments (5)

风筝有风,海豚有海 2024-10-06 01:03:49

Every time you push st1 or st2 onto a queue, you're actually pushing a copy of it (not a reference or pointer). The cost difference lies in copying the data. With struct B you have to copy the full 256 bytes every time. With struct A you only copy the string object, which most likely has copy-on-write semantics, so until one of the copies is modified they all share the same reference to the underlying string data.
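
To see the copy cost in isolation from the queue's own growth, a sketch along these lines (not from the original answer) times nothing but the element copies, by pushing into a pre-reserved vector:

#include <chrono>
#include <cstring>
#include <iostream>
#include <string>
#include <vector>
using namespace std;

struct A { string a; };
struct B { char b[256]; };

// Time n copy-constructions of `value`; reserve() up front so no reallocation
// of the vector itself is included in the measurement.
template <typename T>
long long time_copies(const T& value, int n) {
  vector<T> v;
  v.reserve(n);
  auto t0 = chrono::steady_clock::now();
  for (int i = 0; i < n; i++)
    v.push_back(value);                 // copy-constructs `value` each time
  auto t1 = chrono::steady_clock::now();
  return chrono::duration_cast<chrono::microseconds>(t1 - t0).count();
}

int main() {
  A st1; st1.a.assign(256, 'x');
  B st2; memset(st2.b, 'x', sizeof st2.b);
  // A: copies a string object (pointer/refcount under COW, a 256-byte buffer otherwise).
  // B: always copies the full 256-byte array.
  cout << "10000 copies of A: " << time_copies(st1, 10000) << " us\n";
  cout << "10000 copies of B: " << time_copies(st2, 10000) << " us\n";
}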

潦草背影 2024-10-06 01:03:49

Your C++ library probably uses a copy-on-write string implementation, meaning that copying a string doesn't really copy its contents (the copies just share the original's data), and the string is only copied "for real" when you write to it.

To test whether this is the case, put this inside the loop, after the q1.push(st1) line:

++st1.a[0];

then time again.

Character arrays, of course, have no copy-on-write behaviour and are copied "for real" every time you ask for a copy.
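
Dropped into the question's first loop, the test looks like this (q1 and st1 as declared in the question):

for (int i = 0; i < 10000; i++) {
  q1.push(st1);
  ++st1.a[0];   // writing to the string forces a COW implementation to copy the data for real
}

If the q1 timings now move toward the q2 timings, a copy-on-write string is the likely explanation; if they barely change, the library's string probably isn't copy-on-write.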

再可℃爱ぅ一点好了 2024-10-06 01:03:49

The character array is larger than an empty string would be - the spikes may relate to the allocations needed as the queue's underlying storage grows to accommodate the larger amount of memory it uses.

If the strings aren't empty, then copy-on-write kicks in anyway, so you're trading some locking time / reference-counter increments etc. against the memory use: which is faster is system-dependent.
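
As a rough illustration of how element size affects how often the underlying deque has to grab new memory, the sketch below computes how many elements fit in one deque buffer; the 512-byte buffer size is a libstdc++ implementation detail used here only as an example, and sizeof(A) depends on whether the string is copy-on-write or small-string-optimized:

#include <cstddef>
#include <iostream>
#include <string>
using namespace std;

struct A { string a; };
struct B { char b[256]; };

int main() {
  const size_t buf = 512;  // libstdc++'s deque buffer size (implementation detail, assumed here)
  cout << "sizeof(A) = " << sizeof(A)
       << ", elements per buffer: " << (sizeof(A) < buf ? buf / sizeof(A) : 1) << "\n";
  cout << "sizeof(B) = " << sizeof(B)
       << ", elements per buffer: " << (sizeof(B) < buf ? buf / sizeof(B) : 1) << "\n";
}

With only two 256-byte elements per buffer, the char-array queue would need a fresh buffer every other push, which would fit the pattern of occasional spikes.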

夜空下最亮的亮点 2024-10-06 01:03:49

The most likely reasons are (one way to check both is sketched after this list):

1) The dynamic allocation of memory to hold the character data inside each string
2) Possibly, but far less likely, the resizing of the deque page buffer that backs the queue.
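
A diagnostic sketch, not part of the original answer: either hypothesis can be checked by counting dynamic allocations through a replacement of the global operator new; the counts then show whether the string copies or the deque's buffers dominate.

#include <cstdio>
#include <cstdlib>
#include <new>
#include <queue>
#include <string>
using namespace std;

static size_t alloc_count = 0;

// Count every dynamic allocation made by the program.
void* operator new(size_t n) {
  ++alloc_count;
  if (void* p = malloc(n)) return p;
  throw bad_alloc();
}
void operator delete(void* p) noexcept { free(p); }

struct A { string a; };
struct B { char b[256]; };

int main() {
  A st1; st1.a.assign(256, 'x');
  B st2 = {};   // contents don't matter for counting allocations

  alloc_count = 0;
  {
    queue<A> q1;
    for (int i = 0; i < 10000; i++) q1.push(st1);
    printf("allocations while filling queue<A>: %zu\n", alloc_count);
  }

  alloc_count = 0;
  {
    queue<B> q2;
    for (int i = 0; i < 10000; i++) q2.push(st2);
    printf("allocations while filling queue<B>: %zu\n", alloc_count);
  }
}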

陌伤浅笑 2024-10-06 01:03:49

std::queue is an adapter over another container (one that implements front, back, push_back, and pop_front); unless you specify which container to adapt, it will use std::deque. Deque does some block-allocation magic in the background that provides resizing similar to vector but performs better, since it manages multiple non-contiguous blocks and doesn't have to copy everything each time it grows. Anyway, it's a guess, but I'd say that's the cause.

The byte-array struct sees the hits more frequently because room has to be made for all those arrays. I'd bet that over a longer run the string struct would also generate spikes; they're just smaller now, since string likely keeps references to the underlying character storage until something changes it.

Now's your chance to get familiar with your profiler of choice and find out for sure! Fire up valgrind (--tool=callgrind) or whatever profiler your platform supports and see exactly which calls are using the time and where.
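
Besides profiling, the adapter's second template parameter can itself be used as an experiment: std::queue<T> is really std::queue<T, std::deque<T>>, so backing it with std::list instead removes the deque's buffer growth and leaves exactly one node allocation per push. A sketch of the idea, not a production recommendation:

#include <list>
#include <queue>
using namespace std;

struct B { char b[256]; };

int main() {
  B st2 = {};
  queue<B, list<B>> q2;             // same queue interface, std::list instead of the default std::deque
  for (int i = 0; i < 10000; i++)
    q2.push(st2);                   // each push is now a single list-node allocation, no buffer growth
}

If the spikes disappear with the list-backed queue, the deque's buffer management was responsible; if a uniform slowdown remains, the 256-byte copy itself dominates.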
