Why does one loop take longer than the other to detect a shared memory update?

Posted on 2024-08-26 21:22:03

I've written a 'server' program that writes to shared memory, and a client program that reads from that memory. The server has different 'channels' it can write to, which are just different linked lists that it appends items to. The client is interested in some of the linked lists, and wants to read every node that's added to those lists as it comes in, with the minimum latency possible.

I have 2 approaches for the client:

  1. For each linked list, the client keeps a 'bookmark' pointer to keep its place within the linked list. It round-robins through the linked lists, iterating over all of them again and again (it loops forever), moving each bookmark one node forward each time if it can. Whether it can is determined by the 'next' member of the node: if it's non-null, then jumping to the next node is safe (the server switches it from null to non-null atomically). This approach works OK, but if there are a lot of lists to iterate over, and only a few of them are receiving updates, the latency gets bad.

  2. The server gives each list a unique ID. Each time the server appends an item to a list, it also appends the list's ID number to a master 'update list'. The client keeps only one bookmark, into the update list. It endlessly checks whether the bookmark's next pointer is non-null ( while(node->next_ == NULL) {} ); if it is, it moves ahead, reads the ID, and then processes the new node on the linked list with that ID (see the sketch below). This, in theory, should handle large numbers of lists much better, because the client doesn't have to iterate over all of them each time.
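A minimal sketch of what that approach #2 client loop might look like, assuming a node layout roughly like the one below (the struct and field names are illustrative, not the poster's actual code; next_ is declared volatile, as mentioned in the second update):

#include <cstddef>

// Illustrative layout only; the real nodes live in the shared-memory segment.
struct UpdateNode {
    UpdateNode* volatile next_;    // the server flips this from NULL to non-NULL
    int                  list_id_; // which channel/list just received a node
};

// Hypothetical stand-in for "advance that list's bookmark and read the new node".
void process_list(int /*list_id*/) {}

void client_loop(UpdateNode* bookmark) {
    for (;;) {
        while (bookmark->next_ == NULL) {
            // busy-wait on the single update-list bookmark
        }
        bookmark = bookmark->next_;
        process_list(bookmark->list_id_);
    }
}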

When I benchmarked the latency of both approaches (using gettimeofday), to my surprise #2 was terrible. The first approach, for a small number of linked lists, would often be under 20us of latency. The second approach would have brief stretches of low latency but was often between 4,000-7,000us!

By inserting gettimeofday calls here and there, I've determined that all of the added latency in approach #2 is spent in the loop repeatedly checking whether the next pointer is non-null. This is puzzling to me; it's as if the change made in one process takes longer to 'publish' to the second process with the second approach. I assume there's some sort of cache interaction going on that I don't understand. What's going on?
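The post doesn't show the measurement itself; one plausible way to get numbers like these (an assumption, not the poster's code) is to have the server stamp each node just before publishing it and have the client take a second stamp once it sees the node:

#include <sys/time.h>
#include <cstdio>

// Hypothetical timed node; sent_ is filled in by the server immediately
// before it links the node into the list.
struct TimedNode {
    TimedNode* volatile next_;
    timeval             sent_;
};

// Client side, called once next_ has been observed as non-NULL.
void report_latency(const TimedNode* node) {
    timeval now;
    gettimeofday(&now, NULL);
    long us = (now.tv_sec - node->sent_.tv_sec) * 1000000L
            + (now.tv_usec - node->sent_.tv_usec);
    std::printf("latency: %ld us\n", us);
}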

Update: Originally, approach #2 used a condition variable, so that if node->next_ == NULL it would wait on the condition, and the server would notify the condition every time it issued an update. The latency was the same, and in trying to figure out why, I reduced the code to the approach above. I'm running on a multicore machine, so one process spinlocking shouldn't affect the other.
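For reference, a condition variable only works between two processes if it lives in the shared segment and is created process-shared. A minimal sketch of that wait/notify pattern using pthreads, with purely illustrative names (not the poster's original code):

#include <pthread.h>
#include <cstddef>

// The mutex and condvar must live in the shared-memory segment and be
// initialised with PTHREAD_PROCESS_SHARED to work across processes.
struct SharedSync {
    pthread_mutex_t mutex;
    pthread_cond_t  cond;
};

void init_shared_sync(SharedSync* s) {
    pthread_mutexattr_t ma;
    pthread_mutexattr_init(&ma);
    pthread_mutexattr_setpshared(&ma, PTHREAD_PROCESS_SHARED);
    pthread_mutex_init(&s->mutex, &ma);
    pthread_mutexattr_destroy(&ma);

    pthread_condattr_t ca;
    pthread_condattr_init(&ca);
    pthread_condattr_setpshared(&ca, PTHREAD_PROCESS_SHARED);
    pthread_cond_init(&s->cond, &ca);
    pthread_condattr_destroy(&ca);
}

// Client: block until the server has linked a node after the bookmark.
template <class Node>
void wait_for_next(SharedSync* s, Node* bookmark) {
    pthread_mutex_lock(&s->mutex);
    while (bookmark->next_ == NULL)
        pthread_cond_wait(&s->cond, &s->mutex);
    pthread_mutex_unlock(&s->mutex);
}

// Server: call after setting the previous node's next_ pointer.
void signal_update(SharedSync* s) {
    pthread_mutex_lock(&s->mutex);
    pthread_cond_broadcast(&s->cond);
    pthread_mutex_unlock(&s->mutex);
}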

Update 2: node->next_ is volatile.

Comments (3)

风和你 2024-09-02 21:22:03

Since it sounds like reads and writes are occurring on separate CPUs, perhaps a memory barrier would help? Your writes may not be occurring when you expect them to be.
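One concrete way to get that ordering, as a sketch only (it assumes C++11 atomics are available and that an atomic pointer is lock-free on the platform, so it remains valid inside shared memory; the poster's code uses a plain volatile pointer instead):

#include <atomic>
#include <cstddef>

struct Node {
    std::atomic<Node*> next_;     // replaces the volatile pointer
    int                list_id_;
};

// Server: fill in the new node first, then publish it with a release
// store so its contents are visible before the pointer is.
void publish(Node* tail, Node* fresh) {
    fresh->next_.store(NULL, std::memory_order_relaxed);
    tail->next_.store(fresh, std::memory_order_release);
}

// Client: an acquire load pairs with the release store above.
Node* poll(Node* bookmark) {
    return bookmark->next_.load(std::memory_order_acquire);   // NULL until published
}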

九歌凝 2024-09-02 21:22:03

You are doing a Spin Lock in #2, which is generally not such a great idea, and is chewing up cycles.
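The next reply suggests yielding; another common way to soften the spin (a sketch only, not something from this answer) is to bound the busy loop with a CPU pause hint and fall back to yielding the processor:

#include <emmintrin.h>   // _mm_pause (x86)
#include <sched.h>       // sched_yield
#include <cstddef>

// Hypothetical helper: spin briefly on next_, then yield if nothing arrives.
template <class Node>
Node* spin_for_next(Node* bookmark) {
    for (;;) {
        for (int i = 0; i < 1000; ++i) {
            Node* next = bookmark->next_;
            if (next != NULL)
                return next;
            _mm_pause();      // hint to the CPU that this is a spin-wait loop
        }
        sched_yield();        // nothing yet; give the core to someone else
    }
}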

聽兲甴掵 2024-09-02 21:22:03

Have you tried adding a yield after each failed polling attempt in your second approach? Just a guess, but it might cut down on the busy-looping.

With Boost.Thread this would look like this:

while(node->next_ == NULL) {
    boost::this_thread::yield( );
}