Unexpected socket CPU usage
I'm having a performance issue that I don't understand. The system I'm working on has two threads that look something like this:
Version A:
- Thread 1: Data Processing -> Data Selection -> Data Formatting -> FIFO
- Thread 2: FIFO -> Socket
Where 'Selection' thins down the data and the FIFO at the end of thread 1 is the FIFO at the beginning of thread 2 (the FIFOs are actually TBB Concurrent Queues). For performance reasons, I've altered the threads to look like this:
Version B:
- Thread 1: Data Processing -> Data Selection -> FIFO
- Thread 2: FIFO -> Data Formatting -> Socket
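For reference, thread 2 in version B boils down to the loop sketched below (Record and format() are stand-ins rather than my real code; I've used tbb::concurrent_bounded_queue here for its blocking pop()):

    // Simplified sketch of the Version B socket thread. 'Record' and
    // 'format' are placeholders; the queue pop and the blocking write
    // match the pipeline described above.
    #include <vector>
    #include <boost/asio.hpp>
    #include <tbb/concurrent_queue.h>

    struct Record { /* high-level data produced by the selection stage */ };

    // Hypothetical formatting helper: serializes a record into 'buffer'.
    void format(const Record& rec, std::vector<char>& buffer);

    void socket_thread(tbb::concurrent_bounded_queue<Record>& fifo,
                       boost::asio::ip::tcp::socket& socket)
    {
        std::vector<char> buffer;
        buffer.reserve(32 * 1024);            // 32k chunks, as in both versions

        Record rec;
        for (;;)
        {
            fifo.pop(rec);                    // blocks until a record is available
            buffer.clear();
            format(rec, buffer);              // fill the buffer (stand-in helper)
            boost::asio::write(socket,        // blocking write of the whole chunk
                               boost::asio::buffer(buffer));
        }
    }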
Initially, this optimization proved to be successful. Thread 1 is capable of much higher throughput. I didn't look too hard at Thread 2's performance because I expected the CPU usage would be higher and (due to data thinning) it wasn't a major concern. However, one of my colleagues asked for a performance comparison of version A and version B. To test the setup I had thread 2's socket (a boost asio tcp socket) write to an instance of iperf on the same box (127.0.0.1) with the goal of showing the maximum throughput.
To compare the two setups I first tried forcing the system to write data out of the socket at 500 Mbps. As part of the performance testing I monitored top. What I saw surprised me. Version A did not show up on 'top -H', nor did iperf (this was actually as suspected). However, version B (my 'enhanced version') was showing up on 'top -H' with ~10% CPU utilization and (oddly) iperf was showing up with 8%.
Obviously, that implied to me that I was doing something wrong. I can't seem to prove that I am though! Things I've confirmed:
- Both versions are giving the socket 32k chunks of data
- Both versions are using the same boost library (1.45)
- Both have the same optimization setting (-O3)
- Both receive the exact same data, write out the same data, and write it at the same rate.
- Both use the same blocking write call.
- I'm testing from the same box with the exact same setup (Red Hat)
- The 'formatting' part of thread 2 is not the issue (I removed it and reproduced the problem)
- Small packets across the network are not the issue (I'm using TCP_CORK, and I've confirmed via Wireshark that the TCP segments are all ~16k; a sketch of how the cork is set follows this list).
- Putting a 1 ms sleep right after the socket write makes the CPU usage on both the socket thread and iperf(?!) go back to 0%.
- Poor man's profiler reveals very little (the socket thread is almost always sleeping).
- Callgrind reveals very little (the socket write barely even registers)
- Switching iperf for netcat (writing to /dev/null) doesn't change anything (actually netcat's CPU usage was ~20%).
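For reference, the cork is applied with a plain setsockopt on the socket's native descriptor, roughly like the sketch below (set_cork is just an illustrative wrapper, not my actual code):

    // Sketch of enabling TCP_CORK on the boost::asio socket through its
    // native descriptor. TCP_CORK is the standard Linux option that makes
    // the kernel coalesce writes into full-sized segments.
    #include <netinet/tcp.h>   // TCP_CORK
    #include <sys/socket.h>    // setsockopt, IPPROTO_TCP
    #include <boost/asio.hpp>

    void set_cork(boost::asio::ip::tcp::socket& socket, bool on)
    {
        const int flag = on ? 1 : 0;
        // Boost 1.45 exposes the raw fd as native(); newer releases renamed
        // it to native_handle().
        ::setsockopt(socket.native(), IPPROTO_TCP, TCP_CORK,
                     &flag, sizeof(flag));
    }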
The only thing I can think of is that I've introduced a tighter loop around the socket write. However, at 500 Mbps I wouldn't expect the CPU usage of both my process and iperf to increase, would I?
I'm at a loss as to why this is happening. My coworkers and I are basically out of ideas. Any thoughts or suggestions? I'll happily try anything at this point.
3 Answers
This is going to be very hard to analyze without code snippets or actual data quantities.
One thing that comes to mind: if the pre-formatted data stream is significantly larger than post-format, you may be expending more bandwidth/cycles copying a bunch more data through the FIFO (socket) boundary.
Try estimating or measuring the data rate at each stage. If the data rate is higher at the output of 'selection', consider the effects of moving formatting to the other side of the boundary. Is it possible that no copy is required for the select->format transition in configuration A, and configuration B imposes lots of copies?
... just guesses without more insight into the system.
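As a crude way to measure those per-stage rates, each stage could count the bytes it hands downstream and a monitoring thread could turn the counts into Mbps; a rough sketch (StageMeter and the usage comments are made up for illustration):

    // Rough per-stage byte counting: each stage bumps a counter for every
    // buffer it passes on, and a monitoring thread turns the counts into a
    // rate. All names here are illustrative.
    #include <atomic>
    #include <cstddef>
    #include <cstdio>

    struct StageMeter
    {
        const char*            name;
        std::atomic<long long> bytes;

        explicit StageMeter(const char* n) : name(n), bytes(0) {}

        void add(std::size_t n) { bytes += static_cast<long long>(n); }

        // Call periodically (e.g. once a second) from a monitoring thread.
        void report(double seconds)
        {
            const long long b = bytes.exchange(0);
            std::printf("%-12s %.1f Mbps\n", name, (b * 8.0) / (seconds * 1e6));
        }
    };

    // Usage: one meter per boundary, e.g.
    //   StageMeter selection_out("selection"), socket_in("socket");
    //   selection_out.add(buf.size());  // just before pushing into the FIFO
    //   socket_in.add(buf.size());      // just before the socket write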
What if the FIFO was the bottleneck in version A? Then both threads would sit and wait for the FIFO most of the time. And in version B, you'd be handing the data off to iperf faster.
What exactly do you store in the FIFO queues? Do you store packets of data, i.e. buffers?
In version A, you were writing formatted data (probably bytes) to the queue. So, sending it on the socket involved just writing out a fixed size buffer.
However, in version B, you are storing high-level data in the queues. Formatting it now creates bigger buffer sizes that are written directly to the socket. This causes the TCP/IP stack to spend CPU cycles on fragmentation and overhead...
This is my theory based on what you have said so far.
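To make the theory concrete, the difference in what crosses the FIFO might look something like the sketch below (the types are made up, not your actual code):

    // Illustrative contrast of the two FIFO payloads (names are made up).
    #include <vector>
    #include <tbb/concurrent_queue.h>

    struct Record { /* high-level data item from the selection stage */ };

    // Version A: the queue carries already-formatted byte buffers, so the
    // socket thread writes each element out unchanged.
    tbb::concurrent_bounded_queue<std::vector<char> > fifo_a;

    // Version B: the queue carries higher-level records; the socket thread
    // formats each one just before the write, so the size of the buffer
    // handed to the socket depends on the data.
    tbb::concurrent_bounded_queue<Record> fifo_b;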