在 Go 配置文件中,runtime.futex 占用了 50% 到 70% 的时间

发布于 2025-01-09 19:36:27 字数 1694 浏览 4 评论 0原文

作为分析几个 golang 服务的一部分,我们发现所有服务都在 runtime.futex 函数中花费了 55% 到 70% 的时间。

请注意,这些服务涉及使用 goroutine、锁以及用于 goroutine 之间同步和通信的通道。

这是预期的吗?我们应该只关心在非“运行时”函数中花费的时间,或者这可能是由于 goroutines/locks/channels 的错误使用而导致 futex 调用中的开销?

(pprof) top20
Showing nodes accounting for 43.80s, 80.25% of 54.58s total
Dropped 654 nodes (cum <= 0.27s)
Showing top 20 nodes out of 156
      flat  flat%   sum%        cum   cum%
    31.05s 56.89% 56.89%     31.05s 56.89%  runtime.futex
     3.55s  6.50% 63.39%      4.51s  8.26%  runtime.cgocall
     2.25s  4.12% 67.52%      2.25s  4.12%  runtime.usleep
     1.14s  2.09% 69.60%      1.53s  2.80%  runtime.scanobject
     0.73s  1.34% 70.94%      1.56s  2.86%  syscall.Syscall
     0.71s  1.30% 72.24%      2.27s  4.16%  runtime.mallocgc
(pprof) top20
Showing nodes accounting for 12.49s, 95.93% of 13.02s total
Dropped 93 nodes (cum <= 0.07s)
Showing top 20 nodes out of 47
      flat  flat%   sum%        cum   cum%
     9.69s 74.42% 74.42%      9.69s 74.42%  runtime.futex
     1.25s  9.60% 84.02%      1.63s 12.52%  runtime.cgocall
     0.64s  4.92% 88.94%      0.64s  4.92%  runtime.usleep
     0.17s  1.31% 90.25%      0.22s  1.69%  runtime.timeSleepUntil
     0.11s  0.84% 91.09%      0.17s  1.31%  runtime.scanobject
     0.08s  0.61% 91.71%      0.08s  0.61%  runtime.nanotime1
     0.08s  0.61% 92.32%      0.26s  2.00%  runtime.retake
     0.07s  0.54% 92.86%      0.07s  0.54%  runtime.lock2
     0.07s  0.54% 93.39%      0.13s     1%  runtime.traceEventLocked
     0.06s  0.46% 93.86%      0.36s  2.76%  runtime.notetsleep_internal
     0.05s  0.38% 94.24%      0.13s     1%  runtime.mallocgc

As part of profiling couple of golang services we are seeing that all the services are spending 55% to 70% time in the runtime.futex function.

Note that these services involve use of goroutines, locks, and channels for synchronization and communication between the goroutines.

Is this expected and we should only care about the time spent in the non-"runtime" functions or could this be an artifact of incorrect usage of goroutines/locks/channels that is causing this overhead in the futex call?

(pprof) top20
Showing nodes accounting for 43.80s, 80.25% of 54.58s total
Dropped 654 nodes (cum <= 0.27s)
Showing top 20 nodes out of 156
      flat  flat%   sum%        cum   cum%
    31.05s 56.89% 56.89%     31.05s 56.89%  runtime.futex
     3.55s  6.50% 63.39%      4.51s  8.26%  runtime.cgocall
     2.25s  4.12% 67.52%      2.25s  4.12%  runtime.usleep
     1.14s  2.09% 69.60%      1.53s  2.80%  runtime.scanobject
     0.73s  1.34% 70.94%      1.56s  2.86%  syscall.Syscall
     0.71s  1.30% 72.24%      2.27s  4.16%  runtime.mallocgc
(pprof) top20
Showing nodes accounting for 12.49s, 95.93% of 13.02s total
Dropped 93 nodes (cum <= 0.07s)
Showing top 20 nodes out of 47
      flat  flat%   sum%        cum   cum%
     9.69s 74.42% 74.42%      9.69s 74.42%  runtime.futex
     1.25s  9.60% 84.02%      1.63s 12.52%  runtime.cgocall
     0.64s  4.92% 88.94%      0.64s  4.92%  runtime.usleep
     0.17s  1.31% 90.25%      0.22s  1.69%  runtime.timeSleepUntil
     0.11s  0.84% 91.09%      0.17s  1.31%  runtime.scanobject
     0.08s  0.61% 91.71%      0.08s  0.61%  runtime.nanotime1
     0.08s  0.61% 92.32%      0.26s  2.00%  runtime.retake
     0.07s  0.54% 92.86%      0.07s  0.54%  runtime.lock2
     0.07s  0.54% 93.39%      0.13s     1%  runtime.traceEventLocked
     0.06s  0.46% 93.86%      0.36s  2.76%  runtime.notetsleep_internal
     0.05s  0.38% 94.24%      0.13s     1%  runtime.mallocgc

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

天暗了我发光 2025-01-16 19:36:27

对于一个没有做太多事情的程序来说,这并不罕见。如果你的程序花费大部分时间等待某些事情发生,它通常会在 futex 中休眠。

This is not unusual for a program that isn't doing very much. If your program is spending most of time waiting for something to happen, it will often be sleeping in a futex.

瞄了个咪的 2025-01-16 19:36:27

仅供参考:

我也遇到过同样的问题。就我而言,我有一个工作池实现。其中 200 个 goroutine 正在监听同一个通道,这反过来又引起了争用。然后我为每个 goroutine 使用 1 个通道,这大大减少了 runtime.futex 。

FYI:

I had faced the same problem. In my case, I had a worker pool implementation. In which 200 goroutines were listening on to the same channel which in turn caused the contention. Then I used 1 channel per goroutine and which reduced runtime.futex very much.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文