在 Go 配置文件中,runtime.futex 占用了 50% 到 70% 的时间
作为分析几个 golang 服务的一部分,我们发现所有服务都在 runtime.futex 函数中花费了 55% 到 70% 的时间。
请注意,这些服务涉及使用 goroutine、锁以及用于 goroutine 之间同步和通信的通道。
这是预期的吗?我们应该只关心在非“运行时”函数中花费的时间,或者这可能是由于 goroutines/locks/channels 的错误使用而导致 futex 调用中的开销?
(pprof) top20
Showing nodes accounting for 43.80s, 80.25% of 54.58s total
Dropped 654 nodes (cum <= 0.27s)
Showing top 20 nodes out of 156
flat flat% sum% cum cum%
31.05s 56.89% 56.89% 31.05s 56.89% runtime.futex
3.55s 6.50% 63.39% 4.51s 8.26% runtime.cgocall
2.25s 4.12% 67.52% 2.25s 4.12% runtime.usleep
1.14s 2.09% 69.60% 1.53s 2.80% runtime.scanobject
0.73s 1.34% 70.94% 1.56s 2.86% syscall.Syscall
0.71s 1.30% 72.24% 2.27s 4.16% runtime.mallocgc
(pprof) top20
Showing nodes accounting for 12.49s, 95.93% of 13.02s total
Dropped 93 nodes (cum <= 0.07s)
Showing top 20 nodes out of 47
flat flat% sum% cum cum%
9.69s 74.42% 74.42% 9.69s 74.42% runtime.futex
1.25s 9.60% 84.02% 1.63s 12.52% runtime.cgocall
0.64s 4.92% 88.94% 0.64s 4.92% runtime.usleep
0.17s 1.31% 90.25% 0.22s 1.69% runtime.timeSleepUntil
0.11s 0.84% 91.09% 0.17s 1.31% runtime.scanobject
0.08s 0.61% 91.71% 0.08s 0.61% runtime.nanotime1
0.08s 0.61% 92.32% 0.26s 2.00% runtime.retake
0.07s 0.54% 92.86% 0.07s 0.54% runtime.lock2
0.07s 0.54% 93.39% 0.13s 1% runtime.traceEventLocked
0.06s 0.46% 93.86% 0.36s 2.76% runtime.notetsleep_internal
0.05s 0.38% 94.24% 0.13s 1% runtime.mallocgc
As part of profiling couple of golang services we are seeing that all the services are spending 55% to 70% time in the runtime.futex function.
Note that these services involve use of goroutines, locks, and channels for synchronization and communication between the goroutines.
Is this expected and we should only care about the time spent in the non-"runtime" functions or could this be an artifact of incorrect usage of goroutines/locks/channels that is causing this overhead in the futex call?
(pprof) top20
Showing nodes accounting for 43.80s, 80.25% of 54.58s total
Dropped 654 nodes (cum <= 0.27s)
Showing top 20 nodes out of 156
flat flat% sum% cum cum%
31.05s 56.89% 56.89% 31.05s 56.89% runtime.futex
3.55s 6.50% 63.39% 4.51s 8.26% runtime.cgocall
2.25s 4.12% 67.52% 2.25s 4.12% runtime.usleep
1.14s 2.09% 69.60% 1.53s 2.80% runtime.scanobject
0.73s 1.34% 70.94% 1.56s 2.86% syscall.Syscall
0.71s 1.30% 72.24% 2.27s 4.16% runtime.mallocgc
(pprof) top20
Showing nodes accounting for 12.49s, 95.93% of 13.02s total
Dropped 93 nodes (cum <= 0.07s)
Showing top 20 nodes out of 47
flat flat% sum% cum cum%
9.69s 74.42% 74.42% 9.69s 74.42% runtime.futex
1.25s 9.60% 84.02% 1.63s 12.52% runtime.cgocall
0.64s 4.92% 88.94% 0.64s 4.92% runtime.usleep
0.17s 1.31% 90.25% 0.22s 1.69% runtime.timeSleepUntil
0.11s 0.84% 91.09% 0.17s 1.31% runtime.scanobject
0.08s 0.61% 91.71% 0.08s 0.61% runtime.nanotime1
0.08s 0.61% 92.32% 0.26s 2.00% runtime.retake
0.07s 0.54% 92.86% 0.07s 0.54% runtime.lock2
0.07s 0.54% 93.39% 0.13s 1% runtime.traceEventLocked
0.06s 0.46% 93.86% 0.36s 2.76% runtime.notetsleep_internal
0.05s 0.38% 94.24% 0.13s 1% runtime.mallocgc
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
对于一个没有做太多事情的程序来说,这并不罕见。如果你的程序花费大部分时间等待某些事情发生,它通常会在 futex 中休眠。
This is not unusual for a program that isn't doing very much. If your program is spending most of time waiting for something to happen, it will often be sleeping in a futex.
仅供参考:
我也遇到过同样的问题。就我而言,我有一个工作池实现。其中 200 个 goroutine 正在监听同一个通道,这反过来又引起了争用。然后我为每个 goroutine 使用 1 个通道,这大大减少了 runtime.futex 。
FYI:
I had faced the same problem. In my case, I had a worker pool implementation. In which 200 goroutines were listening on to the same channel which in turn caused the contention. Then I used 1 channel per goroutine and which reduced runtime.futex very much.