Linux scheduler throughput when there are more threads than cores
I have done some measurements of the Linux scheduler. The kernel is "Linux version 2.6.18-194.el5 ([email protected])" and the machine has 8 CPUs. The measurement is the only workload on that machine.
There are two sets of measurements. In the first set, 8 threads are set up, each with the same computation cost. In the second set, one thread is split into two, giving 9 threads in total (2 of which have half the cost of the other 7).
When I ran the two sets, I expected the throughput to be the same, because the total computation cost is the same and the Linux scheduler should (though I'm not sure) schedule the two smaller threads on one core. Instead, throughput drops dramatically going from 8 threads to 9. Does anyone have an idea what the reason could be?
Edit: @Waldheinz. The threads are set up in order (say 0, 1 ... 7) and an endless stream of tuples passes through them, from thread 0 to thread 1 and on to thread 7. Each tuple spends some time on each thread, doing some computation. In the first set of measurements, all 8 threads have the same computation cost.
Update: If the number of threads is changed to 16, so that every core has two threads, throughput recovers to the level of the 8-thread case.
Comments (2)
Linux 2.6.18 is quite old now, dating to 2006, and multi-core systems were not as common or important back then. It's possible that your benchmark exercises some of the deficiencies of the O(1) scheduler that the kernel used before 2.6.23. I forget exactly what those problems were, but it sounds plausible. The "O(1)" refers to the fact that the overhead of a scheduling decision is essentially constant, but even so, the scheduler made poor decisions in some situations.
If you can, try a more recent kernel (2.6.23 or later) and see if the newer Completely Fair Scheduler (CFS) makes a difference.
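To get a feel for how much a single poor placement decision could cost in a pipeline like the one in the question, here is a rough back-of-the-envelope model in Python. It is only a sketch under strong assumptions that are not from the question: each thread is pinned to one core, cores are shared without any switching overhead, and the pipeline runs at the speed of its busiest core. The function name `pipeline_throughput` and the example placements are hypothetical; this does not model what the O(1) scheduler actually does.

```python
from collections import defaultdict


def pipeline_throughput(costs, placement):
    """Estimate tuples per unit time for a pipeline whose stages share cores.

    costs[i]     -- CPU time that stage i spends on each tuple
    placement[i] -- index of the core that stage i runs on (assumed fixed)
    """
    load = defaultdict(float)
    for cost, core in zip(costs, placement):
        load[core] += cost  # CPU time this core must spend per tuple
    # Assumption: the busiest core is the bottleneck of the whole pipeline.
    return 1.0 / max(load.values())


# 8 equal threads, one per core.
eight = pipeline_throughput([1.0] * 8, list(range(8)))

# 9 threads: 7 full-cost threads and 2 half-cost threads.
nine_costs = [1.0] * 7 + [0.5, 0.5]
# Ideal placement: the two half-cost threads share core 7.
nine_good = pipeline_throughput(nine_costs, [0, 1, 2, 3, 4, 5, 6, 7, 7])
# Poor placement: one half-cost thread lands on core 0 next to a full-cost thread.
nine_bad = pipeline_throughput(nine_costs, [0, 1, 2, 3, 4, 5, 6, 7, 0])

# 16 half-cost threads, two per core.
sixteen = pipeline_throughput([0.5] * 16, [i // 2 for i in range(16)])

print(eight, nine_good, nine_bad, sixteen)
```

In this toy model the 8-thread, well-placed 9-thread, and 16-thread cases all reach the same throughput, while the poorly placed 9-thread case loses about a third of it. Whether that is what happens on the 2.6.18 kernel would have to be checked by measurement, for example by pinning threads to cores and comparing.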
Nine women can have nine babies in nine months, a rate of nine months per baby per person. One woman can have one baby in nine months, again a rate of nine months per baby per person. But nine women still need eighteen months to have ten babies, a much worse rate of more than sixteen months per baby per person!
You are assigning your threads chunks of work that are too large and not running your test for long enough to smooth out the chunk size.
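The arithmetic in the analogy can be checked with a short Python sketch. It assumes indivisible, equal-sized jobs that are handed out in rounds, with every worker counted as occupied for the whole run; the function names `makespan` and `cost_per_job` are made up for this illustration.

```python
import math


def makespan(jobs, workers, job_time):
    """Total elapsed time when equal, indivisible jobs are done in rounds."""
    rounds = math.ceil(jobs / workers)  # a partly filled round still takes a full job_time
    return rounds * job_time


def cost_per_job(jobs, workers, job_time):
    """Worker-time spent per finished job, counting idle workers too."""
    return makespan(jobs, workers, job_time) * workers / jobs


print(cost_per_job(9, 9, 9))    # nine women, nine babies
print(cost_per_job(1, 1, 9))    # one woman, one baby
print(makespan(10, 9, 9))       # nine women, ten babies: total months
print(cost_per_job(10, 9, 9))   # months per baby per person in that case
```

The same reasoning applies to threads: if the work comes in large chunks and the run is short, one extra chunk can force a whole extra round in which most cores sit idle.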