Optimal number of threads to use
OK, so I'm solving a highly parallel problem:
- generating primes (it's not quite embarrassingly parallel, since the primes are written to, and read back from, a common store to check whether they are a factor).
In any case, what inspired me to write this (in part) was realising that I have access to a machine with dual Xeon E5520 CPUs (with, IIRC, 16GB of RAM to go with it).
So I know that each CPU supports 8 active threads.
But then there are background processes (and likely other users) using up some of those (in fact probably more than all of those).
So what is a good rule of thumb for how many threads make things go faster before they are held back by their overhead? (I guess this rule would need to take into account how many threads can be active at once.)
Comments (2)
There is no such rule. It will depend on many factors, particularly on whether your app is I/O bound (it sounds like yours isn't). The thing to do is to parameterise the number of threads so that it can be specified from a config file or from the command line, and then play around with this number until you hit a sweet spot for your particular problem and configuration.
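For what it's worth, here is a minimal sketch of what parameterising the worker count could look like in Python. Everything in it is an assumption for illustration, not the asker's code: the `--workers` and `--limit` flags, the function names, and the trial-division prime test are all hypothetical. Note that in standard CPython builds the GIL serialises pure-Python CPU-bound threads, so to see real scaling you would pass `ProcessPoolExecutor` as `executor_cls` (or use a language with native threads); the shape of the parameterisation stays the same.

```python
import argparse
import os
from concurrent.futures import ThreadPoolExecutor


def is_prime(n):
    """Trial division; deliberately simple CPU-bound work."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    i = 3
    while i * i <= n:
        if n % i == 0:
            return False
        i += 2
    return True


def count_primes_in(bounds):
    """Count primes in the half-open range [lo, hi)."""
    lo, hi = bounds
    return sum(1 for n in range(lo, hi) if is_prime(n))


def count_primes(limit, workers, executor_cls=ThreadPoolExecutor):
    """Count primes below `limit`, splitting the range across `workers`."""
    # Several chunks per worker so that uneven chunks balance out.
    chunk = max(1, limit // (workers * 4))
    ranges = [(lo, min(lo + chunk, limit)) for lo in range(0, limit, chunk)]
    with executor_cls(max_workers=workers) as pool:
        return sum(pool.map(count_primes_in, ranges))


def parse_args(argv=None):
    """Read the worker count from the command line (hypothetical flags)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--workers", type=int, default=os.cpu_count() or 1)
    parser.add_argument("--limit", type=int, default=200_000)
    return parser.parse_args(argv)


def main(argv=None):
    """Parse the flags, run the count once, and report the result."""
    args = parse_args(argv)
    total = count_primes(args.limit, args.workers)
    print(f"workers={args.workers} limit={args.limit} primes={total}")
    return total
```

Calling `main()` from a script entry point lets you rerun the same job with `--workers 4`, `--workers 8`, `--workers 16` and so on, timing each run to see where the gains flatten out on your machine.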
If the operation is mostly CPU bound (not waiting for I/O operations), then a good first guess is 1-to-1 with the number of logical CPU cores. Considering that generating prime numbers is mostly CPU bound and that you will have 16 logical cores at your disposal, I would start with 16 threads. Do a few tests and see what happens. I expect the performance to peak at around 16 threads, but that really depends on how much I/O occurs when storing the primes that have been generated.
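As a rough sketch of the "start at the logical core count, then test" approach, the Python below picks a few candidate thread counts around the number of logical CPUs and times each one. The helper names and the `busy` workload are hypothetical stand-ins, not anything from the question. `os.sched_getaffinity` is Linux-only, hence the `hasattr` check. Also bear in mind that standard CPython builds serialise pure-Python CPU-bound threads with the GIL, so treat this as the shape of the experiment rather than a benchmark.

```python
import os
import time
from concurrent.futures import ThreadPoolExecutor


def logical_cpus():
    """Logical CPUs usable by this process (falls back to the machine total)."""
    if hasattr(os, "sched_getaffinity"):  # Linux-only call
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1


def candidate_counts(cpus):
    """Thread counts to try: half, equal to, and double the logical CPU count."""
    return sorted({max(1, cpus // 2), cpus, cpus * 2})


def busy(n):
    """Hypothetical stand-in for one chunk of prime-testing work."""
    total = 0
    for i in range(n):
        total += i * i % 7
    return total


def time_run(workers, tasks=32, size=20_000):
    """Wall-clock seconds to finish `tasks` chunks using `workers` threads."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(busy, [size] * tasks))
    return time.perf_counter() - start


def sweep(counts):
    """Map each candidate thread count to its measured run time."""
    return {workers: time_run(workers) for workers in counts}
```

On the 16-logical-core machine from the question, `candidate_counts(16)` gives `[8, 16, 32]`, and `sweep(candidate_counts(logical_cpus()))` returns one timing per count. Repeating the sweep a few times helps, since other users' load on a shared box will add noise to any single measurement.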