CPU-intensive thread wisdom
I want to run a batch of, say, 20 CPU-intensive computations (basically really long nested for loops) on one machine.
None of these 20 jobs shares data with the other 19.
If the machine has N cores, should I spin off N-1 of these jobs at a time? Or N? Or should I just launch all 20 and let Windows figure out how to schedule them?
Comments (3)
Unfortunately, there is no simple answer. The only way to know for sure is to implement and then profile your application.
Typically, for maximum throughput, if the jobs are pure CPU, you'd want one per core. Depending on the type of work, that means either one per hyperthreaded (logical) core or just one per "true physical core". (If the work is identical for all 20 jobs, then hyperthreading often slows down the overall work...)
If the jobs have any non-CPU functionality (such as reading a file, waiting on anything, etc.), then more than one work item per core tends to be much better. In many situations, this improves overall throughput.
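To make that rule of thumb concrete, here is a rough sketch in Python of picking a starting worker count. The function name `default_worker_count` and the `per_core` multiplier are made up for illustration, and the default of 2 is only a guess to be tuned by profiling. Note that `os.cpu_count()` reports logical cores (hyperthreads included); the standard library has no call for the physical core count.

```python
import os


def default_worker_count(num_jobs, pure_cpu=True, per_core=2):
    """Suggest a starting number of workers for a batch of independent jobs.

    pure_cpu=True  -> one worker per logical core.
    pure_cpu=False -> `per_core` workers per logical core, on the assumption
                      that jobs spend part of their time waiting (files, etc.).
    """
    # Logical core count; os.cpu_count() may return None, so fall back to 1.
    cores = os.cpu_count() or 1
    multiplier = 1 if pure_cpu else per_core
    # Never more workers than jobs, never fewer than one.
    return max(1, min(num_jobs, cores * multiplier))


if __name__ == "__main__":
    print("pure CPU:", default_worker_count(20))
    print("some waiting:", default_worker_count(20, pure_cpu=False))
```

This only gives a first guess; the measured timings should decide the final number.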
Generally, if you aren't sharing data, aren't blocking on I/O, are using lots of CPU, and nothing else is running on the box (plus probably a few more caveats), using all the CPUs (e.g. N threads) is probably the best idea.
The best choice is probably to make it configurable, profile it, and see what happens.
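One way "configurable" could look, as a minimal Python sketch: read the worker count from an environment variable and fall back to the core count. The variable name `BATCH_WORKERS` and the function `configured_workers` are hypothetical, not from any existing tool.

```python
import os


def configured_workers(env=os.environ, var="BATCH_WORKERS"):
    """Return the worker count from `env[var]`, else the logical core count.

    `BATCH_WORKERS` is a made-up name for this sketch. Missing, non-numeric,
    or non-positive values fall back to the default.
    """
    default = os.cpu_count() or 1  # os.cpu_count() may return None
    raw = env.get(var)
    if raw is None:
        return default
    try:
        value = int(raw)
    except ValueError:
        return default
    return value if value > 0 else default


if __name__ == "__main__":
    print("workers:", configured_workers())
```

With something like this in place, trying N-1, N, or 20 workers is a matter of changing one setting between profiling runs rather than editing the program.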
You should use a thread pool of some sort, so it's (reasonably) easy to tune the number of threads without affecting the structure of the program.
Once you've done that, it's a fairly simple matter of testing to find a reasonably optimal number of threads relative to the number of processors available. Chances are that even if the jobs look like they should be purely CPU bound, you'll get better efficiency with more than N threads, but about the only way to be sure is to test.
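A rough sketch of that pool-plus-testing approach in Python, using the standard `concurrent.futures` module. The names `cpu_job`, `run_batch`, and `sweep` are invented for this example, and `cpu_job` is just a stand-in nested loop. One caveat: in standard CPython, threads running pure-Python loops are serialized by the global interpreter lock, so this shows the structure only; for real parallel speedup there, `ProcessPoolExecutor` offers the same `map` interface.

```python
import os
import time
from concurrent.futures import ThreadPoolExecutor


def cpu_job(n):
    """Stand-in for one independent job: a nested loop with no shared data."""
    total = 0
    for i in range(n):
        for j in range(50):
            total += (i * j) % 7
    return total


def run_batch(job_sizes, workers):
    """Run every job on a pool of `workers` threads; return (results, seconds)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(cpu_job, job_sizes))
    return results, time.perf_counter() - start


def sweep(job_sizes, candidates):
    """Time the same batch once at each candidate pool size."""
    timings = {}
    for workers in candidates:
        _, seconds = run_batch(job_sizes, workers)
        timings[workers] = seconds
    return timings


if __name__ == "__main__":
    cores = os.cpu_count() or 1
    jobs = [2_000] * 20  # 20 small jobs; use larger sizes for a meaningful timing
    candidates = [max(1, cores - 1), cores, 2 * cores, 20]
    for workers, seconds in sweep(jobs, candidates).items():
        print(f"{workers:>3} workers: {seconds:.3f}s")
```

Because the pool size is just an argument, the candidates from the question (N-1, N, all 20) can be compared without touching the job code. Single runs are noisy, so repeating each measurement a few times is a good idea.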