Does multi-threading mean less CPU?
I have a small list of rather large files that I want to process, which got me thinking...
In C#, I was thinking of using TPL's Parallel.ForEach to take advantage of modern multi-core CPUs, but my question is more hypothetical:
In practice, does multi-threading mean that loading the files in parallel (using as many CPU cores as possible) would take longer than loading each file sequentially (with probably lower CPU utilization)?
Or, to put it another way:
What is the point of multi-threading? Running more tasks in parallel but each at a slower rate, as opposed to focusing all computing resources on one task at a time?
Comments (6)
For loading files from disk, this is likely to make things much slower. The operating system tries to lay out each file on disk so that you only need one expensive disk seek per file. If you have a lot of threads reading a lot of files, they will contend for access to the disk, and the disk has to seek back to the right place in each file every time the next thread gets a turn.
What you can do is use exactly two threads. Set one to load all of the files in the background, and let the other remain available for other tasks, like handling user input. In C# WinForms, you can do this easily with a BackgroundWorker control.
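As a rough sketch of the two-thread idea, the helper below reads every file sequentially on one background thread while the caller stays free for other work. It uses Task.Run as a stand-in for the BackgroundWorker control so it also runs outside WinForms; the class and method names (BackgroundLoader, LoadAllAsync) are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;

// Hypothetical helper, not part of any library.
public static class BackgroundLoader
{
    // Reads the files one after another on a single background thread,
    // so the disk serves only one sequential read at a time.
    public static Task<Dictionary<string, byte[]>> LoadAllAsync(IEnumerable<string> paths)
    {
        return Task.Run(() =>
        {
            var contents = new Dictionary<string, byte[]>();
            foreach (string path in paths)
            {
                contents[path] = File.ReadAllBytes(path);
            }
            return contents;
        });
    }
}
```

The calling thread can keep handling input and later pick up the result with `await BackgroundLoader.LoadAllAsync(paths)`.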
Multi-threading is useful for highly parallelizable tasks. CPU-intensive tasks are perfect. Your CPU has many cores, and many threads can use many cores. They'll use more total CPU time, but in the end they'll use less "user" time, meaning the wall-clock time the user actually waits. If your app is I/O-bound, then multi-threading isn't always the solution (but it could help).
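One way to see the difference between CPU time and elapsed time is to measure both around the same workload. The sketch below assumes a purely CPU-bound work item (BusyWork, invented for this example) and uses Stopwatch for wall-clock time and Process.TotalProcessorTime for CPU time summed over all cores.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

// Hypothetical helper, not part of any library.
public static class CpuVersusWallClock
{
    // Illustrative CPU-bound work item: sums square roots in a tight loop.
    public static double BusyWork(int iterations)
    {
        double sum = 0;
        for (int i = 1; i <= iterations; i++) sum += Math.Sqrt(i);
        return sum;
    }

    // Runs the action and reports elapsed wall-clock time and the CPU time
    // the whole process consumed while it ran.
    public static (TimeSpan Wall, TimeSpan Cpu) Measure(Action action)
    {
        Process self = Process.GetCurrentProcess();
        TimeSpan cpuBefore = self.TotalProcessorTime;
        Stopwatch wall = Stopwatch.StartNew();
        action();
        wall.Stop();
        self.Refresh(); // drop cached process info before reading CPU time again
        return (wall.Elapsed, self.TotalProcessorTime - cpuBefore);
    }

    // Same eight work items, first one after another, then spread over cores.
    public static void Compare()
    {
        const int items = 8, iterations = 50000000;
        var sequential = Measure(() => { for (int i = 0; i < items; i++) BusyWork(iterations); });
        var parallel = Measure(() => Parallel.For(0, items, i => { BusyWork(iterations); }));
        Console.WriteLine($"sequential: wall {sequential.Wall}, cpu {sequential.Cpu}");
        Console.WriteLine($"parallel:   wall {parallel.Wall}, cpu {parallel.Cpu}");
    }
}
```

On a multi-core machine the parallel run would be expected to show a shorter wall time with similar or somewhat higher CPU time, though the exact numbers depend on the hardware and on what else is running.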
It might be helpful to first understand the difference between multithreading and parallelism, as I often see the two terms used interchangeably. Joseph Albahari has written quite an interesting guide on the subject: Threading in C# - Part 5 - Parallelism
As with all great programming endeavors, it depends. By and large, you'll be requesting files from one physical store, or through one physical controller, which will serialize the requests anyhow (or worse, cause a LOT of head back-and-forth on a classical hard drive) and slow down the already slow I/O.
OTOH, if the controllers and the media are separate, loading data from them on multiple cores should be faster than a sequential approach.
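A rough way to act on this is to read the files on each device sequentially while letting different devices proceed in parallel. The sketch below groups paths by volume root via Path.GetPathRoot, which is only an approximation of "separate physical device" (on Unix-like systems every absolute path has the root "/", and several volumes can share one disk). PerVolumeLoader and its members are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical helper, not part of any library.
public static class PerVolumeLoader
{
    // Groups paths by volume root (for example "C:\" or "D:\").
    // Assumption: one root roughly corresponds to one physical device.
    public static ILookup<string, string> GroupByRoot(IEnumerable<string> paths)
    {
        return paths.ToLookup(p => Path.GetPathRoot(Path.GetFullPath(p)) ?? string.Empty);
    }

    // Volumes are read in parallel; files within one volume are read in order.
    // Note: onLoaded may be called from several threads at once.
    public static void LoadAll(IEnumerable<string> paths, Action<string, byte[]> onLoaded)
    {
        Parallel.ForEach(GroupByRoot(paths), volume =>
        {
            foreach (string path in volume)
            {
                onLoaded(path, File.ReadAllBytes(path));
            }
        });
    }
}
```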
In order not to increase latency, parallel computational programs typically create only one thread per core. Applications which aren't purely computational tend to add more threads so that the number of runnable threads equals the number of cores (the others are in I/O wait and not competing for CPU time).
Now, parallelism in disk-I/O-bound programs may well cause performance to decrease. If the disk has a non-negligible seek time, much more time will be wasted performing seeks and less time spent actually reading. This is called "churning" or "thrashing". Elevator sorting helps somewhat; true random access (such as solid-state memory) helps more.
Parallelism does almost always increase the total raw work done, but this only matters if battery life is of foremost importance (and by the time you account for power used by other components, such as the screen backlight, completing quicker is often still more efficient overall).
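With the TPL, the "one worker per core" rule of thumb can be approximated by capping the loop's degree of parallelism. The sketch below is one possible way to do that with ParallelOptions.MaxDegreeOfParallelism; the BoundedParallelism wrapper and its peak-concurrency bookkeeping are invented here purely to make the cap observable.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical helper, not part of any library.
public static class BoundedParallelism
{
    // Runs work over items with at most maxWorkers running at the same time
    // and returns the highest concurrency that was actually observed.
    public static int Run<T>(IEnumerable<T> items, Action<T> work, int maxWorkers)
    {
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxWorkers };
        object gate = new object();
        int running = 0, peak = 0;

        Parallel.ForEach(items, options, item =>
        {
            lock (gate)
            {
                running++;
                if (running > peak) peak = running;
            }
            try
            {
                work(item);
            }
            finally
            {
                lock (gate) { running--; }
            }
        });

        return peak;
    }
}
```

For CPU-bound work a natural choice for `maxWorkers` is `Environment.ProcessorCount`; for work that hits a single spinning disk, a much smaller value (possibly 1) may behave better.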
You asked multiple questions, so I've broken up my response into multiple answers:
Multithreading may have no effect on loading speed, depending on what your bottleneck during loading is. If you're loading a lot of data off disk or from a database, I/O may be your limiting factor. On the other hand, if "loading" involves doing a lot of CPU work on the data, you may get a speedup from using multithreading.
Generally speaking, you can't focus "all computing resources on one task." Some multicore processors can overclock a single core in exchange for disabling other cores, but this speed boost is not equal to the potential performance benefit you would get from fully utilizing all of the cores using multithreading/multiprocessing. In other words, it's asymmetrical: if you have a 4-core 1 GHz CPU, it won't be able to overclock a single core all the way to 4 GHz in exchange for disabling the others. In fact, that's the reason the industry went multicore in the first place. At least for now, we've hit limits on how fast we can make a single CPU run, so instead we've gone the route of adding more CPUs.
There are two reasons for multithreading. The first is that you want two tasks to run at the same time simply because it's desirable for both to happen simultaneously, e.g. you want your GUI to continue responding to clicks or key presses while it's doing other work (event loops are another way to accomplish this, though). The second is to utilize multiple cores to get a performance boost.
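If loading is I/O-bound but the per-file processing is CPU-bound, one possible arrangement is to keep the reads sequential on a single thread and spread only the processing across cores. The sketch below does this with a bounded BlockingCollection; ReadThenProcess and its parameters are hypothetical names, and it leaves out cancellation and careful error handling.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;

// Hypothetical helper, not part of any library.
public static class ReadThenProcess
{
    public static void Run(IEnumerable<string> paths, Action<string, byte[]> process, int bufferedFiles = 2)
    {
        // Bounded so the reader cannot run far ahead and hold many large files in memory.
        using (var loaded = new BlockingCollection<(string FilePath, byte[] Bytes)>(bufferedFiles))
        {
            // Single reader: files are read strictly one after another.
            Task reader = Task.Run(() =>
            {
                try
                {
                    foreach (string path in paths)
                    {
                        loaded.Add((path, File.ReadAllBytes(path)));
                    }
                }
                finally
                {
                    loaded.CompleteAdding(); // lets the consumers finish
                }
            });

            // NoBuffering hands each file to a worker as soon as it is available.
            var source = Partitioner.Create(
                loaded.GetConsumingEnumerable(),
                EnumerablePartitionerOptions.NoBuffering);

            // Multiple consumers: process may be called from several threads at once.
            Parallel.ForEach(source, item => process(item.FilePath, item.Bytes));
            reader.Wait();
        }
    }
}
```

Whether this beats a plain sequential loop depends on how expensive the processing step is relative to the reads; if the disk is the bottleneck, the workers will mostly sit idle waiting for the next file.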