How expensive is a context switch? Is it better to implement manual task switching instead of relying on OS threads?
Imagine I have two (three, four, whatever) tasks that have to run in parallel. Now, the easy way to do this would be to create separate threads and forget about it. But on a plain old single-core CPU that would mean a lot of context switching - and we all know that context switching is big, bad, slow, and generally simply Evil. It should be avoided, right?
On that note, if I'm writing the software from the ground up anyway, I could go the extra mile and implement my own task switching. Split each task into parts, save the state in between, and then switch among them within a single thread. Or, if I detect that there are multiple CPU cores, I could just give each task to a separate thread and all would be well.
The second solution does have the advantage of adapting to the number of available CPU cores, but will the manual task switch really be faster than the one in the OS kernel? Especially if I'm trying to make the whole thing generic with a TaskManager and an ITask, etc.?
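To make what I mean by "manual task switching" concrete, here is a rough sketch of the kind of thing I have in mind (the TaskManager and ITask names are purely illustrative, not an existing API): each task keeps its own state, does a bounded slice of work per call, and a single thread simply loops over the tasks.

```cpp
// Hypothetical sketch only: ITask and TaskManager are illustrative names,
// not part of any existing Windows or C++ API. Each task keeps its own
// intermediate state and does a bounded amount of work per call; a single
// thread round-robins over the tasks, so a "context switch" is just a
// virtual call and a loop iteration.
#include <memory>
#include <utility>
#include <vector>

struct ITask {
    virtual ~ITask() = default;
    // Do one bounded slice of work. Return false once the task is finished.
    virtual bool RunSlice() = 0;
};

class TaskManager {
public:
    void Add(std::unique_ptr<ITask> task) { tasks_.push_back(std::move(task)); }

    // Run every task to completion on the calling thread, switching between
    // tasks manually after each slice.
    void RunAll() {
        while (!tasks_.empty()) {
            for (auto it = tasks_.begin(); it != tasks_.end();) {
                if ((*it)->RunSlice())
                    ++it;                  // still has work, keep it
                else
                    it = tasks_.erase(it); // finished, drop it
            }
        }
    }

private:
    std::vector<std::unique_ptr<ITask>> tasks_;
};
```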
Clarification: I'm a Windows developer, so I'm primarily interested in the answer for this OS, but it would be interesting to find out about other OSes as well. When you write your answer, please state which OS it applies to.
More clarification: OK, so this isn't in the context of a particular application. It's really a general question, the result of my musings about scalability. If I want my application to scale and effectively utilize future CPUs (and even the different CPUs of today), I must make it multithreaded. But how many threads? If I create a constant number of threads, the program will perform suboptimally on any CPU that doesn't have that number of cores.
Ideally the number of threads would be determined at runtime, but few tasks can truly be split into an arbitrary number of parts at runtime. Many tasks, however, can be split across a fairly large, constant number of threads at design time. So, for instance, if my program could spawn 32 threads, it would already utilize all cores of up to 32-core CPUs, which is still pretty far in the future (I think). But on a simple single-core or dual-core CPU it would mean a LOT of context switching, which would slow things down.
Thus my idea about manual task switching. This way one could create 32 "virtual" threads which would be mapped to as many real threads as is optimal, and the "context switching" would be done manually. The question is just: would the overhead of my manual "context switching" be less than that of OS context switching?
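As a sketch of that mapping (reusing the hypothetical ITask/TaskManager from above; std::thread::hardware_concurrency() is standard C++11 and may return 0 if the count is unknown), the "virtual" threads could be spread over however many real threads the hardware offers, with the switching inside each real thread done manually:

```cpp
// Rough sketch, reusing the hypothetical ITask/TaskManager above: take a
// fixed set of "virtual" tasks and spread them over however many real
// threads the hardware offers.
#include <cstddef>
#include <memory>
#include <thread>
#include <utility>
#include <vector>

void RunOnHardwareThreads(std::vector<std::unique_ptr<ITask>> tasks) {
    std::size_t hw = std::thread::hardware_concurrency();
    if (hw == 0) hw = 1;                                    // fall back if unknown
    std::size_t workers = hw < tasks.size() ? hw : tasks.size();
    if (workers == 0) return;

    std::vector<std::thread> pool;
    for (std::size_t w = 0; w < workers; ++w) {
        // Each worker takes a disjoint, strided share of the tasks and
        // switches among them cooperatively; OS context switches only
        // happen between the (few) real threads.
        pool.emplace_back([&tasks, w, workers] {
            TaskManager local;
            for (std::size_t i = w; i < tasks.size(); i += workers)
                local.Add(std::move(tasks[i]));
            local.RunAll();
        });
    }
    for (auto& t : pool) t.join();
}
```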
Naturally, all this applies to processes which are CPU-bound, like games. For your run-of-the-mill CRUD application this has little value. Such an application is best made with one thread (at most two).
3 Answers
I don't see how a manual task switch could be faster, since the OS kernel is still switching other processes, including switching yours into and out of the running state too. Seems like a premature optimization and a potentially huge waste of effort.
If the system isn't doing anything else, chances are you won't have a huge number of context switches anyway. The thread will use its timeslice, the kernel scheduler will see that nothing else needs to run, and it will switch right back to your thread. Also, the OS will make a best effort to avoid moving threads between CPUs, so you benefit from warm caches there.
If you are really CPU-bound, detect the number of CPUs and start that many threads. You should see nearly 100% CPU utilization. If not, you aren't completely CPU-bound, and maybe the answer is to start N + X threads. For heavily I/O-bound processes, you would start a (large) multiple of the CPU count (e.g. high-traffic web servers run 1000+ threads).
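A minimal sketch of that detect-and-spawn approach in portable C++ (worker is just a placeholder for whatever CPU-bound work you want to split up):

```cpp
// Minimal sketch of "detect the number of CPUs and start that many threads"
// in portable C++11; worker() is a placeholder for whatever CPU-bound work
// each thread should perform.
#include <thread>
#include <vector>

void worker(unsigned index) {
    (void)index;  // ... CPU-bound work for this slot ...
}

int main() {
    unsigned n = std::thread::hardware_concurrency();
    if (n == 0) n = 1;  // the call may return 0 if the count is unknown

    std::vector<std::thread> threads;
    for (unsigned i = 0; i < n; ++i)
        threads.emplace_back(worker, i);
    for (auto& t : threads)
        t.join();
    return 0;
}
```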
Finally, for reference, both Windows and Linux schedulers wake up every millisecond to check if another process needs to run. So, even on an idle system you will see 1000+ context switches per second. On heavily loaded systems, I have seen over 10,000 per second per CPU without any significant issues.
The only advantage of a manual switch that I can see is that you have better control of where and when the switch happens. The ideal place is of course right after a unit of work has been completed, so that you can discard all of its state together. This saves you cache misses.
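For illustration only, assuming the hypothetical ITask sketch from the question, a task that yields exactly at unit-of-work boundaries might look like this:

```cpp
// Illustration only, assuming the hypothetical ITask sketch from the
// question: this task yields exactly at unit-of-work boundaries, so each
// slice finishes one whole item and its working set can be dropped before
// any switch happens.
#include <cstddef>
#include <utility>
#include <vector>

struct WorkItem { /* whatever one unit of work needs */ };

class BatchTask : public ITask {
public:
    explicit BatchTask(std::vector<WorkItem> items) : items_(std::move(items)) {}

    bool RunSlice() override {
        if (next_ >= items_.size())
            return false;            // nothing left, task is done
        Process(items_[next_++]);    // complete one whole unit of work
        return next_ < items_.size();
    }

private:
    void Process(WorkItem&) { /* CPU-bound work on a single item */ }

    std::vector<WorkItem> items_;
    std::size_t next_ = 0;
};
```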
I advise not to spend your effort on this.
Single-core Windows machines are going to become extinct in the next few years, so I generally write new code with the assumption that multi-core is the common case. I'd say go with OS thread management, which will automatically take care of whatever concurrency the hardware provides, now and in the future.
I don't know what your application does, but unless you have multiple compute-bound tasks, I doubt that context switches are a significant bottleneck in most applications. If your tasks block on I/O, then you are not going to get much benefit from trying to out-do the OS.