How expensive is a kernel context switch compared with a user-space context switch?
According to C10k and this paper, the throughput of 1-thread-per-connection servers degrades as more and more clients connect and more and more threads are created. According to those two sources, this is because the more threads exist, the more time is spent on context switching compared to the actual work done by those threads. Evented servers don't seem to suffer as much from performance degradation at high connection counts.
However, evented servers also perform context switches between clients; they just do so in user space (a minimal sketch of such a loop follows the questions below).
- Why are these userspace context switches faster than kernel thread context switches?
- What exactly does a kernel context switch do that's so much more expensive?
- How expensive is a kernel context switch exactly? How much time does it take?
- Does kernel context switching time depend on the number of threads?
I'm mostly interested in how the Linux kernel handles context switching but information about other OSes is welcome too.
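To make concrete what I mean by "context switching in user space", here is the kind of evented loop I have in mind — a minimal epoll-based echo server (illustrative sketch only; the port is arbitrary and error handling is mostly omitted):

```c
/* Minimal epoll echo server sketch (illustrative only).  The point: moving
 * from one client to the next is just advancing to the next ready fd in
 * user space; the kernel scheduler is not asked to switch threads per client. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/epoll.h>
#include <sys/socket.h>

int main(void) {
    int lfd = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr = {0};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(8080);                 /* port chosen arbitrarily */
    int one = 1;
    setsockopt(lfd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one));
    if (bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) < 0 || listen(lfd, 128) < 0) {
        perror("bind/listen");
        return 1;
    }

    int ep = epoll_create1(0);
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = lfd };
    epoll_ctl(ep, EPOLL_CTL_ADD, lfd, &ev);

    struct epoll_event ready[64];
    char buf[4096];
    for (;;) {
        /* One syscall wakes us up for many ready clients at once. */
        int n = epoll_wait(ep, ready, 64, -1);
        for (int i = 0; i < n; i++) {
            int fd = ready[i].data.fd;
            if (fd == lfd) {                      /* new connection */
                int cfd = accept(lfd, NULL, NULL);
                struct epoll_event cev = { .events = EPOLLIN, .data.fd = cfd };
                epoll_ctl(ep, EPOLL_CTL_ADD, cfd, &cev);
            } else {                              /* existing client */
                ssize_t r = read(fd, buf, sizeof(buf));
                if (r <= 0) { close(fd); continue; }
                write(fd, buf, r);                /* echo back */
                /* "Switching" to the next client is just the next loop
                 * iteration -- no kernel thread switch involved. */
            }
        }
    }
}
```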
Comments (1)
Because the CPU does not need to switch to kernel mode and back to user mode.
Mostly the switch to kernel mode. IIRC, the page tables are the same in kernel mode and user mode in Linux, so at least there is no TLB invalidation penalty.
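As a rough illustration (sketch only, not a production fiber implementation): a user-space switch can be little more than saving and restoring register state, for example with the portable ucontext API. Note that glibc's swapcontext also saves/restores the signal mask, which does issue a syscall, so high-performance fiber libraries usually replace it with a few lines of hand-written assembly.

```c
/* User-space "context switch" sketch using ucontext: swapcontext() saves
 * and restores CPU registers and the stack pointer; the kernel scheduler
 * is never asked to decide what runs next. */
#include <stdio.h>
#include <ucontext.h>

static ucontext_t main_ctx, co_ctx;

static void coroutine(void) {
    for (int i = 0; i < 3; i++) {
        printf("coroutine step %d\n", i);
        /* Yield back to main: register state swapped in user space. */
        swapcontext(&co_ctx, &main_ctx);
    }
}

int main(void) {
    char stack[64 * 1024];                  /* stack for the coroutine */

    getcontext(&co_ctx);
    co_ctx.uc_stack.ss_sp = stack;
    co_ctx.uc_stack.ss_size = sizeof(stack);
    co_ctx.uc_link = &main_ctx;             /* return here when it finishes */
    makecontext(&co_ctx, coroutine, 0);

    for (int i = 0; i < 3; i++) {
        printf("main resumes coroutine\n");
        swapcontext(&main_ctx, &co_ctx);    /* "switch" to the coroutine */
    }
    return 0;
}
```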
Needs to be measured and can vary from machine to machine. I guess that a typical desktop/server machine these days can do a few hundred thousand context switches per second, possibly a few million.
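If you want a number for your own machine, a classic rough micro-benchmark is a byte ping-pong over a pair of pipes between two processes pinned to the same CPU (sketch only; the result includes pipe syscall overhead and varies with hardware, kernel version and security mitigations):

```c
/* Rough context-switch micro-benchmark sketch: two processes ping-pong a
 * byte over pipes.  Pinned to the same CPU, each round trip forces at
 * least two context switches.  Treat the result as an order of magnitude. */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>
#include <sys/wait.h>

#define ROUNDS 100000

static void pin_to_cpu0(void) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(0, &set);
    sched_setaffinity(0, sizeof(set), &set);   /* same core => real switches */
}

int main(void) {
    int ab[2], ba[2];                          /* parent->child, child->parent */
    char c = 'x';
    if (pipe(ab) < 0 || pipe(ba) < 0) { perror("pipe"); return 1; }

    pid_t pid = fork();
    if (pid == 0) {                            /* child: echo the byte back */
        pin_to_cpu0();
        for (int i = 0; i < ROUNDS; i++) {
            if (read(ab[0], &c, 1) != 1) break;
            write(ba[1], &c, 1);
        }
        _exit(0);
    }

    pin_to_cpu0();
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < ROUNDS; i++) {
        write(ab[1], &c, 1);
        read(ba[0], &c, 1);
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    wait(NULL);

    double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
    /* Each round trip is ~2 context switches plus 4 pipe syscalls,
     * so this is an upper bound on the pure switch cost. */
    printf("~%.0f ns per round trip (~%.0f ns per switch, incl. syscalls)\n",
           ns / ROUNDS, ns / ROUNDS / 2);
    return 0;
}
```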
Depends on how the kernel scheduler handles this. AFAIK, in Linux it is pretty efficient even with large thread counts, but more threads mean more memory usage, which means more cache pressure and thus likely lower performance. I also expect some overhead in the handling of thousands of sockets.
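If you want to see how much switching a given process actually incurs as you add threads or connections, one simple check is its voluntary/involuntary switch counters via getrusage() (small sketch; the same counters also appear in /proc/&lt;pid&gt;/status):

```c
/* Read this process's context-switch counters.  ru_nvcsw counts voluntary
 * switches (e.g. blocking on a socket), ru_nivcsw counts involuntary ones
 * (preempted by the scheduler). */
#include <stdio.h>
#include <sys/resource.h>

int main(void) {
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) == 0) {
        printf("voluntary:   %ld\n", ru.ru_nvcsw);
        printf("involuntary: %ld\n", ru.ru_nivcsw);
    }
    return 0;
}
```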