How does blocking I/O affect a multithreaded application/service on Linux?
I am exploring several concepts for a web crawler in C on Linux. To decide whether I'll use blocking I/O, multiplexed I/O, AIO, a certain combination, etc., I especially need to know (I probably should discover it for myself via some test code, but for expediency I prefer to learn from others): when a call to I/O in blocking mode is made, is it the particular thread (assuming a multithreaded app/service) or the whole process that is blocked? Even more specifically, in a multithreaded (POSIX) app/service, can a thread dedicated to remote reads/writes block the entire process? If so, how can I unblock such a thread without terminating the entire process?
NB: Whether or not I should use blocking/nonblocking is not really the question here.
Kindly
Comments (1)
Blocking calls block only the thread that made them, not the entire process.
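A minimal sketch of this behaviour, under stated assumptions: a pipe stands in for a slow remote socket, and the names demo_pipe, blocking_reader and main_runs_while_reader_blocks are hypothetical, made up for this illustration. The worker thread sits in a blocking read() while the calling thread keeps doing work. The worker is then released by writing the byte it is waiting for.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdatomic.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical demo state: a pipe stands in for a slow remote socket. */
static int demo_pipe[2];
static atomic_int reader_done;

/* Worker thread: read() does not return until a byte (or EOF) arrives. */
static void *blocking_reader(void *arg)
{
    char byte;
    (void)arg;
    if (read(demo_pipe[0], &byte, 1) == 1)
        atomic_store(&reader_done, 1);
    return NULL;
}

/*
 * Returns 1 if the calling thread kept running while the worker had not
 * yet completed its read(), 0 if not, and -1 on a setup or write failure.
 */
int main_runs_while_reader_blocks(void)
{
    pthread_t tid;
    struct timespec tick = { 0, 10 * 1000 * 1000 }; /* 10 ms */
    int progressed = 0;

    atomic_store(&reader_done, 0);
    if (pipe(demo_pipe) != 0)
        return -1;
    if (pthread_create(&tid, NULL, blocking_reader, NULL) != 0) {
        close(demo_pipe[0]);
        close(demo_pipe[1]);
        return -1;
    }

    /* The calling thread works for about 100 ms; the worker has no data. */
    for (int i = 0; i < 10; i++) {
        nanosleep(&tick, NULL);
        progressed++;
    }
    int still_blocked = (atomic_load(&reader_done) == 0);

    /* Release the worker by supplying the byte it is waiting for. */
    ssize_t written = write(demo_pipe[1], "x", 1);
    close(demo_pipe[1]); /* if the write failed, EOF lets read() return 0 */
    pthread_join(tid, NULL);
    close(demo_pipe[0]);

    if (written != 1)
        return -1;
    return progressed == 10 && still_blocked && atomic_load(&reader_done) == 1;
}
```

Call main_runs_while_reader_blocks() from your own main() and build with -pthread. With a real socket there is usually no peer you can ask for data on demand, so this only demonstrates that the rest of the process keeps running, not a general way to cancel a blocked call.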
Whether to use blocking I/O (with one socket per thread) or non-blocking I/O (with each thread managing multiple sockets) is something you are going to have to benchmark. But as a rule of thumb...
Linux handles multiple threads reasonably efficiently. So if you are only handling a few dozen sockets, using one thread for each is easy to code and should perform well. If you are handling hundreds of sockets, it is a closer call. And for thousands of sockets, you are almost certainly better off using one thread (or process) to manage large groups.
In the latter case, for optimal performance you probably want to use epoll, even though it is Linux-specific.
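As a rough illustration of one thread watching several descriptors with epoll, here is a small sketch. The function names drain_ready_once and epoll_demo, the MAX_EVENTS value and the use of pipes instead of sockets are all assumptions made for the example. A real crawler would more likely register non-blocking sockets once and call epoll_wait in a loop.

```c
#include <sys/epoll.h>
#include <unistd.h>

#define MAX_EVENTS 16 /* arbitrary batch size chosen for this sketch */

/*
 * Register nfds descriptors with a fresh epoll instance, wait up to
 * timeout_ms for any of them to become readable, and read one byte from
 * each ready descriptor. Returns the number of ready descriptors, or -1
 * on error.
 */
int drain_ready_once(const int *fds, int nfds, int timeout_ms)
{
    struct epoll_event events[MAX_EVENTS];
    int epfd = epoll_create1(0);
    if (epfd < 0)
        return -1;

    for (int i = 0; i < nfds; i++) {
        struct epoll_event ev = { 0 };
        ev.events = EPOLLIN;  /* level-triggered readability */
        ev.data.fd = fds[i];  /* remember which descriptor fired */
        if (epoll_ctl(epfd, EPOLL_CTL_ADD, fds[i], &ev) != 0) {
            close(epfd);
            return -1;
        }
    }

    int ready = epoll_wait(epfd, events, MAX_EVENTS, timeout_ms);
    for (int i = 0; i < ready; i++) {
        char byte;
        if (read(events[i].data.fd, &byte, 1) < 0) {
            ready = -1;
            break;
        }
    }
    close(epfd);
    return ready;
}

/* Demo: three pipes, data written to two of them; returns the ready count. */
int epoll_demo(void)
{
    int pipes[3][2], read_ends[3], result = -1;
    int opened = 0;

    for (; opened < 3; opened++) {
        if (pipe(pipes[opened]) != 0)
            goto cleanup;
        read_ends[opened] = pipes[opened][0];
    }
    if (write(pipes[0][1], "a", 1) == 1 && write(pipes[2][1], "c", 1) == 1)
        result = drain_ready_once(read_ends, 3, 1000);

cleanup:
    for (int i = 0; i < opened; i++) {
        close(pipes[i][0]);
        close(pipes[i][1]);
    }
    return result;
}
```

In this demo only the two pipes that received data are reported as ready, so the single thread never blocks on the idle one. That is the property that lets one thread manage a large group of sockets.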