What does it mean that a web crawler is I/O bound and not CPU bound?
I've seen this in some answers on S/O, where the point is made that the programming language doesn't matter as much for a crawler, so C++ is overkill compared with, say, Python. Can someone please explain this in layman's terms, so that there's no ambiguity about what is implied? Clarification of the underlying assumptions would also be appreciated.
Thanks
Comments (3)
It means that I/O is the bottleneck here. The act of going out to the net to retrieve a page (I/O) is slower than analysing the page (CPU).
So, making the CPU bit ten times faster will have little effect on the overall time taken. On the other hand, doubling the I/O speed will have a very beneficial effect, right up to the point where CPU starts being the bottleneck.
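To put rough numbers on that, here is a minimal sketch. The 200 ms fetch time and 5 ms parse time are made-up figures chosen only for illustration, and `total_time` is a hypothetical helper that assumes each page is fetched and then parsed one after the other.

```python
def total_time(io_s, cpu_s, io_speedup=1.0, cpu_speedup=1.0):
    """Seconds per page when the fetch and the parse run back to back."""
    return io_s / io_speedup + cpu_s / cpu_speedup


# Assumed, illustrative figures (not measurements).
IO_S = 0.200   # time spent waiting on the network per page
CPU_S = 0.005  # time spent analysing the page

baseline = total_time(IO_S, CPU_S)                   # 0.205 s per page
faster_cpu = total_time(IO_S, CPU_S, cpu_speedup=10)  # 0.2005 s, about 2% saved
faster_io = total_time(IO_S, CPU_S, io_speedup=2)     # 0.105 s, about 49% saved

if __name__ == "__main__":
    print(f"baseline:       {baseline:.4f} s")
    print(f"10x faster CPU: {faster_cpu:.4f} s")
    print(f"2x faster I/O:  {faster_io:.4f} s")
```

Under these assumptions, a language that runs the parsing step ten times faster saves only a few milliseconds per page, which is the reasoning behind "C++ is overkill" for a crawler.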
It means that the program spends more time reading and writing (via disk or network) than it does actually running the algorithms in the code. I/O is vastly slower than the CPU, so a program that depends on it will usually spend most of its time waiting.
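One way to see this split is to time the two phases separately. The sketch below does not touch the network: `fake_fetch` is a hypothetical stand-in that simply waits for an assumed 50 ms, and `parse` is a toy word count, so the exact figures mean nothing beyond the ratio.

```python
import time


def fake_fetch(delay_s=0.05):
    """Stand-in for a network request: it only waits, then returns a page."""
    time.sleep(delay_s)
    return "<html>" + "word " * 2000 + "</html>"


def parse(page):
    """Stand-in for page analysis: count whitespace-separated tokens."""
    return len(page.split())


def time_breakdown(n_pages=5, delay_s=0.05):
    """Return (seconds spent fetching, seconds spent parsing)."""
    io_s = 0.0
    cpu_s = 0.0
    for _ in range(n_pages):
        t0 = time.perf_counter()
        page = fake_fetch(delay_s)
        t1 = time.perf_counter()
        parse(page)
        t2 = time.perf_counter()
        io_s += t1 - t0
        cpu_s += t2 - t1
    return io_s, cpu_s


if __name__ == "__main__":
    io_s, cpu_s = time_breakdown()
    print(f"fetching: {io_s:.4f} s, parsing: {cpu_s:.4f} s")
```

In a real crawler the fetch would be an HTTP request and the parse would be heavier, but as long as the first number dominates the second, the program is I/O bound.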
One thing to add is that during input/output operations your program (unless poorly written) isn't actively using the CPU; it is in an inactive (sleeping) state.
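This can be checked by comparing wall-clock time with CPU time across a wait. In this sketch `time.sleep` stands in for a blocking network read (an assumption made to keep the example self-contained), and `measure_wait` is a hypothetical helper; `time.process_time` counts only the CPU time the process actually used and excludes time spent asleep.

```python
import time


def measure_wait(delay_s=0.2):
    """Return (wall-clock seconds, CPU seconds) spent across a blocking wait."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    time.sleep(delay_s)  # stand-in for waiting on a socket or disk
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return wall, cpu


if __name__ == "__main__":
    wall, cpu = measure_wait()
    print(f"wall-clock: {wall:.4f} s, CPU: {cpu:.4f} s")
```

The wall-clock figure should be close to the delay while the CPU figure stays near zero, which is why a faster language gains so little while the crawler is waiting on the network.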