In which programming language was Googlebot (or any other efficient web crawler) written?
Does anyone know which programming language Googlebot was written in?
Or, more generally, which languages are efficient web crawlers written in?
I've seen many written in Java, but it doesn't seem to me the most appropriate language for a web crawler because it adds far too much overhead (I tried the Heritrix web crawler, and it is extremely heavy).
Comments (5)
An educated guess is Python, since Google employs its creator. However, I imagine their crawler is probably a distributed application that takes advantage of MapReduce, in which case it might actually be a C/C++ application.
This is beside the point, though. You can write an efficient web crawler in many different languages and still get the same result. A hammer will still hit a nail whether it is yellow or blue. Pick your favorite color and use it correctly.
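To illustrate how little of a crawler's core logic depends on the language, here is a minimal sketch of a breadth-first crawl loop in Python. It is not how Googlebot or any real crawler is implemented; the names `crawl`, `LinkExtractor`, `fetch`, and `max_pages` are made up for this example, and `fetch` is assumed to be any callable that maps a URL to an HTML string (or `None` on failure).

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed, fetch, max_pages=100):
    """Breadth-first crawl starting at `seed`.

    `fetch` is a hypothetical callable (URL -> HTML string, or None on failure).
    Returns the URLs that were fetched successfully, in visit order.
    """
    frontier = deque([seed])   # URLs waiting to be fetched
    seen = {seed}              # URLs already queued, to avoid revisits
    visited = []
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        html = fetch(url)
        if html is None:
            continue           # skip pages that could not be fetched
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop the #fragment part.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if absolute.startswith(("http://", "https://")) and absolute not in seen:
                seen.add(absolute)
                frontier.append(absolute)
    return visited
```

In a real crawler, `fetch` would wrap an HTTP client and add timeouts, robots.txt checks, and per-host rate limiting. Most of the running time would then go to waiting on the network rather than to this loop, which is one reason the implementation tends to matter more than the language.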
The very earliest version, Backrub, was written in Python and Java.
This might help: the original Google paper.
http://infolab.stanford.edu/~backrub/google.html
I don't know about Googlebot (most likely C or Python), but there are some good crawlers out there in both Java and .NET.
One of the more popular open-source options is Nutch (often used with Lucene).
Nutch itself is written in Java and is fairly efficient. There's also a .NET port called Nutch.NET.
I don't think the language matters as much as the specific implementation.
What kind of overhead are you worried about in Java? Memory? Processing power?