In which programming language is Googlebot (or any other efficient web crawler) written?

Posted on 2024-08-09 03:43:55

Does anyone know in which programming language the Googlebot was written?

Or, more generally, in which language are efficient web-crawlers written?

I've seen many written in Java, but it doesn't seem to me the most appropriate language for developing a web crawler because it creates far too much overhead (I tried the Heritrix web crawler, and it's extremely heavy).

Comments (5)

╭⌒浅淡时光〆 2024-08-16 03:43:55

An educated guess is Python, since Google employs its creator. However, I can imagine that their crawler is probably a distributed app that takes advantage of MapReduce, in which case it might actually be a C/C++ application.

This is beside the point, though. You can write an efficient web crawler in many different languages and still get the same result. A hammer will still hit a nail whether it is yellow or blue. Pick your favorite color and use it correctly.

相权↑美人 2024-08-16 03:43:55

The very earliest version, Backrub, was written in Python and Java.

恰似旧人归 2024-08-16 03:43:55

This might help: the original Google paper.

http://infolab.stanford.edu/~backrub/google.html

赢得她心 2024-08-16 03:43:55

I don't know about Googlebot (most likely C or Python), but there are some good crawlers out there in both Java and .NET.

One of the more popular open source options is Nutch (often used with Lucene).

Nutch itself is written in Java and is fairly efficient. There's also a .NET port called Nutch.NET.

乖不如嘢 2024-08-16 03:43:55

I don't think the language will matter as much as the specific implementation.

What kind of overhead are you worried about in Java? Memory? Processing power?
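
To illustrate why the implementation tends to matter more than the language, here is a minimal sketch of a crawler's core loop in Python: a breadth-first frontier with URL deduplication. This is not how Googlebot, Heritrix, or Nutch work internally; `crawl`, `fetch_links`, `toy_fetch_links`, and `TOY_WEB` are hypothetical names invented for this example, and the network fetch is replaced by an in-memory dictionary so the sketch runs offline.

```python
from collections import deque
from urllib.parse import urldefrag, urljoin


def crawl(seed, fetch_links, max_pages=100):
    """Breadth-first crawl from seed; returns URLs in visit order.

    fetch_links(url) is a hypothetical callable returning the links found
    on a page. Real code would do an HTTP fetch and parse HTML here.
    """
    frontier = deque([seed])  # URLs waiting to be visited
    seen = {seed}             # every URL ever queued, for deduplication
    visited = []
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        visited.append(url)
        for link in fetch_links(url):
            # Resolve relative links and drop #fragments so that
            # equivalent URLs are only queued once.
            absolute, _fragment = urldefrag(urljoin(url, link))
            if absolute not in seen:
                seen.add(absolute)
                frontier.append(absolute)
    return visited


# Toy in-memory "web" used instead of real network access (assumption).
TOY_WEB = {
    "http://example.com/": ["/a", "/b"],
    "http://example.com/a": ["/b", "/#top"],
    "http://example.com/b": ["/a", "/c"],
    "http://example.com/c": [],
}


def toy_fetch_links(url):
    """Stand-in for fetching a page and extracting its links."""
    return TOY_WEB.get(url, [])


if __name__ == "__main__":
    for page in crawl("http://example.com/", toy_fetch_links):
        print(page)
```

In a real crawler most of the time goes to waiting on the network, so choices like the deduplication set, the frontier size limit, and how many fetches run concurrently usually dominate the cost, whichever language the loop is written in.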
