Best open-source library or application for crawling and data mining websites

Posted 2024-07-18 07:05:15

I would like to know what the best open-source library for crawling and analyzing websites is. One example would be a crawler for property agencies: I would like to grab information from a number of sites and aggregate it on my own site. For this I need to crawl the sites and extract the property ads.


Comments (4)

嘿嘿嘿 2024-07-25 07:05:15

I do a lot of scraping using the excellent Python packages urllib2, mechanize (http://wwwsearch.sourceforge.net/mechanize/), and BeautifulSoup.

I also suggest looking at lxml and Scrapy, though I don't use them currently (I'm still planning to try out Scrapy).

Perl also has great facilities for scraping.
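As a rough illustration of the fetch-and-parse workflow this answer describes, here is a minimal Python 3 sketch. It rests on several assumptions that are not in the original: it uses `urllib.request` (the Python 3 successor to `urllib2`) and the standard-library `html.parser` as a stand-in for BeautifulSoup so that it runs without third-party packages, and the `listing` class name as well as `AdParser`, `extract_ads`, and `fetch` are hypothetical names chosen for the example.

```python
from html.parser import HTMLParser
from urllib.request import urlopen

# Elements that never have a closing tag; they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class AdParser(HTMLParser):
    """Collect the text of every element that carries a given CSS class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.ads = []       # text of each finished ad
        self._depth = 0     # > 0 while inside a matching element
        self._chunks = []   # text fragments of the ad being read

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self._depth:
            self._depth += 1            # nested tag inside an ad
        elif self.css_class in classes:
            self._depth = 1             # entering a new ad
            self._chunks = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self._depth:
            return
        self._depth -= 1
        if self._depth == 0:            # the ad element just closed
            self.ads.append(" ".join(self._chunks))

    def handle_data(self, data):
        if self._depth and data.strip():
            self._chunks.append(data.strip())


def extract_ads(html, css_class="listing"):
    """Return the text of each element whose class list contains css_class."""
    parser = AdParser(css_class)
    parser.feed(html)
    parser.close()
    return parser.ads


def fetch(url):
    """Download a page and decode it using the charset the server declares."""
    with urlopen(url, timeout=10) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Calling `extract_ads(fetch(url))` on a real page would return one string per matching element. The class name to look for differs from site to site and has to be found by inspecting each site's markup.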

小糖芽 2024-07-25 07:05:15

PHP/cURL is a very powerful combination, especially if you want to use the results directly in a web page...

放低过去 2024-07-25 07:05:15

Like Mr Morozov, I do quite a bit of scraping too, principally of job sites. I've never had to resort to mechanize, if that helps. BeautifulSoup in combination with urllib2 has always been sufficient.

I have used lxml, which is great. However, I believe it may not have been available with Google apps when I tried it a few months ago, if you need that.

My thanks to Mr Morozov for mentioning Scrapy. I hadn't heard of it.
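For the crawling side of the question, a plain fetch-and-parse combination like the one mentioned above can be wrapped in a small breadth-first loop. The following is only a sketch under stated assumptions: it targets Python 3 (`urllib.request` rather than `urllib2`), uses the standard-library `html.parser` instead of BeautifulSoup, stays on the starting host, and the names `LinkParser`, `fetch`, `crawl`, `fetch_page`, and `max_pages` are hypothetical.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collect the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def fetch(url):
    """Download a page and decode it using the charset the server declares."""
    with urlopen(url, timeout=10) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def crawl(start_url, fetch_page=fetch, max_pages=50):
    """Breadth-first crawl of one host; returns {url: html} in visit order."""
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        try:
            html = fetch_page(url)
        except OSError:                 # covers urllib's URLError and timeouts
            continue
        pages[url] = html
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop "#fragment" parts.
            link, _fragment = urldefrag(urljoin(url, href))
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages
```

Passing the download function in as `fetch_page` keeps the loop easy to try against canned pages before pointing it at a live site. A real crawler would also want rate limiting and a check of each site's robots.txt, which this sketch leaves out.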

我不咬妳我踢妳 2024-07-25 07:05:15

Besides Scrapy, you should also look at Parselets.
