Questions asking us to recommend or find a tool, library or favorite off-site resource are off-topic for Stack Overflow as they tend to attract opinionated answers and spam. Instead, describe the problem and what has been done so far to solve it.
Closed 11 years ago.
I've used WWW-Mechanize in the past for basic web crawling, including form submission and the like.
There are some pretty good examples.
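For what it's worth, here is a minimal sketch of the kind of thing I mean: fetch a page, submit its first form, and list the links on the result page. The helper name submit_and_list_links and the form field name q are made up for illustration, and I'm assuming the form you want is the first one on the page; adjust both for the real site.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;

# Hypothetical helper (not part of WWW::Mechanize): fetch $url, submit the
# first form on the page with the given fields, and return the absolute
# URLs of all links found on the resulting page.
sub submit_and_list_links {
    my ( $url, $fields ) = @_;

    # autocheck => 1 makes Mechanize die on HTTP errors
    my $mech = WWW::Mechanize->new( autocheck => 1 );
    $mech->get($url);

    # Assumption: the form of interest is form number 1 on the page
    $mech->submit_form(
        form_number => 1,
        fields      => $fields,
    );

    return map { $_->url_abs->as_string } $mech->links;
}

# Only touch the network when a URL is passed on the command line.
if (@ARGV) {
    my ( $url, $query ) = @ARGV;
    # 'q' is an assumed field name; check the real form's HTML
    print "$_\n" for submit_and_list_links( $url, { q => $query } );
}
```

Run it as `perl crawl.pl URL QUERY` (the script name is arbitrary). From there, following the returned links in a loop gets you a basic crawler.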
These should pretty much cover everything you're looking for:
http://www.perl.com/pub/2002/08/20/perlandlwp.html
http://lwp.interglacial.com/
http://www.perl.com/pub/2003/01/22/mechanize.html
http://gd.tuwien.ac.at/linux/ldp/LDP/LGNET/108/oregan2.html
Tools you will need besides Perl: the WWW::Mechanize module, HTML::TreeBuilder, and especially HTML::TreeBuilder::XPath and HTML::Query. The last two will become very handy when you want to get actual data from HTML documents. HTML::TableExtract is also a nice module to extract data from HTML tables when needed. Basically, using all of the above will give you the ability to crawl most sites.
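A rough sketch of the data-extraction side, assuming you already have the page's HTML as a string (for example from $mech->content). The sample HTML, the sub names and the Item/Price column headers are all invented for the example; only the module calls are real.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;
use HTML::TableExtract;

# Made-up sample page; in a real crawler this would come from $mech->content
my $html = <<'HTML';
<html><head><title>Price list</title></head>
<body>
  <a href="/page/2">Next</a>
  <table>
    <tr><th>Item</th><th>Price</th></tr>
    <tr><td>Widget</td><td>9.99</td></tr>
    <tr><td>Gadget</td><td>19.50</td></tr>
  </table>
</body></html>
HTML

# Hypothetical helper: page title plus every link target, via XPath
sub extract_title_and_links {
    my ($content) = @_;
    my $tree  = HTML::TreeBuilder::XPath->new_from_content($content);
    my $title = $tree->findvalue('//title');
    my @hrefs = map { $_->attr('href') } $tree->findnodes('//a[@href]');
    $tree->delete;    # free the parse tree when done
    return ( "$title", \@hrefs );
}

# Hypothetical helper: data rows of any table that has the given column headers
sub extract_table_rows {
    my ($content) = @_;
    my $te = HTML::TableExtract->new( headers => [ 'Item', 'Price' ] );
    $te->parse($content);
    my @rows;
    for my $table ( $te->tables ) {
        push @rows, [@$_] for $table->rows;
    }
    return @rows;
}

my ( $title, $links ) = extract_title_and_links($html);
print "Title: $title\n";
print "Link:  $_\n" for @$links;
print join( ' | ', @$_ ), "\n" for extract_table_rows($html);
```

Matching tables by their headers is usually more robust than counting table positions, since layout tables tend to come and go between page versions.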
Have fun crawling (-: