How can I stop a Scrapy CrawlSpider and later resume it from where it left off?
I have a Scrapy CrawlSpider with a very large list of URLs to crawl. I would like to be able to stop it, save the current state, and resume it later without having to start over. Is there a way to accomplish this within the Scrapy framework?
Comments (3)
Just wanted to share that this feature is included in the latest Scrapy version, but the parameter name has changed. You should use it like this:
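Going by the jobs documentation linked below, the renamed parameter is the JOBDIR setting; with the spider name and job directory as placeholders, the invocation looks roughly like this:

    scrapy crawl thespider --set JOBDIR=run1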
For more information, see http://doc.scrapy.org/en/latest/topics/jobs.html#job-directory
There was a question about this on the mailing list just a few months ago: http://groups.google.com/group/scrapy-users/browse_thread/thread/6a8df07daff723fc?pli=1
Quoting Pablo:
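The quoted reply described an earlier form of the persistence support (the first answer notes the parameter name has since changed to JOBDIR). A rough sketch of the pause/resume cycle as it is documented today, with the spider name and job directory as placeholders:

    # start (or resume) a crawl, persisting scheduler and dupefilter state in the job directory
    scrapy crawl thespider -s JOBDIR=crawls/run1
    # press Ctrl-C once to let the spider shut down gracefully, then run the
    # same command again later to resume from where it stopped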
Scrapy now has a working feature for this, documented on their site at http://doc.scrapy.org/en/latest/topics/jobs.html
Here's the actual command:
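Per the jobs documentation linked above, the command takes this form (the spider name and directory are examples):

    scrapy crawl somespider -s JOBDIR=crawls/somespider-1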