Scrapy ImagesPipeline: I created a simple example; what is the command to run/test it?
I followed this doc to download images using Scrapy:
http://doc.scrapy.org/en/latest/topics/images.html
Specifically, I have this test.py:
from scrapy.contrib.pipeline.images import ImagesPipeline
from scrapy.exceptions import DropItem
from scrapy.http import Request
from My.items import ImageItem

item = ImageItem()
item['image_urls'] = ['http://url/123.jpg']

class MySpider(ImagesPipeline):
    def get_media_requests(self, item, info):
        for image_url in item['image_urls']:
            yield Request(image_url)

    def item_completed(self, results, item, info):
        image_paths = [x['path'] for ok, x in results if ok]
        if not image_paths:
            raise DropItem("Item contains no images")
        item['image_paths'] = image_paths
        return item
My question is: what command line should I run to test this test.py and verify that the images are downloaded?
Further info:
I know the command "scrapy crawl project_name", but I would prefer to test this test.py without having to create a project.
I also came across "scrapy runspider test.py", but it does not work. Error: MySpider not found.
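For context, scrapy runspider only picks up classes that subclass scrapy.Spider; the MySpider above subclasses ImagesPipeline, which is why runspider reports that no spider was found. Below is a minimal sketch of what a runspider-compatible test.py could look like, assuming a current Scrapy version (import path scrapy.pipelines rather than scrapy.contrib), Pillow installed, and placeholder URLs and storage directory:

    # test.py - a self-contained spider that `scrapy runspider test.py` can find.
    import scrapy

    class MySpider(scrapy.Spider):
        name = "images_test"
        start_urls = ["http://example.com/"]  # placeholder page to start from

        # Enable the built-in ImagesPipeline without creating a project.
        custom_settings = {
            "ITEM_PIPELINES": {"scrapy.pipelines.images.ImagesPipeline": 1},
            "IMAGES_STORE": "./downloaded_images",  # assumed local directory
        }

        def parse(self, response):
            # The built-in pipeline reads 'image_urls' from each yielded item
            # and records the downloaded files under the 'images' field.
            yield {"image_urls": ["http://url/123.jpg"]}  # placeholder image URL

With a script like this, scrapy runspider test.py should run the crawl and drop the downloaded files under IMAGES_STORE, provided the image URL is reachable.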
Comments (1)
I suggest you follow the tutorial, because your script is lacking some important things, e.g. a callback for your Request() function.
The tutorial is quite easy to complete in about 15 minutes, and it will cover some aspects you are currently missing.
For pipelines and middlewares to work, you need the complete Scrapy engine.
I recommend checking out the Architecture Overview to get a feel for how the complete engine works.
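To make the "complete engine" point concrete: a pipeline is never executed by running its file directly. Scrapy instantiates it only when it is registered in ITEM_PIPELINES, and the engine then calls it for every item a spider yields. A hedged sketch of that wiring, with an illustrative module path and priority:

    # pipelines.py - the question's pipeline under the current import path.
    from scrapy.pipelines.images import ImagesPipeline
    from scrapy.exceptions import DropItem

    class MyImagesPipeline(ImagesPipeline):
        # Called by the engine once all image requests for an item have
        # finished; get_media_requests is inherited from ImagesPipeline and
        # already iterates item['image_urls'].
        def item_completed(self, results, item, info):
            image_paths = [x["path"] for ok, x in results if ok]
            if not image_paths:
                raise DropItem("Item contains no images")
            item["image_paths"] = image_paths
            return item

    # settings.py - registering the class is what makes the engine invoke it:
    # ITEM_PIPELINES = {"myproject.pipelines.MyImagesPipeline": 300}
    # IMAGES_STORE = "/path/to/image/store"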