scrapy ImagesPipeline: I created a simple example — what command do I run to test it?

Posted on 2025-01-07 14:22:58


I followed this doc to download images using Scrapy:
http://doc.scrapy.org/en/latest/topics/images.html

Specifically, I would have this test.py:

from scrapy.contrib.pipeline.images import ImagesPipeline
from scrapy.exceptions import DropItem
from scrapy.http import Request

from My.items import ImageItem

item = ImageItem()
item['image_urls'] = ['http://url/123.jpg']

class MySpider(ImagesPipeline):

    def get_media_requests(self, item, info):
        for image_url in item['image_urls']:
            yield Request(image_url)

    def item_completed(self, results, item, info):
        image_paths = [x['path'] for ok, x in results if ok]
        if not image_paths:
            raise DropItem("Item contains no images")
        item['image_paths'] = image_paths
        return item

My question is: what command line should I run to test this test.py and verify that the images are downloaded?

Further info:
I know the command "scrapy crawl project_name", but I would prefer to test this test.py without having to create a project.

I also came across "scrapy runspider test.py", but it does not work. Error: MySpider not found.


Comments (1)

烂柯人 2025-01-14 14:22:58


I suggest you follow the tutorial, because your script is missing some important things, e.g. a callback for your Request() objects.
The tutorial takes about 15 minutes to complete and will cover some of the aspects you are currently missing.

For pipelines and middlewares to work, you need the complete Scrapy engine.
I recommend checking out the Architecture Overview to get a feel for how the whole engine works.
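To illustrate what the engine does for you: when a crawl runs, ImagesPipeline calls item_completed with a list of (success, info) tuples. The small sketch below fakes that input so the path-extraction logic from the question can be exercised standalone, without a crawl; the paths and checksum values are invented examples, not real downloads:

```python
def extract_image_paths(results):
    """Mimic the list comprehension the question uses in item_completed:
    keep the 'path' of each successfully downloaded image."""
    return [info["path"] for ok, info in results if ok]

# One success and one failure, in the (success, info) shape that
# ImagesPipeline passes to item_completed during a real crawl.
results = [
    (True, {"url": "http://url/123.jpg",
            "path": "full/0a1b2c.jpg",      # invented example path
            "checksum": "d41d8cd98f00b204"}),  # invented checksum
    (False, Exception("download error")),
]

paths = extract_image_paths(results)  # only the successful path survives
```

This also shows why the question's item_completed never runs on its own: nothing outside the engine ever builds that results list and calls the method.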
