How to send data from one pipeline to another

Posted 2025-01-31 11:04:01

Hello, I have two pipelines. The first one downloads photos:

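import hashlib
import os

import scrapy
from scrapy.pipelines.images import ImagesPipeline

# IMAGES_STORE is assumed to be imported from the project settings
# and to end with a path separator.


class ModelsPipeline(ImagesPipeline):
    def get_media_requests(self, item, info):
        # Schedule one download per image URL on the item.
        for image_url in item['image_urls']:
            yield scrapy.Request(image_url)

    def file_path(self, request, response=None, info=None, *, item=None):
        # Store each image under <name>/<short hash of the URL>.jpg
        image_url_hash = hashlib.shake_256(request.url.encode()).hexdigest(5)
        image_filename = f'{item["name"]}/{image_url_hash}.jpg'

        return image_filename

    def item_completed(self, results, item, info):
        image_paths = [x['path'] for ok, x in results if ok]

        # Rename every downloaded file to the MD5 of its contents.
        for image in image_paths:
            file_extension = os.path.splitext(image)[1]
            img_path = f'{IMAGES_STORE}{image}'
            with open(img_path, 'rb') as f:
                md5 = hashlib.md5(f.read()).hexdigest()
            img_destination = f'{IMAGES_STORE}{item["name"]}/{md5}{file_extension}'
            os.rename(img_path, img_destination)

        return item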

The second one stores that information in the database:

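class DatabasePipeline:

    def open_spider(self, spider):
        # db_connect() is my own helper that returns a database client.
        self.client = db_connect()

    def close_spider(self, spider):
        self.client.close()

    def process_item(self, item, spider):
        self.client.upsert(item)
        # Return the item so any later pipeline still receives it.
        return item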

The item_completed function in the first pipeline produces a name and a path that I want to pass to the second pipeline so they can be stored in the database, but I cannot access that data there.

How can I do that?

Thanks

Comments (2)

爱你是孤单的心事 2025-02-07 11:04:01

You can add the name and the path to the item in item_completed of ModelsPipeline:

item['name_from_pipeline'] = name
item['path_from_pipeline'] = path
return item

In process_item of DatabasePipeline you can then read them back:

name = item['name_from_pipeline']
path = item['path_from_pipeline']
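
To make this concrete for the pipelines in the question, here is a minimal, framework-free sketch of the same idea. The function names (rename_to_md5, item_completed_sketch, process_item_sketch), the image_paths field, and the upsert callable are all hypothetical stand-ins chosen for illustration; in a real project the bodies would live inside ModelsPipeline.item_completed and DatabasePipeline.process_item. It assumes the item behaves like a dict and that the final (renamed) path is the one worth storing.

```python
import hashlib
import os


def rename_to_md5(store_dir, item_name, rel_path):
    """Rename one downloaded file to <md5><ext>; return the new relative path."""
    src = os.path.join(store_dir, rel_path)
    with open(src, "rb") as fh:
        digest = hashlib.md5(fh.read()).hexdigest()
    new_rel_path = f"{item_name}/{digest}{os.path.splitext(rel_path)[1]}"
    os.rename(src, os.path.join(store_dir, new_rel_path))
    return new_rel_path


def item_completed_sketch(results, item, store_dir):
    """What ModelsPipeline.item_completed could do: rename, then record paths."""
    downloaded = [info["path"] for ok, info in results if ok]
    # "image_paths" is a hypothetical field name; the item carries it onward.
    item["image_paths"] = [
        rename_to_md5(store_dir, item["name"], rel_path) for rel_path in downloaded
    ]
    return item  # the returned item is what the next pipeline receives


def process_item_sketch(item, upsert):
    """What DatabasePipeline.process_item could do with the enriched item."""
    # `upsert` stands in for self.client.upsert from the question.
    upsert({"name": item["name"], "image_paths": item["image_paths"]})
    return item
```

If the item is a scrapy.Item subclass rather than a plain dict, the extra field would also need to be declared on the item class as a Field, otherwise assigning to it raises a KeyError.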

落花浅忆 2025-02-07 11:04:01

I ran into the same problem recently.
You need to enable both pipelines and assign a lower priority to DatabasePipeline, as in the settings shown here. A higher number means a lower priority.

So the data will be processed first by ModelsPipeline, then by DatabasePipeline.
Remember to return the item from item_completed in ModelsPipeline, otherwise DatabasePipeline will not receive it.

ITEM_PIPELINES = {
   "project_name.pipelines.ModelsPipeline": 300, 
   "project_name.pipelines.DatabasePipeline": 302,
}
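
As a quick illustration of how those numbers translate into run order, the following sketch sorts the same mapping by value. Scrapy runs enabled item pipelines from the lowest value to the highest; the run_order helper here is a hypothetical name for illustration only and is not part of Scrapy's API.

```python
# Same mapping as above, deliberately written out of order to show
# that only the numbers matter, not the position in the dict.
ITEM_PIPELINES = {
    "project_name.pipelines.DatabasePipeline": 302,
    "project_name.pipelines.ModelsPipeline": 300,
}


def run_order(pipelines):
    """Return pipeline class paths sorted by ascending priority value."""
    return sorted(pipelines, key=pipelines.get)


for path in run_order(ITEM_PIPELINES):
    print(path)
```

ModelsPipeline is listed first because 300 is lower than 302, so whatever it stores on the item is available by the time DatabasePipeline runs.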