How to send data from one pipeline to another

Posted on 2025-01-31 11:04:01

Hello, I have two pipelines. The first one downloads photos:

import hashlib
import os

import scrapy
from scrapy.pipelines.images import ImagesPipeline

# IMAGES_STORE is assumed to be defined elsewhere (e.g. imported from the project settings).


class ModelsPipeline(ImagesPipeline):
    def get_media_requests(self, item, info):
        # Request one download per image URL on the item.
        for image_url in item['image_urls']:
            yield scrapy.Request(image_url)

    def file_path(self, request, response=None, info=None, *, item=None):
        # Temporary name: <item name>/<short hash of the URL>.jpg
        image_url_hash = hashlib.shake_256(request.url.encode()).hexdigest(5)
        return f'{item["name"]}/{image_url_hash}.jpg'

    def item_completed(self, results, item, info):
        image_paths = [x['path'] for ok, x in results if ok]

        # Rename each downloaded file to the MD5 of its contents.
        for image in image_paths:
            file_extension = os.path.splitext(image)[1]
            img_path = os.path.join(IMAGES_STORE, image)
            with open(img_path, 'rb') as f:
                md5 = hashlib.md5(f.read()).hexdigest()
            img_destination = os.path.join(IMAGES_STORE, item['name'], f'{md5}{file_extension}')
            os.rename(img_path, img_destination)

        return item

The second one stores that information in the database:

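class DatabasePipeline:
    def open_spider(self, spider):
        # db_connect() is the asker's own helper; it is not shown in the question.
        self.client = db_connect()

    def close_spider(self, spider):
        self.client.close()

    def process_item(self, item, spider):
        self.client.upsert(item)
        # Return the item so any later pipeline stage still receives it.
        return item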

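The item_completed function in the first pipeline produces a name and a path that I want to send to the second pipeline so they can be stored in the database, but I cannot get access to that data.

How can I do that?

Thanks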

爱你是孤单的心事 2025-02-07 11:04:01

You can add the name and the path to the item in ModelsPipeline (in item_completed, before returning the item):

item['name_from_pipeline'] = name
item['path_from_pipeline'] = path
return item

In process_item of DatabasePipeline you can then read them back:

name = item['name_from_pipeline']
path = item['path_from_pipeline']
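
To make the hand-off concrete, here is a minimal sketch of the two stages written in plain Python so it can be run without Scrapy. The class names ModelsPipelineSketch and DatabasePipelineSketch, the FakeClient stand-in for the asker's db_connect() client, and the run_stages helper are all hypothetical; storing a list under path_from_pipeline is also an assumption, made because one item may carry several images.

```python
class ModelsPipelineSketch:
    """Stands in for the ImagesPipeline subclass; only item_completed is shown."""

    def item_completed(self, results, item, info=None):
        # results is a list of (success, file_info) tuples, as in the question.
        paths = [file_info["path"] for ok, file_info in results if ok]
        # Attach the data to the item so later pipeline stages can read it.
        item["name_from_pipeline"] = item["name"]
        item["path_from_pipeline"] = paths
        return item  # the returned item is what the next stage receives


class FakeClient:
    """Hypothetical in-memory replacement for the asker's database client."""

    def __init__(self):
        self.rows = []

    def upsert(self, row):
        self.rows.append(dict(row))


class DatabasePipelineSketch:
    def __init__(self, client):
        self.client = client

    def process_item(self, item, spider=None):
        # The fields added by the first stage are ordinary item keys here.
        self.client.upsert({
            "name": item["name_from_pipeline"],
            "paths": item["path_from_pipeline"],
        })
        return item


def run_stages(item, results, client):
    """Pass one item through both stages in order (hypothetical helper)."""
    item = ModelsPipelineSketch().item_completed(results, item)
    return DatabasePipelineSketch(client).process_item(item)


client = FakeClient()
download_results = [
    (True, {"path": "cat/ab12cd34ef.jpg"}),
    (False, Exception("download failed")),  # failed downloads are skipped
]
processed = run_stages({"name": "cat", "image_urls": []}, download_results, client)
print(client.rows)
```

If the item is a scrapy.Item subclass rather than a dict, the two extra fields would also need to be declared on it with scrapy.Field(), since assigning an undeclared field raises a KeyError.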

落花浅忆 2025-02-07 11:04:01

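I ran into the same problem recently.
You need to enable both pipelines and give DatabasePipeline a lower priority, as shown below. A higher number means a lower priority, so pipelines with smaller values run first.

The item will therefore be processed first by ModelsPipeline and then by DatabasePipeline.
Remember to return the item from item_completed in ModelsPipeline.

ITEM_PIPELINES = {
    "project_name.pipelines.ModelsPipeline": 300,
    "project_name.pipelines.DatabasePipeline": 302,
}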
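
As a rough illustration of that ordering rule, the sketch below sorts the setting by its values in ascending order. It only mimics the rule and is not how Scrapy loads pipelines internally; pipeline_order is a hypothetical helper written for this example.

```python
ITEM_PIPELINES = {
    "project_name.pipelines.DatabasePipeline": 302,
    "project_name.pipelines.ModelsPipeline": 300,
}


def pipeline_order(pipelines):
    """Return the pipeline paths sorted by ascending priority value."""
    return [path for path, _ in sorted(pipelines.items(), key=lambda kv: kv[1])]


order = pipeline_order(ITEM_PIPELINES)
for position, path in enumerate(order, start=1):
    print(position, path)
```

The order of the keys in the dictionary does not matter here; only the numbers decide which stage sees the item first.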


About the author

盛夏尉蓝
