No adapter found for objects of type: 'itemadapter.adapter.ItemAdapter'

Published 2025-02-13 19:01:33

I want to change the names of images downloaded from a webpage. I want to use standard names given by the website as opposed to cleaning the request url for it.

I have the following pipeline.py

from itemadapter import ItemAdapter
from scrapy.pipelines.images import ImagesPipeline

class ScrapyExercisesPipeline:
    def process_item(self, item, spider):
        adapter = ItemAdapter(item)
        return adapter

class DownfilesPipeline(ImagesPipeline):
    def file_path(self, request, response=None, info=None, item=None):
        adapter = ScrapyExercisesPipeline().process_item()[0]
        image_name: str = f'{adapter}.jpg'
        return image_name

This produces the following error:

raise TypeError(f"No adapter found for objects of type: {type(item)} ({item})")
TypeError: No adapter found for objects of type: <class 'itemadapter.adapter.ItemAdapter'> (<ItemAdapter for ScrapyExercisesItem(name='unknown267', images=['https://bl-web-assets.britishland.com/live/meadowhall/s3fs-public/styles/retailer_thumbnail/public/retailer/boots_1.jpg?qQ.NHRs04tdmGxoyZKerRHcrhCImB3JH&itok=PD5LxLmS&cb=1657061667-curtime&v=1657061667-curtime'])>)

scraper.py:

import scrapy
from scrapy_exercises.items import ScrapyExercisesItem

class TestSpider(scrapy.Spider):
    name = 'test'
    #allowed_domains = ['x']
    start_urls = ['https://www.meadowhall.co.uk/eatdrinkshop?page=1']

    def start_requests(self):
        for url in self.start_urls:
            yield scrapy.Request(
                url=url,
                callback=self.parse,
                cb_kwargs = {'pg':0}
            )
    def parse(self, response,pg):
        pg=0
        content_page = response.xpath("//div[@class='view-content']//div")
        for cnt in content_page:
            image_url = cnt.xpath(".//img//@src").get()
            image_name = cnt.xpath(".//img//@alt").get()
            if image_url is not None:
                pg+=1
                items = ScrapyExercisesItem()
                if image_name == '':
                    items['name'] = 'unknown'+f'{pg}'
                    items['images'] = [image_url]
                    yield items
                else:
                    items['name'] = image_name
                    items['images'] = [image_url]
                    yield items

settings.py

ITEM_PIPELINES = {
    #'scrapy.pipelines.images.ImagesPipeline': 1,
    'scrapy_exercises.pipelines.ScrapyExercisesPipeline': 45,
    'scrapy_exercises.pipelines.DownfilesPipeline': 55,
}
from pathlib import Path
import os
BASE_DIR = Path(__file__).resolve().parent.parent
IMAGES_STORE = os.path.join(BASE_DIR, 'images')
IMAGES_URLS_FIELD = 'images'
IMAGES_RESULT_FIELD = 'results'

1 Answer

Answered by 情释 on 2025-02-20 19:01:34


You are calling a pipeline from within your pipeline, while that same pipeline is also registered in your settings to run as a pipeline. It would be simpler to just extract the name field from the item in your DownfilesPipeline and return it.

Change your pipelines.py file to:

from scrapy.pipelines.images import ImagesPipeline

class DownfilesPipeline(ImagesPipeline):
    def file_path(self, request, response=None, info=None, item=None):
        # Name the file after the item's 'name' field scraped by the spider.
        return item['name'] + '.jpg'
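One caveat with returning `item['name']` directly: alt text scraped from a page can contain spaces or other characters that are awkward in file names. A small sanitizing helper (hypothetical, not part of the original answer) could be applied to the name first:

```python
import re

def safe_image_name(name: str) -> str:
    # Hypothetical helper: collapse any run of characters that are not
    # word characters, hyphens, or dots into a single underscore.
    return re.sub(r'[^\w\-.]+', '_', name) + '.jpg'

print(safe_image_name('Marks & Spencer'))  # -> Marks_Spencer.jpg
```

The pipeline's `file_path` would then return `safe_image_name(item['name'])` instead of the raw field.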

You also need to turn off the ScrapyExercisesPipeline in your settings: its process_item returns an ItemAdapter instead of the item itself, and passing that adapter on to the ImagesPipeline is what triggers the "No adapter found for objects of type" TypeError.
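With the ScrapyExercisesPipeline entry removed (or commented out), the ITEM_PIPELINES setting would look like this:

```python
# settings.py
ITEM_PIPELINES = {
    # 'scrapy_exercises.pipelines.ScrapyExercisesPipeline': 45,  # disabled
    'scrapy_exercises.pipelines.DownfilesPipeline': 55,
}
```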
