如何用scrapy解析多个页面

发布于 2024-12-28 01:25:47 字数 1099 浏览 1 评论 0原文

我不断收到错误:无效的语法

1.add_xpath('tagLine', '//p[@class="tagline"]/text()')

,我似乎无法弄清楚为什么它会给我这个错误,因为据我所知,它与所有其他 1.add_xpath() 方法的语法相同。我的另一个问题是如何请求其他页面。基本上我正在浏览一个大页面并让它浏览页面上的每个链接,然后一旦页面完成,我希望它转到下一个大页面的下一个(按钮),但我不知道如何做到这一点。

   def parse(self, response):
    hxs = HtmlXPathSelector(response)
    for url in hxs.select('//a[@class="title"]/@href').extract():
        yield Request(url, callback=self.description_page)
    for url_2 in hxs.select('//a[@class="POINTER"]/@href').extract():
        yield Request(url_2, callback=self.description_page)

def description_page(self, response):
    l = XPathItemLoader(item=TvspiderItem(), response=response)
    l.add_xpath('title', '//div[@class="m show_head"]/h1/text()')
    1.add_xpath('tagLine', '//p[@class="tagline"]/text()')
    1.add_xpath('description', '//div[@class="description"]/span')
    1.add_xpath('rating', '//div[@class="score"]/text()')
    1.add_xpath('imageSrc', '//div[@class="image_bg"]/img/@src')
    return l.load_item()

对此的任何帮助将不胜感激。对于 python 和 scrapy 来说,我还是一个菜鸟。

I keep getting an error: invaled syntax for

1.add_xpath('tagLine', '//p[@class="tagline"]/text()')

and I cannot seem to figure out why it is giving me that error, since as far as i can tell it is the same syntax as all of the other 1.add_xpath() methods. my other question is how do I request other pages. basically I am going through one big page and having it go through each link on the page, then once it is done with the page I want it to go to the next (button) for the next large page, but I don't know how to do that.

   def parse(self, response):
    hxs = HtmlXPathSelector(response)
    for url in hxs.select('//a[@class="title"]/@href').extract():
        yield Request(url, callback=self.description_page)
    for url_2 in hxs.select('//a[@class="POINTER"]/@href').extract():
        yield Request(url_2, callback=self.description_page)

def description_page(self, response):
    l = XPathItemLoader(item=TvspiderItem(), response=response)
    l.add_xpath('title', '//div[@class="m show_head"]/h1/text()')
    1.add_xpath('tagLine', '//p[@class="tagline"]/text()')
    1.add_xpath('description', '//div[@class="description"]/span')
    1.add_xpath('rating', '//div[@class="score"]/text()')
    1.add_xpath('imageSrc', '//div[@class="image_bg"]/img/@src')
    return l.load_item()

any help on this would be greatly appreciated. I am still a bit of a noob when it comes to python and scrapy.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

放低过去 2025-01-04 01:25:47
def description_page(self, response):
    l = XPathItemLoader(item=TvspiderItem(), response=response)
    l.add_xpath('title', '//div[@class="m show_head"]/h1/text()')
    1.add_xpath('tagLine', '//p[@class="tagline"]/text()')
    1.add_xpath('description', '//div[@class="description"]/span')
    1.add_xpath('rating', '//div[@class="score"]/text()')
    1.add_xpath('imageSrc', '//div[@class="image_bg"]/img/@src')
    return l.load_item()

您使用数字 1 而不是变量名称 l

def description_page(self, response):
    l = XPathItemLoader(item=TvspiderItem(), response=response)
    l.add_xpath('title', '//div[@class="m show_head"]/h1/text()')
    1.add_xpath('tagLine', '//p[@class="tagline"]/text()')
    1.add_xpath('description', '//div[@class="description"]/span')
    1.add_xpath('rating', '//div[@class="score"]/text()')
    1.add_xpath('imageSrc', '//div[@class="image_bg"]/img/@src')
    return l.load_item()

You have digit 1 instead of variable name l.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文