Python Scrapy experts, please take a look

Posted 2022-09-02 14:35:47 · 1951 characters · 25 views · 0 comments

I have a start URL, and the content it returns looks like this:

http://a.com/q=boy&alias=aps

["boy",["boys clothes","boys shoes","boys toys","boys socks","boyfriend gifts","boys shorts","boys underwear","boys sandals","boys","boys baseball pants"],[{"nodes":[{"name":"Boys' Clothing","alias":"fashion-boys-clothing"},{"name":"Amazon Fashion","alias":"fashion-brands"},{"name":"Baby","alias":"baby-products"},{"name":"Baby Boys' Clothing & Shoes","alias":"fashion-baby-boys"}]},{},{},{},{},{},{},{},{},{}],[]]

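The part marked in red is what I need to scrape.
It is also the parameter that has to be sent with the next lookup, and the result of that lookup looks like the response below.
What I want to do is extract every one of the red parts.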

["boy",["boys clothes","boys shoes","boys toys","boys socks","boyfriend gifts","boys shorts","boys underwear","boys sandals","boys","boys baseball pants"],[{"nodes":[{"name":"Boys' Clothing","alias":"fashion-boys-clothing"},{"name":"Amazon Fashion","alias":"fashion-brands"},{"name":"Baby","alias":"baby-products"},{"name":"Baby Boys' Clothing & Shoes","alias":"fashion-baby-boys"}]},{},{},{},{},{},{},{},{},{}],[]]
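The response above is plain JSON, so it can be unpacked with the standard library before any Scrapy code is involved. The following is a minimal sketch, assuming the values of interest are the suggestion list at index 1 and the "nodes" entries at index 2; `parse_suggestions` and `SAMPLE` are hypothetical names, and `SAMPLE` is a shortened copy of the response quoted in the post.

```python
import json

# Shortened copy of the response body quoted in the post.
SAMPLE = '''["boy",["boys clothes","boys shoes"],[{"nodes":[{"name":"Boys' Clothing","alias":"fashion-boys-clothing"},{"name":"Baby","alias":"baby-products"}]},{},{}],[]]'''


def parse_suggestions(body):
    """Hypothetical helper: split the response into keyword, suggestions, nodes."""
    data = json.loads(body)
    keyword = data[0]        # the query that was sent, e.g. "boy"
    suggestions = data[1]    # list of suggestion strings
    nodes = []
    for entry in data[2]:    # most entries are empty dicts
        nodes.extend(entry.get("nodes", []))
    return keyword, suggestions, nodes


keyword, suggestions, nodes = parse_suggestions(SAMPLE)
print(keyword, suggestions, [n["alias"] for n in nodes])
```

Inside a spider callback the same function could be applied to `response.text`.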

My approach is as follows:

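import json

import scrapy


class MYItem(scrapy.Item):
    Keyword = scrapy.Field()
    Nodes = scrapy.Field()


# A plain scrapy.Spider is used here: no crawl rules are defined,
# so CrawlSpider is not needed.
class Spider(scrapy.Spider):
    name = 'mySpider'
    allowed_domains = ['a.com']
    start_urls = ['http://a.com/q=boy&alias=aps']

    def parse(self, response):
        data = json.loads(response.text)
        # suggestvalueArr is the array of suggestion strings at index 1, e.g.
        # ["boys clothes", "boys shoes", "boys toys", "boys socks", "boyfriend gifts"]
        suggestvalueArr = data[1]
        # Assumption: nodes is the "nodes" list of the first entry at index 2
        nodes = data[2][0].get('nodes', []) if data[2] else []

        # Emit one item per suggestion
        for sel in suggestvalueArr:
            item = MYItem()
            item['Keyword'] = sel
            item['Nodes'] = nodes
            yield item

        # Follow up with one request per suggestion
        for sel in suggestvalueArr:
            tmpurl = "http://a.com&q=%s&search-alias=aps" % sel
            yield scrapy.Request(tmpurl, callback=self.parse)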

Why does it look as though my crawl ends before everything has been scraped? Can anyone spot the problem? Thanks.

Comments (1)

落在眉间の轻吻 2022-09-09 14:35:48

tmpurl = "http://a.com&q=%s&search-alias=aps"%sel
Your sel values contain spaces. Try URL-encoding the URL before passing it to Request.
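The suggestion above can be sketched with `urllib.parse.quote` from the standard library. This is only an illustration: `build_suggest_url` is a hypothetical helper, and the URL template is copied verbatim from the post (a.com is the post's placeholder host), so the real endpoint may be shaped differently.

```python
from urllib.parse import quote

# URL template copied verbatim from the post; a.com is a placeholder host.
URL_TEMPLATE = "http://a.com&q=%s&search-alias=aps"


def build_suggest_url(keyword, template=URL_TEMPLATE):
    """Hypothetical helper: percent-encode the keyword, then fill the template."""
    # safe="" also encodes "/" so the keyword cannot alter the URL structure
    return template % quote(keyword, safe="")


print(build_suggest_url("boys baseball pants"))
# prints http://a.com&q=boys%20baseball%20pants&search-alias=aps
```

In the spider, the follow-up request would then be built as `scrapy.Request(build_suggest_url(sel), callback=self.parse)`.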
