Scrapy crawl of Taobao product detail pages: URLs are randomly forced through a 302 redirect to h5.taobao.
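I am using scrapy + redis to fetch product details from a batch of Taobao detail-page URLs.
The user-agent is set, the cookie is passed in, and a proxy IP is configured.
When fetching a URL, response.status is sometimes 200 and sometimes 302, seemingly at random.
Out of 1000 URLs, product information is retrieved successfully for a little over 400.
Is the cookie not being passed correctly, is the proxy IP unstable, or is there some other cause? Any help analyzing this would be appreciated, thanks!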
Error traceback:
2017-07-14 15:51:12 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://item.taobao.com/item.htm?id=10245430841&ns=1&abbucket=0#detail> (referer: None)
2017-07-14 15:51:12 [requests.packages.urllib3.connectionpool] INFO: Starting new HTTPS connection (1): rate.taobao.com
2017-07-14 15:51:12 [requests.packages.urllib3.connectionpool] DEBUG: "GET /detailCommon.htm?auctionNumId=10245430841 HTTP/1.1" 200 None
2017-07-14 15:51:12 [scrapy.core.scraper] DEBUG: Scraped from <200 https://item.taobao.com/item.htm?id=10245430841&ns=1&abbucket=0>
None
2017-07-14 15:51:12 [taobao] DEBUG: Read 1 requests from 'taobao:start_urls'
2017-07-14 15:51:12 [scrapy.downloadermiddlewares.cookies] DEBUG: Sending cookies to: <GET https://item.taobao.com/item.htm?id=10245681616&ns=1&abbucket=0#detail>
2017-07-14 15:51:12 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET http://h5.m.taobao.com/awp/core/detail.htm?id=10245681616&ns=1&abbucket=0> from <GET https://item.taobao.com/item.htm?id=10245681616&ns=1&abbucket=0#detail>
2017-07-14 15:51:12 [scrapy.downloadermiddlewares.cookies] DEBUG: Sending cookies to: <GET http://h5.m.taobao.com/awp/core/detail.htm?id=10245681616&ns=1&abbucket=0>
2017-07-14 15:51:12 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://h5.m.taobao.com/awp/core/detail.htm?id=10245681616&ns=1&abbucket=0> (referer: None) ['partial']
2017-07-14 15:51:12 [scrapy.core.scraper] ERROR: Spider error processing <GET http://h5.m.taobao.com/awp/core/detail.htm?id=10245681616&ns=1&abbucket=0> (referer: None)
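In the log above, the failing request is the one that was redirected to h5.m.taobao.com, so one way to narrow things down is to flag responses that ended up on the mobile host instead of letting the desktop parsing code fail on them. A minimal sketch, assuming the mobile host is exactly the one seen in the log; the names `MOBILE_HOSTS` and `landed_on_mobile_page` are made up for illustration:

```python
from urllib.parse import urlsplit

# Assumption: the only mobile host we care about is the one from the log.
MOBILE_HOSTS = {"h5.m.taobao.com"}


def landed_on_mobile_page(url):
    """Return True if the final URL of a response points at the mobile site."""
    host = urlsplit(url).hostname or ""
    return host in MOBILE_HOSTS


if __name__ == "__main__":
    redirected = "http://h5.m.taobao.com/awp/core/detail.htm?id=10245681616&ns=1&abbucket=0"
    desktop = "https://item.taobao.com/item.htm?id=10245430841&ns=1&abbucket=0#detail"
    print(landed_on_mobile_page(redirected))
    print(landed_on_mobile_page(desktop))
```

In a spider callback this could be called with `response.url`. Scrapy's redirect middleware also records the original URLs under `response.meta["redirect_urls"]`, which helps when counting how many of the 1000 URLs were redirected.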
Comments (2)
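Found the cause: the user-agent list I was loading contained mobile UAs. After removing them, the problem went away.
I put together an updated 2017 ua_list (desktop only) for everyone: https://github.com/lovebaicai...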
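For anyone hitting the same thing, the fix described above can be sketched roughly as follows: filter the mobile entries out of the UA list, then pick a random desktop UA per request. This is only an illustration, not the original spider's code. The regex, the sample UA strings, and the class name `RandomDesktopUserAgentMiddleware` are assumptions, and the keyword match is a heuristic that may miss some mobile UAs.

```python
import random
import re

# Heuristic (assumption): these tokens commonly appear in mobile User-Agents.
MOBILE_UA_PATTERN = re.compile(
    r"Mobile|Android|iPhone|iPad|iPod|Windows Phone", re.IGNORECASE
)

# Sample strings for illustration only; replace with your own list.
SAMPLE_USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36",
    "Mozilla/5.0 (iPhone; CPU iPhone OS 10_3 like Mac OS X) AppleWebKit/603.1.30 "
    "(KHTML, like Gecko) Version/10.0 Mobile/14E277 Safari/602.1",
]


def desktop_only(user_agents):
    """Drop every User-Agent string that looks like a mobile browser."""
    return [ua for ua in user_agents if not MOBILE_UA_PATTERN.search(ua)]


class RandomDesktopUserAgentMiddleware:
    """Downloader-middleware-style class that sets a random desktop UA."""

    def __init__(self, user_agents):
        self.user_agents = desktop_only(user_agents)
        if not self.user_agents:
            raise ValueError("no desktop User-Agent strings left after filtering")

    def process_request(self, request, spider):
        # Overwrite the header on each outgoing request.
        request.headers["User-Agent"] = random.choice(self.user_agents)
        return None  # let the request continue through the download chain
```

To enable this in a real project, the class would still need a `from_crawler` classmethod (or a no-argument constructor) and an entry in `DOWNLOADER_MIDDLEWARES`; both are left out here to keep the sketch short.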
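Could you send me a copy of your Taobao Scrapy spider? I am crawling Taobao too and running into a lot of problems. I would appreciate it if you could share the code so I can learn from it: 317729332@qq.com. Thanks!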