pyspider raises "list index out of range"
track.fetch 95.33ms
{
"content": "\n<!DOCTYPE html>\r\n<html lang=\"zh-CN\">\r\n<head>\r\n <meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\" />\r\n <meta http-equiv=\"mobile-agent\" content=\"format=xhtml; url=http://m.58.com/bj/shangpu/31864319246513x.shtml\">\r\n <meta http-equiv=\"mobile-agent\" content=\"format=html5; url=http://m.58.com/bj/shangpu/31864319246513x.shtml\">\r\n <meta http-equiv=\"mobile-agent\" content=\"format=wml; url=http://m.58.com/bj/shangpu/31864319246513x.shtml\">\r\n <title>【6图】出租西城广安门外临街门面-西城广安门外商",
"encoding": "UTF-8",
"error": null,
"headers": {
"Connection": "keep-alive",
"Content-Encoding": "gzip",
"Content-Type": "text/html; charset=UTF-8",
"Date": "Thu, 26 Oct 2017 09:04:55 GMT",
"P3p": "policyref=\"/w3c/p3p.xml\", CP=\"CUR ADM OUR NOR STA NID\"",
"Server": "Tengine",
"Set-Cookie": "f=n,id58=c5/ns1nxpTcCoRPtBDaUAg==; expires=Sat, 26-Oct-19 09:04:55 GMT; domain=58.com; path=/",
"Transfer-Encoding": "chunked",
"Vary": "Accept-Encoding",
"X-Http-Reason": "OK",
"X-Powered-By": "PHP/5.6.20"
},
"ok": true,
"redirect_url": null,
"status_code": 200,
"time": 0.09533166885375977
}
#==========================================
list index out of range
[E 171026 17:04:55 base_handler:195] list index out of range
Traceback (most recent call last):
File "/data/pyspider/project/pyspider-lepu/pyspider/libs/base_handler.py", line 188, in run_task
result = self._run_task(task, response)
File "/data/pyspider/project/pyspider-lepu/pyspider/libs/base_handler.py", line 168, in _run_task
return self._run_func(function, response, task)
File "/data/pyspider/project/pyspider-lepu/pyspider/libs/base_handler.py", line 150, in _run_func
return function(*arguments[:len(args) - 1])
File "<wuba_chuzu_chushou>", line 111, in detail_page
IndexError: list index out of range
{
"exception": "list index out of range",
"follows": 0,
"logs": "[E 171026 17:04:55 base_handler:195] list index out of range\n Traceback (most recent call last):\n File \"/data/pyspider/project/pyspider-lepu/pyspider/libs/base_handler.py\", line 188, in run_task\n result = self._run_task(task, response)\n File \"/data/pyspider/project/pyspider-lepu/pyspider/libs/base_handler.py\", line 168, in _run_task\n return self._run_func(function, response, task)\n File \"/data/pyspider/project/pyspider-lepu/pyspider/libs/base_handler.py\", line 150, in _run_func\n return function(*arguments[:len(args) - 1])\n File \"<wuba_chuzu_chushou>\", line 111, in detail_page\n IndexError: list index out of range\n",
"ok": false,
"result": null,
"time": 0.013800621032714844
}
fetch
{
"headers": {
"Referer": "http://bj.58.com/shangpucz/0/pn2/",
"User-Agent": "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.134 Safari/537.36"
},
"save": {
"shop_type": "商铺出租"
}
}
schedule
{
"exetime": 1509095095.64874,
"itag": "2015-07-30 17:17",
"retried": 5,
"retries": 5
}
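After we put the crawler behind a proxy, our company's spider suddenly started throwing this error, and none of the detail pages can be scraped any more.
The configuration is as follows: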
class Handler(BaseHandler):
    default_header_config = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.134 Safari/537.36'
    }
    crawl_config = {
        "proxy": "123.193.33.233:3128",
        "itag": '2015-07-30 17:17',
        "retries": 5
    }
process
{
"callback": "detail_page"
}
Any help from the experts here would be greatly appreciated.
Comments (1)
File "<wuba_chuzu_chushou>", line 111, in detail_page
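You didn't post the code for this part? The traceback points at line 111 of detail_page, so that is the line that needs to be looked at.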
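Since line 111 of detail_page is not shown, the following is only a guess at the pattern. An IndexError there usually means a selector result was indexed with [0] while the list was empty, which can happen if the proxy returns a different page than expected (for example a verification page) even though the status code is 200. A minimal sketch of a defensive version, where `first_or_default`, `build_result`, and the `titles`/`prices` lists are all hypothetical names used for illustration and not taken from the original script:

```python
def first_or_default(items, default=None):
    """Return the first element of items, or default if items is empty."""
    return items[0] if items else default


def build_result(titles, prices):
    """Hypothetical stand-in for the body of detail_page.

    titles and prices represent lists returned by selectors; either may be
    empty when the fetched page is not the expected detail page.
    """
    title = first_or_default(titles)
    if title is None:
        # Fail with a descriptive message instead of a bare IndexError.
        raise ValueError("expected element not found; page layout may differ")
    # Optional fields fall back to an empty string rather than crashing.
    return {"title": title, "price": first_or_default(prices, default="")}


if __name__ == "__main__":
    print(build_result(["shop for rent"], []))
```

Inside the real detail_page, logging the fetched body (for instance `response.text`) whenever the expected element is missing would show what the proxy actually returned.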