pyspider raises "list index out of range"

Posted 2022-09-06 02:42:37

track.fetch  95.33ms
{
  "content": "\n<!DOCTYPE html>\r\n<html lang=\"zh-CN\">\r\n<head>\r\n    <meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\" />\r\n    <meta http-equiv=\"mobile-agent\" content=\"format=xhtml; url=http://m.58.com/bj/shangpu/31864319246513x.shtml\">\r\n    <meta http-equiv=\"mobile-agent\" content=\"format=html5; url=http://m.58.com/bj/shangpu/31864319246513x.shtml\">\r\n    <meta http-equiv=\"mobile-agent\" content=\"format=wml; url=http://m.58.com/bj/shangpu/31864319246513x.shtml\">\r\n    <title>【6图】出租西城广安门外临街门面-西城广安门外商",
  "encoding": "UTF-8",
  "error": null,
  "headers": {
    "Connection": "keep-alive",
    "Content-Encoding": "gzip",
    "Content-Type": "text/html; charset=UTF-8",
    "Date": "Thu, 26 Oct 2017 09:04:55 GMT",
    "P3p": "policyref=\"/w3c/p3p.xml\", CP=\"CUR ADM OUR NOR STA NID\"",
    "Server": "Tengine",
    "Set-Cookie": "f=n,id58=c5/ns1nxpTcCoRPtBDaUAg==; expires=Sat, 26-Oct-19 09:04:55 GMT; domain=58.com; path=/",
    "Transfer-Encoding": "chunked",
    "Vary": "Accept-Encoding",
    "X-Http-Reason": "OK",
    "X-Powered-By": "PHP/5.6.20"
  },
  "ok": true,
  "redirect_url": null,
  "status_code": 200,
  "time": 0.09533166885375977
}

#==========================================

list index out of range
[E 171026 17:04:55 base_handler:195] list index out of range
    Traceback (most recent call last):
      File "/data/pyspider/project/pyspider-lepu/pyspider/libs/base_handler.py", line 188, in run_task
        result = self._run_task(task, response)
      File "/data/pyspider/project/pyspider-lepu/pyspider/libs/base_handler.py", line 168, in _run_task
        return self._run_func(function, response, task)
      File "/data/pyspider/project/pyspider-lepu/pyspider/libs/base_handler.py", line 150, in _run_func
        return function(*arguments[:len(args) - 1])
      File "<wuba_chuzu_chushou>", line 111, in detail_page
    IndexError: list index out of range

{
  "exception": "list index out of range",
  "follows": 0,
  "logs": "[E 171026 17:04:55 base_handler:195] list index out of range\n    Traceback (most recent call last):\n      File \"/data/pyspider/project/pyspider-lepu/pyspider/libs/base_handler.py\", line 188, in run_task\n        result = self._run_task(task, response)\n      File \"/data/pyspider/project/pyspider-lepu/pyspider/libs/base_handler.py\", line 168, in _run_task\n        return self._run_func(function, response, task)\n      File \"/data/pyspider/project/pyspider-lepu/pyspider/libs/base_handler.py\", line 150, in _run_func\n        return function(*arguments[:len(args) - 1])\n      File \"<wuba_chuzu_chushou>\", line 111, in detail_page\n    IndexError: list index out of range\n",
  "ok": false,
  "result": null,
  "time": 0.013800621032714844
}

fetch
{
  "headers": {
    "Referer": "http://bj.58.com/shangpucz/0/pn2/",
    "User-Agent": "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.134 Safari/537.36"
  },
  "save": {
    "shop_type": "商铺出租"
  }
}
schedule
{
  "exetime": 1509095095.64874,
  "itag": "2015-07-30 17:17",
  "retried": 5,
  "retries": 5
}

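After we put the crawler behind a proxy, our company's crawler suddenly started throwing this error, and none of the detail pages can be scraped any more.
The configuration is as follows: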

class Handler(BaseHandler):
    default_header_config = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.134 Safari/537.36'
    }

    crawl_config = {
        "proxy":"123.193.33.233:3128",
        "itag": '2015-07-30 17:17',
        "retries": 5
    }
    
process
{
  "callback": "detail_page"
}

Any help would be greatly appreciated.
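
Since the error only appeared after the proxy was added, one way to narrow it down is to check whether the page fetched through the proxy still contains the markup the parser relies on. The following is only a sketch: `missing_markers` is a hypothetical helper, and the marker strings are made up for illustration, because the selectors used in `detail_page` are not shown in the post. In a pyspider callback, the HTML string would presumably come from `response.text`.

```python
def missing_markers(html, markers):
    """Return the markers (plain substrings) that do not occur in the HTML."""
    return [marker for marker in markers if marker not in html]


# Hypothetical sample page and markers; replace them with substrings that the
# real detail page is known to contain when fetched without the proxy.
sample_html = "<html><head><title>listing</title></head><body></body></html>"
expected = ["<title>", 'class="house-title"']

absent = missing_markers(sample_html, expected)
if absent:
    # A non-empty list suggests the proxied response differs from the page
    # the parser was written against.
    print("markers missing from response:", absent)
```

If markers are missing only when the proxy is enabled, the proxy is likely returning a different page (for example a verification or mobile page), which would explain an empty selector result at line 111.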


Comments (1)

情话难免假 2022-09-13 02:42:37

File "<wuba_chuzu_chushou>", line 111, in detail_page

Why haven't you posted the code at this line?
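
The code at line 111 of `detail_page` is not shown, so what follows is only a sketch under an assumption: that the line indexes a list of matched elements (something like `matches[0]`) and that the list is empty for the pages fetched through the proxy. The helpers `first_or_default` and `extract_field`, and the field names, are hypothetical and not part of pyspider.

```python
def first_or_default(items, default=None):
    """Return items[0], or `default` when the sequence is empty or None."""
    return items[0] if items else default


def extract_field(matches, field_name, missing):
    """Take the first match for a field; record the field name if nothing matched."""
    value = first_or_default(matches)
    if value is None:
        missing.append(field_name)
    return value


# Hypothetical selector results: 'price' matched nothing in the fetched page.
missing = []
record = {
    "title": extract_field(["Street-front shop for rent"], "title", missing),
    "price": extract_field([], "price", missing),
}
print(record, missing)
```

Guarding the index this way does not fix the underlying cause, but it turns the `IndexError` into a list of fields that failed to match, which shows which selector stopped working once the proxy was added.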
