PhantomJS memory leak problem

Posted on 2022-09-04 20:26:43

Hi everyone,

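I'm a beginner trying to crawl a site with PhantomJS + Scrapy, but as the number of crawled pages grows, PhantomJS's memory usage keeps climbing until memory runs out, and searching around turned up nothing. My simple idea for now is to quit PhantomJS after each page is crawled, but just putting the call in like this doesn't seem to work: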

self.browser.get(response.url)
sel = self.browser.find_element_by_xpath("//pre").text
self.browser.quit()  # shut PhantomJS down after each page

It fails right away with the error below. Any help would be much appreciated:

Traceback (most recent call last):
  File "/usr/local/pythonbrew/venvs/Python-2.7.10/flask/lib/python2.7/site-packages/scrapy/utils/defer.py", line 102, in iter_errback
    yield next(it)
  File "/usr/local/pythonbrew/venvs/Python-2.7.10/flask/lib/python2.7/site-packages/scrapy/spidermiddlewares/offsite.py", line 29, in process_spider_output
    for x in result:
  File "/usr/local/pythonbrew/venvs/Python-2.7.10/flask/lib/python2.7/site-packages/scrapy/spidermiddlewares/referer.py", line 22, in <genexpr>
    return (_set_referer(r) for r in result or ())
  File "/usr/local/pythonbrew/venvs/Python-2.7.10/flask/lib/python2.7/site-packages/scrapy/spidermiddlewares/urllength.py", line 37, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "/usr/local/pythonbrew/venvs/Python-2.7.10/flask/lib/python2.7/site-packages/scrapy/spidermiddlewares/depth.py", line 58, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "/app/Project/scrapy/new_stock/new_stock/spiders/newstock.py", line 86, in parse_items
    self.browser.get(response.url)
  File "/usr/local/pythonbrew/venvs/Python-2.7.10/flask/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 250, in get
    self.execute(Command.GET, {'url': url})
  File "/usr/local/pythonbrew/venvs/Python-2.7.10/flask/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 236, in execute
    response = self.command_executor.execute(driver_command, params)
  File "/usr/local/pythonbrew/venvs/Python-2.7.10/flask/lib/python2.7/site-packages/selenium/webdriver/remote/remote_connection.py", line 415, in execute
    return self._request(command_info[0], url, body=data)
  File "/usr/local/pythonbrew/venvs/Python-2.7.10/flask/lib/python2.7/site-packages/selenium/webdriver/remote/remote_connection.py", line 489, in _request
    resp = opener.open(request, timeout=self._timeout)
  File "/usr/local/pythonbrew/pythons/Python-2.7.10/lib/python2.7/urllib2.py", line 431, in open
    response = self._open(req, data)
  File "/usr/local/pythonbrew/pythons/Python-2.7.10/lib/python2.7/urllib2.py", line 449, in _open
    '_open', req)
  File "/usr/local/pythonbrew/pythons/Python-2.7.10/lib/python2.7/urllib2.py", line 409, in _call_chain
    result = func(*args)
  File "/usr/local/pythonbrew/pythons/Python-2.7.10/lib/python2.7/urllib2.py", line 1227, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "/usr/local/pythonbrew/pythons/Python-2.7.10/lib/python2.7/urllib2.py", line 1197, in do_open
    raise URLError(err)
URLError: <urlopen error [Errno 111] Connection refused>
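The traceback ends inside `self.browser.get(response.url)` with `[Errno 111] Connection refused`, raised from Selenium's `remote_connection`, i.e. the HTTP channel Selenium uses to talk to PhantomJS. That is consistent with `self.browser` being a driver whose PhantomJS process was already shut down by `quit()` in an earlier callback and is then reused for the next response. A minimal sketch of "one PhantomJS per page", assuming you can start a new driver inside each callback: create it, use it, and quit it in the same place. `fresh_browser` is a hypothetical helper (not part of Scrapy or Selenium), and `factory` is assumed to be any zero-argument callable returning an object with `quit()`, such as the old `webdriver.PhantomJS`.

```python
from contextlib import contextmanager


@contextmanager
def fresh_browser(factory):
    """Start a brand-new browser with factory(), yield it, and always quit it.

    Hypothetical helper: `factory` is any zero-argument callable returning a
    WebDriver-like object with a quit() method (e.g. webdriver.PhantomJS in
    the old Selenium versions that still shipped it).
    """
    browser = factory()      # a new PhantomJS process used for this page only
    try:
        yield browser
    finally:
        browser.quit()       # runs even if get()/find_element raised


# Hypothetical usage inside a Scrapy callback (not executed here):
#
# def parse_items(self, response):
#     with fresh_browser(webdriver.PhantomJS) as browser:
#         browser.get(response.url)
#         # newer Selenium: browser.find_element(By.XPATH, "//pre").text
#         text = browser.find_element_by_xpath("//pre").text
#     ...
```

Because no driver object outlives a single callback, nothing can call `get()` on a process that has already quit. The cost is starting a fresh PhantomJS for every page, which is slow but returns its memory each time.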


Comments (1)

我最亲爱的 2022-09-11 20:26:43
try:
    self.browser.get(response.url)
    sel = self.browser.find_element_by_xpath("//pre").text
finally:
    self.browser.quit()

Make sure the browser is quit even when an exception is raised.
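The `finally` block guarantees that `quit()` runs, but if `self.browser` is created only once (for example in the spider's `__init__`), the next callback still calls `get()` on a driver whose process is gone and can hit the same `Connection refused`. Below is a sketch, under that assumption, that keeps the quit-on-error idea while recreating the driver on demand, and only restarts PhantomJS every `max_pages` pages to limit start-up cost. `RecyclingBrowser`, `acquire`, `close`, the `self.browsers` attribute and the default of 50 pages are all hypothetical choices for illustration, not part of the original post or of any library.

```python
class RecyclingBrowser:
    """Hand out a driver, replacing it with a fresh one every `max_pages` pages.

    Hypothetical helper: `factory` is a zero-argument callable returning an
    object with quit(), e.g. webdriver.PhantomJS. max_pages=1 reproduces
    "quit after every page"; larger values trade some memory growth for
    fewer process start-ups.
    """

    def __init__(self, factory, max_pages=50):
        self.factory = factory
        self.max_pages = max_pages
        self.pages = 0
        self.browser = None

    def acquire(self):
        """Return a live driver, restarting it once the page budget is used up."""
        if self.browser is not None and self.pages >= self.max_pages:
            self.close()
        if self.browser is None:
            self.browser = self.factory()
        self.pages += 1
        return self.browser

    def close(self):
        """Quit the current driver (if any) so the next acquire() starts a new one."""
        if self.browser is not None:
            try:
                self.browser.quit()
            finally:
                # drop the reference even if quit() itself raised
                self.browser = None
                self.pages = 0


# Hypothetical usage in the spider (not executed here):
#
# def parse_items(self, response):
#     browser = self.browsers.acquire()
#     try:
#         browser.get(response.url)
#         text = browser.find_element_by_xpath("//pre").text
#     except Exception:
#         self.browsers.close()   # do not reuse a driver that just failed
#         raise
#
# def closed(self, reason):       # Scrapy calls this when the spider finishes
#     self.browsers.close()
```

Quitting in `closed()` makes sure the last PhantomJS process does not outlive the crawl.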
