Repeated host lookups failing in urllib2


I have code that issues many HTTP GET requests using Python's urllib2, in several threads, writing the responses into files (one per thread).
During execution, it looks like many of the host lookups fail, causing a "name or service not known" error (see the appended error log for an example).

Is this due to a flaky DNS service? Is it bad practice to rely on DNS caching if the host name isn't changing? That is, should the result of a single lookup be passed into urlopen?
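
For illustration, here is a minimal sketch of that "single lookup" idea, assuming plain HTTP: resolve the host once with socket.gethostbyname, cache the address, and send the original name in a Host header so name-based virtual hosting still works. The helper open_with_cached_ip and its module-level cache are hypothetical, not part of the original code.

import socket
import urllib2
import urlparse

_ip_cache = {}  # hypothetical process-wide host -> IP cache

def open_with_cached_ip(url):
    parts = urlparse.urlsplit(url)
    host = parts.hostname
    if host not in _ip_cache:
        _ip_cache[host] = socket.gethostbyname(host)  # the one DNS lookup
    netloc = _ip_cache[host]
    if parts.port:
        netloc = "%s:%d" % (netloc, parts.port)
    # Rebuild the URL around the cached IP, keeping the real name in the
    # Host header so the server still routes the request correctly.
    ip_url = urlparse.urlunsplit((parts.scheme, netloc, parts.path,
                                  parts.query, parts.fragment))
    return urllib2.urlopen(urllib2.Request(ip_url, headers={"Host": host}))

This only sidesteps repeated lookups; it would not suit HTTPS, where certificate checks need the host name in the URL.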

Exception in thread Thread-16:
Traceback (most recent call last):
  File "/usr/lib/python2.6/threading.py", line 532, in __bootstrap_inner
    self.run()
  File "/home/da/local/bin/ThreadedDownloader.py", line 61, in run
     page = urllib2.urlopen(url) # get the page
  File "/usr/lib/python2.6/urllib2.py", line 126, in urlopen
    return _opener.open(url, data, timeout)
  File "/usr/lib/python2.6/urllib2.py", line 391, in open
    response = self._open(req, data)
  File "/usr/lib/python2.6/urllib2.py", line 409, in _open
    '_open', req)
  File "/usr/lib/python2.6/urllib2.py", line 369, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.6/urllib2.py", line 1170, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "/usr/lib/python2.6/urllib2.py", line 1145, in do_open
    raise URLError(err)
URLError: <urlopen error [Errno -2] Name or service not known>

UPDATE: my (extremely simple) code:

import os
import threading
import urllib2

class AsyncGet(threading.Thread):

    def __init__(self, outDir, baseUrl, item, method, numPages, numRows, semaphore):
        threading.Thread.__init__(self)
        self.outDir = outDir
        self.baseUrl = baseUrl
        self.method = method
        self.numPages = numPages
        self.numRows = numRows
        self.item = item
        self.semaphore = semaphore

    def run(self):
        with self.semaphore:  # 'with' is awesome.
            with open(os.path.join(self.outDir, self.item + ".xml"), 'a') as f:
                for i in xrange(1, self.numPages + 1):
                    url = (self.baseUrl +
                           "method=" + self.method +
                           "&item=" + self.item +
                           "&page=" + str(i) +
                           "&rows=" + str(self.numRows) +
                           "&prettyXML")
                    page = urllib2.urlopen(url)
                    f.write(page.read())
                    page.close()  # Must remember to close!

The semaphore is a BoundedSemaphore used to constrain the total number of concurrently running threads.
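
As a usage sketch (the output directory, base URL, item names, and query parameters below are made up for illustration), the threads might be launched like this:

import threading

semaphore = threading.BoundedSemaphore(8)  # at most 8 threads download at once

threads = []
for item in ["foo", "bar", "baz"]:  # hypothetical item names
    t = AsyncGet("out", "http://example.com/api?", item,
                 "getRecords", 5, 100, semaphore)  # hypothetical arguments
    t.start()
    threads.append(t)
for t in threads:
    t.join()  # wait for every download to finish

Note that every thread starts immediately; the BoundedSemaphore only limits how many are inside run()'s with block at a time.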


ゃ懵逼小萝莉 2024-10-16 06:59:57

This is not a Python problem; on Linux systems, make sure nscd (the Name Service Cache Daemon) is actually running.

UPDATE:
Looking at your code, you are never calling page.close(), hence leaking sockets.
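
On the socket-leak point, one defensive pattern (a sketch, not from the original post; the helper name is hypothetical) is to guarantee the close even when read() raises, using contextlib.closing:

import urllib2
from contextlib import closing

def fetch_to_file(url, f):
    # closing() calls page.close() on exit, even if read() or write()
    # raises, so a failed request cannot leak its socket.
    with closing(urllib2.urlopen(url)) as page:
        f.write(page.read())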
