How to handle urllib's timeout in Python 3?
First off, my problem is quite similar to this one. I would like a timeout of urllib.urlopen() to generate an exception that I can handle.
Doesn't this fall under URLError?
try:
    response = urllib.request.urlopen(url, timeout=10).read().decode('utf-8')
except (HTTPError, URLError) as error:
    logging.error(
        'Data of %s not retrieved because %s\nURL: %s', name, error, url)
else:
    logging.info('Access successful.')
The error message:
    resp = urllib.request.urlopen(req, timeout=10).read().decode('utf-8')
  File "/usr/lib/python3.2/urllib/request.py", line 138, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python3.2/urllib/request.py", line 369, in open
    response = self._open(req, data)
  File "/usr/lib/python3.2/urllib/request.py", line 387, in _open
    '_open', req)
  File "/usr/lib/python3.2/urllib/request.py", line 347, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.2/urllib/request.py", line 1156, in http_open
    return self.do_open(http.client.HTTPConnection, req)
  File "/usr/lib/python3.2/urllib/request.py", line 1141, in do_open
    r = h.getresponse()
  File "/usr/lib/python3.2/http/client.py", line 1046, in getresponse
    response.begin()
  File "/usr/lib/python3.2/http/client.py", line 346, in begin
    version, status, reason = self._read_status()
  File "/usr/lib/python3.2/http/client.py", line 308, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "/usr/lib/python3.2/socket.py", line 276, in readinto
    return self._sock.recv_into(b)
socket.timeout: timed out
There was a major change in Python 3 when they re-organised the urllib and urllib2 modules into urllib. Is it possible that there was a change then that causes this?
Comments (3)
Catch the different exceptions with explicit clauses, and check the reason for the exception with URLError (thank you Régis B. and Daniel Andrzejewski).
NB: For recent comments, the original post referenced Python 3.2, where you needed to catch timeout errors explicitly with socket.timeout.
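The code that accompanied this answer did not survive on this page, so what follows is only a minimal sketch of the approach it describes, not the answerer's original code. It reuses url and name from the question; the fetch() wrapper and its opener parameter are hypothetical, added so the handler can be exercised without a network connection.

```python
import logging
import socket
import urllib.request
from urllib.error import HTTPError, URLError


def fetch(url, name, timeout=10, opener=urllib.request.urlopen):
    """Return the decoded body, or None if the request failed.

    `opener` is a hypothetical hook (not part of urllib) for testing offline.
    """
    try:
        response = opener(url, timeout=timeout).read().decode('utf-8')
    except HTTPError as error:
        # HTTPError subclasses URLError, so it has to be caught first.
        logging.error('HTTP error: data of %s not retrieved because %s\nURL: %s',
                      name, error, url)
    except URLError as error:
        # A timeout wrapped by urllib shows up in the `reason` attribute.
        if isinstance(error.reason, socket.timeout):
            logging.error('Timeout: data of %s not retrieved because %s\nURL: %s',
                          name, error, url)
        else:
            logging.error('URL error: data of %s not retrieved because %s\nURL: %s',
                          name, error, url)
    except socket.timeout as error:
        # The case in the question's traceback: raised bare, not as URLError.
        logging.error('Socket timed out: data of %s not retrieved because %s\nURL: %s',
                      name, error, url)
    else:
        logging.info('Access successful.')
        return response
    return None
```

The order of the first two clauses matters; the socket.timeout clause can sit anywhere because it is not related to URLError by inheritance.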
The previous answer does not correctly intercept timeout errors. Timeout errors are raised as URLError, so to catch them specifically we need to check whether the reason attribute of the URLError is a socket.timeout.
Note that ValueError can be raised independently, e.g. if the URL is invalid. Like HTTPError, it is not associated with a timeout.
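The snippet from this answer is missing here as well. As a rough sketch of the distinction it draws, the helper below labels an exception from urlopen() by type. The names classify() and probe() and the label strings are hypothetical, chosen only for illustration. Note that a bare socket.timeout, like the one in the question's traceback, is not covered by this check and would propagate.

```python
import socket
import urllib.request
from urllib.error import HTTPError, URLError


def classify(error):
    """Return a short label for an exception raised by urlopen()."""
    if isinstance(error, HTTPError):
        # Checked before URLError because HTTPError is a subclass of it.
        return 'http error'
    if isinstance(error, URLError):
        if isinstance(error.reason, socket.timeout):
            return 'timeout'
        return 'url error'
    if isinstance(error, ValueError):
        # Raised before any network activity, e.g. for a malformed URL.
        return 'invalid url'
    return 'other'


def probe(url, timeout=10):
    """Open url and report 'ok' or a label describing what went wrong."""
    try:
        urllib.request.urlopen(url, timeout=timeout).close()
    except (URLError, ValueError) as error:
        return classify(error)
    return 'ok'
```

Calling probe('not a url') never touches the network: urllib rejects the string with a ValueError while building the request.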
What is a "timeout"? Holistically, I think it means "a situation where the server didn't respond in time, typically because of high load, and it's worth retrying."
HTTP status 504 "Gateway Timeout" would be a timeout under this definition. It's delivered via HTTPError.
HTTP status 429 "Too Many Requests" would also be a timeout under that definition. It too is delivered via HTTPError.
Otherwise, what do we mean by a timeout? Do we include timeouts in resolving the domain name via the DNS resolver? Timeouts when trying to send data? Timeouts when waiting for the data to come back?
I don't know how to audit the source code of urllib to be sure that every situation I might consider a timeout is raised in a way that I'd catch. In a language without checked exceptions, I don't know how. I have a hunch that connect-to-DNS errors might be coming back as socket.timeout, and connect-to-remote-server errors might be coming back as URLError(socket.timeout). It's just a guess that might explain earlier observations.
So I fell back to some really defensive coding. (1) I'm handling some HTTP status codes that are indicative of timeouts. (2) There are reports that some timeouts come via socket.timeout exceptions, and some via URLError(socket.timeout) exceptions, so I'm catching both. (3) And just in case, I threw in HTTPError(socket.timeout) as well.
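The three defensive steps above can be sketched roughly as follows. This is an illustration under stated assumptions rather than the answerer's code, which is not preserved on this page: timeout_reason(), fetch_with_retry(), the attempts and delay parameters, and the opener hook are all hypothetical names.

```python
import socket
import time
import urllib.request
from urllib.error import HTTPError, URLError

RETRYABLE_STATUS = {429, 504}  # Too Many Requests, Gateway Timeout


def timeout_reason(error):
    """Describe error if it looks like a timeout, otherwise return None."""
    if isinstance(error, HTTPError):
        if error.code in RETRYABLE_STATUS:            # step (1)
            return 'HTTP %s %s' % (error.code, error.reason)
        if isinstance(error.reason, socket.timeout):  # step (3), just in case
            return 'HTTPError(socket.timeout)'
        return None
    if isinstance(error, URLError):                   # step (2), wrapped form
        if isinstance(error.reason, socket.timeout):
            return 'URLError(socket.timeout)'
        return None
    if isinstance(error, socket.timeout):             # step (2), bare form
        return 'socket.timeout'
    return None


def fetch_with_retry(url, attempts=3, delay=1.0, timeout=10,
                     opener=urllib.request.urlopen):
    """Return the response body, retrying only on timeout-like failures.

    `opener` is a hypothetical hook so the loop can be tested offline.
    """
    for attempt in range(1, attempts + 1):
        try:
            with opener(url, timeout=timeout) as response:
                return response.read()
        except (URLError, socket.timeout) as error:
            # Anything that is not timeout-like, or the last attempt, is re-raised.
            if timeout_reason(error) is None or attempt == attempts:
                raise
            time.sleep(delay)
```

Keeping the classification in one function means a newly discovered timeout path only needs one extra branch, which suits the "I can't audit every path" concern raised above.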