Why does Python's urllib2.urlopen() raise an HTTPError for successful status codes?

Posted on 2024-11-29 07:26:16

According to the urllib2 documentation,

Because the default handlers handle redirects (codes in the 300 range), and codes in the 100-299 range indicate success, you will usually only see error codes in the 400-599 range.

And yet the following code

request = urllib2.Request(url, data, headers)
response = urllib2.urlopen(request)

raises an HTTPError with code 201 (created):

ERROR    2011-08-11 20:40:17,318 __init__.py:463] HTTP Error 201: Created

So why is urllib2 throwing HTTPErrors on this successful request?

It's not too much of a pain; I can easily extend the code to:

import urllib2

try:
    request = urllib2.Request(url, data, headers)
    response = urllib2.urlopen(request)
except urllib2.HTTPError as e:
    if e.code == 201:
        pass  # success! :)
    else:
        pass  # fail! :(
else:
    pass  # when will this happen...?

But this doesn't seem like the intended behavior, based on the documentation and the fact that I can't find similar questions about this odd behavior.

Also, what should the else block be expecting? If successful status codes are all interpreted as HTTPErrors, then when does urllib2.urlopen() just return a normal file-like response object like all the urllib2 documentation refers to?

Comments (3)

命硬 2024-12-06 07:26:16

You can write a custom Handler class for use with urllib2 to prevent specific status codes from being raised as HTTPError. Here's one I've used before:

class BetterHTTPErrorProcessor(urllib2.BaseHandler):
    # a substitute/supplement to urllib2.HTTPErrorProcessor
    # that doesn't raise exceptions on status codes 201,204,206
    def http_error_201(self, request, response, code, msg, hdrs):
        return response
    def http_error_204(self, request, response, code, msg, hdrs):
        return response
    def http_error_206(self, request, response, code, msg, hdrs):
        return response

Then you can use it like:

opener = urllib2.build_opener(BetterHTTPErrorProcessor)
urllib2.install_opener(opener)

req = urllib2.Request(url, data, headers)
response = urllib2.urlopen(req)

潜移默化 2024-12-06 07:26:16

As the actual library documentation mentions:

For 200 error codes, the response object is returned immediately.

For non-200 error codes, this simply passes the job on to the protocol_error_code handler methods, via OpenerDirector.error(). Eventually, urllib2.HTTPDefaultErrorHandler will raise an HTTPError if no other handler handles the error.

http://docs.python.org/library/urllib2.html#httperrorprocessor-objects
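
The dispatch described in that passage can be sketched with a toy model. This is not urllib2's actual code: `ToyHTTPError`, `Accept201` and `process_response` are hypothetical names invented for illustration, and the sketch assumes the quoted text is read literally (only 200 is returned immediately).

```python
class ToyHTTPError(Exception):
    """Stand-in for urllib2.HTTPError in this toy model."""

    def __init__(self, code):
        Exception.__init__(self, "HTTP Error %d" % code)
        self.code = code


class Accept201(object):
    """Hypothetical handler exposing a method named after the status code."""

    def http_error_201(self, response, code):
        return response


def process_response(response, code, handlers):
    # "For 200 error codes, the response object is returned immediately."
    if code == 200:
        return response
    # Otherwise look for a handler method named http_error_<code>.
    for handler in handlers:
        method = getattr(handler, "http_error_%d" % code, None)
        if method is not None:
            result = method(response, code)
            if result is not None:
                return result
    # Nothing handled the code, so the default behaviour is to raise.
    raise ToyHTTPError(code)
```

Under this model, `process_response(resp, 201, [])` raises, while `process_response(resp, 201, [Accept201()])` returns the response, which is the same mechanism a handler with `http_error_201` methods relies on.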

烟花易冷人易散 2024-12-06 07:26:16

I personally think it was a mistake, and very nonintuitive, for this to be the default behavior.
It's true that non-2XX codes imply a protocol-level error, but turning that into an exception goes too far (in my opinion at least).

In any case, I think the most elegant way to avoid this is:

import urllib.request

opener = urllib.request.build_opener()
# Use 'http' instead of 'https' if that is the scheme you are requesting.
for processor in opener.process_response['https']:
    # The same HTTPErrorProcessor class is registered for https as well.
    if isinstance(processor, urllib.request.HTTPErrorProcessor):
        opener.process_response['https'].remove(processor)
        break  # there's only one such handler by default
response = opener.open('https://www.google.com')

Now you have the response object. You can check its status code, headers, body, etc.
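
A possible variation on the same idea, as a minimal sketch: instead of removing the processor after the opener is built, pass in a subclass that returns every response unchanged. `PassThroughProcessor` is a hypothetical name, not part of the standard library; the sketch relies on the documented behaviour that `build_opener` leaves out a default handler when a subclass of it is supplied.

```python
import urllib.request


class PassThroughProcessor(urllib.request.HTTPErrorProcessor):
    """Hypothetical subclass that never turns a status code into an exception."""

    def http_response(self, request, response):
        # Hand the response back untouched, whatever its status code is.
        return response

    # Reuse the same method for https responses.
    https_response = http_response


# Because PassThroughProcessor subclasses HTTPErrorProcessor, build_opener
# installs it in place of the default error processor.
opener = urllib.request.build_opener(PassThroughProcessor)
```

With this opener, `opener.open(url)` should hand back the response object for any status code, and the caller can inspect `response.status` and decide what counts as a failure.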
