处理 urllib2.URLError 时获取 URL

发布于 2024-11-17 16:34:38 字数 514 浏览 2 评论 0原文

这特别适用于 urllib2,但更普遍地适用于自定义异常处理。如何通过引发的异常将附加信息传递给另一个模块中的调用函数?我假设我会使用自定义异常类重新引发,但我不确定技术细节。

我不会用我尝试过但失败的内容来污染示例代码,而是将其简单地呈现为一块空白的石板。我的最终目标是让示例中的最后一行正常工作。

#mymod.py
import urllib2

def openurl():
    req = urllib2.Request("http://duznotexist.com/")
    response = urllib2.urlopen(req)

#main.py
import urllib2
import mymod

try:
    mymod.openurl()
except urllib2.URLError as e:
    #how do I do this?
    print "Website (%s) could not be reached due to %s" % (e.url, e.reason)

This pertains to urllib2 specifically, but custom exception handling more generally. How do I pass additional information to a calling function in another module via a raised exception? I'm assuming I would re-raise using a custom exception class, but I'm not sure of the technical details.

Rather than pollute the sample code with what I've tried and failed, I'll simply present it as a mostly blank slate. My end goal is for the last line in the sample to work.

#mymod.py
import urllib2

def openurl():
    req = urllib2.Request("http://duznotexist.com/")
    response = urllib2.urlopen(req)

#main.py
import urllib2
import mymod

try:
    mymod.openurl()
except urllib2.URLError as e:
    #how do I do this?
    print "Website (%s) could not be reached due to %s" % (e.url, e.reason)

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

离鸿 2024-11-24 16:34:38

您可以添加信息,然后重新引发异常。

#mymod.py
import urllib2

def openurl():
    req = urllib2.Request("http://duznotexist.com/")
    try:
        response = urllib2.urlopen(req)
    except urllib2.URLError as e:
        # add URL and reason to the exception object
        e.url = "http://duznotexist.com/"
        e.reason = "URL does not exist"
        raise e # re-raise the exception, so the calling function can catch it

#main.py
import urllib2
import mymod

try:
    mymod.openurl()
except urllib2.URLError as e:
    print "Website (%s) could not be reached due to %s" % (e.url, e.reason)

You can add information to and then re-raise the exception.

#mymod.py
import urllib2

def openurl():
    req = urllib2.Request("http://duznotexist.com/")
    try:
        response = urllib2.urlopen(req)
    except urllib2.URLError as e:
        # add URL and reason to the exception object
        e.url = "http://duznotexist.com/"
        e.reason = "URL does not exist"
        raise e # re-raise the exception, so the calling function can catch it

#main.py
import urllib2
import mymod

try:
    mymod.openurl()
except urllib2.URLError as e:
    print "Website (%s) could not be reached due to %s" % (e.url, e.reason)
眉黛浅 2024-11-24 16:34:38

我认为重新引发异常不是解决此问题的适当方法。

正如@Jonathan Vanasco 所说,

如果您打开 a.com ,并且它 301 重定向到 b.com , urlopen 将自动遵循该重定向,因为引发了带有重定向的 HTTPError 。如果 b.com 导致 URLError ,则上面的代码将 a.com 标记为不存在

我的解决方案是覆盖 urllib2.HTTPRedirectHandlerredirect_request

import urllib2

class NewHTTPRedirectHandler(urllib2.HTTPRedirectHandler):
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        m = req.get_method()
        if (code in (301, 302, 303, 307) and m in ("GET", "HEAD")
            or code in (301, 302, 303) and m == "POST"):
            newurl = newurl.replace(' ', '%20')
            newheaders = dict((k,v) for k,v in req.headers.items()
                              if k.lower() not in ("content-length", "content-type")
                             )
            # reuse the req object
            # mind that req will be changed if redirection happends
            req.__init__(newurl,
                headers=newheaders,
                   origin_req_host=req.get_origin_req_host(),
                   unverifiable=True)
            return req
        else:
            raise HTTPError(req.get_full_url(), code, msg, headers, fp)

opener = urllib2.build_opener(NewHTTPRedirectHandler)
urllib2.install_opener(opener)
# mind that req will be changed if redirection happends
#req = urllib2.Request('http://127.0.0.1:5000')
req = urllib2.Request('http://www.google.com/')

try:
    response = urllib2.urlopen(req)
except urllib2.URLError as e:
    print 'error'
    print req.get_full_url()
else:
    print 'normal'
    print response.geturl()

让我们尝试将 url 重定向到未知网址:

import os
from flask import Flask,redirect

app = Flask(__name__)

@app.route('/')
def hello():
    # return 'hello world'
    return redirect("http://a.com", code=302)

    if __name__ == '__main__':
    port = int(os.environ.get('PORT', 5000))
    app.run(host='0.0.0.0', port=port)

结果是:

error
http://a.com/

normal
http://www.google.com/

I don't think re-raising the exception is an appropriate way to solve this problem.

As @Jonathan Vanasco said,

if you're opening a.com , and it 301 redirects to b.com , urlopen will automatically follow that because an HTTPError with a redirect was raised. if b.com causes the URLError , the code above marks a.com as not existing

My solution is to overwrite redirect_request of urllib2.HTTPRedirectHandler

import urllib2

class NewHTTPRedirectHandler(urllib2.HTTPRedirectHandler):
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        m = req.get_method()
        if (code in (301, 302, 303, 307) and m in ("GET", "HEAD")
            or code in (301, 302, 303) and m == "POST"):
            newurl = newurl.replace(' ', '%20')
            newheaders = dict((k,v) for k,v in req.headers.items()
                              if k.lower() not in ("content-length", "content-type")
                             )
            # reuse the req object
            # mind that req will be changed if redirection happends
            req.__init__(newurl,
                headers=newheaders,
                   origin_req_host=req.get_origin_req_host(),
                   unverifiable=True)
            return req
        else:
            raise HTTPError(req.get_full_url(), code, msg, headers, fp)

opener = urllib2.build_opener(NewHTTPRedirectHandler)
urllib2.install_opener(opener)
# mind that req will be changed if redirection happends
#req = urllib2.Request('http://127.0.0.1:5000')
req = urllib2.Request('http://www.google.com/')

try:
    response = urllib2.urlopen(req)
except urllib2.URLError as e:
    print 'error'
    print req.get_full_url()
else:
    print 'normal'
    print response.geturl()

let's try to redirect the url to an unknown url:

import os
from flask import Flask,redirect

app = Flask(__name__)

@app.route('/')
def hello():
    # return 'hello world'
    return redirect("http://a.com", code=302)

    if __name__ == '__main__':
    port = int(os.environ.get('PORT', 5000))
    app.run(host='0.0.0.0', port=port)

And the result is:

error
http://a.com/

normal
http://www.google.com/
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文