Skip a URL if it times out

Posted on 2024-12-14 08:25:22

I have a list of URLs.

I am using the following to retrieve their contents:

for url in url_list:
    req = urllib2.Request(url)
    resp = urllib2.urlopen(req, timeout=5)
    resp_page = resp.read()
    print resp_page

When a request times out, the program crashes with "socket.timeout: timed out". I just want to skip that URL and read the next one. How can I do this?

Thanks

Comments (3)

聽兲甴掵 2024-12-21 08:25:22

Although there is already an answer, I'd like to point out that urllib2 might not be solely responsible for this behavior.

As pointed out here (and as the problem description also suggests), the exception may come from the socket library.

In that case, just add another except clause:

import socket
import urllib2

# req is the urllib2.Request object built in the question's loop
try:
    resp = urllib2.urlopen(req, timeout=5)
except urllib2.URLError:
    print "Bad URL or timeout"
except socket.timeout:
    print "socket timeout"
萌能量女王 2024-12-21 08:25:22

I'm going to go ahead and assume that by "crashes" you mean "raises a URLError", as described by the urllib2.urlopen docs. See the Errors and Exceptions section of the Python Tutorial.

for url in url_list:
    req = urllib2.Request(url)
    try:
        resp = urllib2.urlopen(req, timeout=5)
    except urllib2.URLError:
        print "Bad URL or timeout"
        continue # skips to the next iteration of the loop
    resp_page = resp.read()
    print resp_page
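
Putting the two answers so far together, here is a minimal sketch of a loop that skips a URL on either kind of failure. The snippets in this thread are Python 2; this one is a Python 3 adaptation, assuming urllib.request and urllib.error in place of urllib2. The name fetch_all and its opener parameter are hypothetical, added only so the loop can be tried without a network. Note that read() is inside the try block as well, since a timeout can also occur while the body is being read, not only while connecting.

```python
import socket
import urllib.error
import urllib.request


def fetch_all(url_list, timeout=5, opener=urllib.request.urlopen):
    """Return {url: body} for every URL that could be read; skip the rest.

    `opener` is a hypothetical hook for swapping in a fake fetcher;
    by default it is the real urllib.request.urlopen.
    """
    pages = {}
    for url in url_list:
        try:
            resp = opener(url, timeout=timeout)
            # read() can time out too, so it stays inside the try block
            pages[url] = resp.read()
        except socket.timeout:
            print("Timed out, skipping: %s" % url)
            continue  # move on to the next URL
        except urllib.error.URLError as exc:
            print("Failed (%s), skipping: %s" % (exc.reason, url))
            continue
    return pages
```

Calling fetch_all(url_list) with the default opener performs real requests; the result only contains the URLs that were read successfully.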
意犹 2024-12-21 08:25:22

Sounds like you just need to catch the timeout exception. I don't get the socket.timeout message that you do.

req = urllib2.Request("http://127.0.0.2")
try:
    resp = urllib2.urlopen(req, timeout=5)
except urllib2.URLError:
    print "Timeout!"

Obviously, you need a URL that will actually time out (127.0.0.2 may not do so on your machine).
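
Since a timeout may show up either as a bare socket.timeout or wrapped inside a URLError, it can help to tell timeouts apart from other failures. The following is a rough sketch in Python 3 (urllib.error instead of urllib2), assuming the wrapped exception is available as the URLError's reason attribute. The helper name is_timeout is hypothetical.

```python
import socket
import urllib.error


def is_timeout(exc):
    """Return True if exc is a timeout, raised directly or wrapped in URLError."""
    if isinstance(exc, socket.timeout):
        return True
    if isinstance(exc, urllib.error.URLError):
        # URLError keeps the underlying cause in .reason, which may be
        # an exception object or a plain string
        return isinstance(exc.reason, socket.timeout)
    return False
```

Inside an except block that catches both exception types, is_timeout(exc) can then decide whether to print "Timeout!" or a more general error message.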
