Recovering from ECONNRESET in Python/Mechanize

Posted on 2024-10-15 10:19:54

I've got a large bulk downloading application written in Python/Mechanize, aiming to download something like 20,000 files. Clearly, any downloader that big is occasionally going to run into some ECONNRESET errors. Now, I know how to handle each of these individually, but there are two problems with that:

  1. I'd really rather not wrap every single outbound web call in a try/except block.
  2. Even if I were to do so, it's hard to know how to handle the error once the exception has been thrown. If the code is just

    data = browser.response().read()
    

    then I know precisely how to deal with it, namely:

    import errno

    data = None
    while data is None:
        try:
            data = browser.response().read()
        except IOError as e:
            # Retry only on ECONNRESET; re-raise anything else.
            if e.args[1].args[0].errno != errno.ECONNRESET:
                raise
    

    but if it's just a random instance of

    browser.follow_link(link)
    

    then how do I know what Mechanize's internal state looks like if an ECONNRESET is thrown somewhere in here? For example, do I need to call browser.back() before I try the code again? What's the proper way to recover from that kind of error?

EDIT: The solution in the accepted answer certainly works, and in my case it turned out to be not so hard to implement. I'm still academically interested, however, in whether there's an error handling mechanism that could result in quicker error catching.
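
On the first problem (not wanting a try/except around every call), one possible approach is a retry decorator. The sketch below is illustrative only: `is_connreset` and `retry_on_connreset` are hypothetical names, not part of Mechanize, and the helper simply searches an exception's `errno`, `reason`, and `args` for ECONNRESET instead of hard-coding the `e.args[1].args[0]` path. It is probably safest to decorate a function that performs a whole unit of work from a known starting point, since nothing here establishes what state the browser is in after a failed `follow_link`.

```python
import errno
import functools
import time


def is_connreset(exc):
    """Return True if exc, or anything nested in its args/reason, is ECONNRESET."""
    seen = set()
    stack = [exc]
    while stack:
        item = stack.pop()
        if id(item) in seen:
            continue
        seen.add(id(item))
        if getattr(item, "errno", None) == errno.ECONNRESET:
            return True
        if isinstance(item, BaseException):
            # Wrapped errors may carry the socket error in .args or .reason.
            stack.extend(item.args)
            reason = getattr(item, "reason", None)
            if reason is not None:
                stack.append(reason)
        elif isinstance(item, (tuple, list)):
            stack.extend(item)
    return False


def retry_on_connreset(max_attempts=3, delay=0.0):
    """Decorator: re-run the wrapped function when it fails with ECONNRESET."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except IOError as exc:
                    # Give up on other errors, or once attempts are exhausted.
                    if not is_connreset(exc) or attempt == max_attempts:
                        raise
                    time.sleep(delay)
        return wrapper
    return decorator
```

Applied as `@retry_on_connreset(max_attempts=5, delay=1.0)` on a per-file download function, this keeps the error handling in one place; the attempt cap and delay values are arbitrary choices.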

Comments (1)

池木 2024-10-22 10:19:54

Perhaps place the try..except block higher up in the chain of command:

    import collections
    import errno

    def download_file(url):
        # Bundle together the bunch of browser calls necessary to download one file.
        browser.follow_link(...)
        ...
        response = browser.response()
        data = response.read()

    urls = collections.deque(urls)

    while urls:
        url = urls.popleft()
        try:
            download_file(url)
        except IOError as err:
            if err.args[1].args[0].errno != errno.ECONNRESET:
                raise
            else:
                # if ECONNRESET error, add the url back to urls to try again later
                urls.append(url)
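
As written, a URL that always resets the connection would be requeued indefinitely. A minimal sketch of a bounded variant follows, assuming a hypothetical `download_all` helper that takes the per-file download function as a parameter. For brevity it checks `err.errno` directly, which is an assumption; with Mechanize the errno may be nested deeper, as in the `err.args[1].args[0]` check above.

```python
import collections
import errno


def download_all(urls, download_file, max_attempts=5):
    """Try each URL, requeueing on ECONNRESET up to max_attempts times.

    Returns the URLs that still failed with ECONNRESET after max_attempts.
    """
    # Each queue entry pairs a URL with the number of the attempt about to run.
    queue = collections.deque((url, 1) for url in urls)
    gave_up = []
    while queue:
        url, attempt = queue.popleft()
        try:
            download_file(url)
        except IOError as err:
            if getattr(err, "errno", None) != errno.ECONNRESET:
                raise
            if attempt < max_attempts:
                # Put it at the back so other downloads proceed first.
                queue.append((url, attempt + 1))
            else:
                gave_up.append(url)
    return gave_up
```

Returning the list of abandoned URLs lets the caller log them or retry them in a later run.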