urllib2 times out but doesn't close the socket connection

Posted on 2024-08-18 19:00:38

I'm making a python URL grabber program. For my purposes, I want it to time out really really fast, so I'm doing

urllib2.urlopen("http://.../", timeout=2)

Of course it times out correctly as it should. However, it doesn't bother to close the connection to the server, so the server thinks the client is still connected. How can I ask urllib2 to just close the connection after it times out?

Running gc.collect() doesn't work and I'd like to not use httplib if I can't help it.

The closest I can get is: the first try will time out. The server reports that the connection closed just as the second try times out. Then, the server reports the connection closed just as the third try times out. Ad infinitum.

Many thanks.

Comments (2)

星星的轨迹 2024-08-25 19:00:38

I have a suspicion that the socket is still open in the stack frames. When Python raises an exception it stores the stack frames so debuggers and other tools can view the stack and introspect values.

For historical reasons, and now for backwards compatibility, the stack information is stored (on a per-thread basis) in sys (see sys.exc_info(), sys.exc_type and others). This is one of the things which has been removed in Python 3.0.

What that means for you is that the stack is still alive and referenced. The stack contains the local data for some function which has the open socket. That's why the socket isn't yet closed. It's only when the stack trace is removed that everything will be gc'ed.

To test if that's the case, insert something like

try:
  1/0
except ZeroDivisionError:
  pass

in your except clause. That's a quick way to replace the current exception with something else.
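
The mechanism described above can be observed without any network access. The following is only a minimal sketch, written for Python 3 on CPython (where `urllib2` itself is not available); `FakeSocket`, `failing_request` and `demo` are hypothetical names invented for illustration, and the traceback is held in a local variable to imitate what Python 2 does implicitly through `sys.exc_info()`.

```python
import gc
import sys
import weakref


class FakeSocket(object):
    """Hypothetical stand-in for the socket opened inside urlopen()."""


def failing_request(refs):
    sock = FakeSocket()             # local variable living in this frame
    refs.append(weakref.ref(sock))  # lets the caller observe its lifetime
    raise IOError("simulated timeout")


def demo():
    refs = []
    try:
        failing_request(refs)
    except IOError:
        # Keep the traceback around on purpose; it references the frames,
        # and the frames reference their local variables.
        tb = sys.exc_info()[2]
    gc.collect()                    # does not help while the traceback is referenced
    alive_while_traceback_held = refs[0]() is not None
    del tb                          # drop the last reference to the traceback
    gc.collect()
    alive_after_release = refs[0]() is not None
    return alive_while_traceback_held, alive_after_release
```

Under these assumptions, the object survives `gc.collect()` for as long as the traceback is referenced, which would be consistent with `gc.collect()` not closing the socket in the question.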

残疾 2024-08-25 19:00:38

This is SUCH a hack, but the following code works. If the request is in another function AND it does not raise an exception, then the socket is always closed.

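import socket
import urllib2

def _fetch(self, url):
    # Handle the timeout inside this helper so the failed request's frames
    # are released when it returns normally.
    try:
        return urllib2.urlopen(urllib2.Request(url), timeout=5).read()
    except socket.timeout:
        # A timeout while waiting for or reading the response may surface unwrapped.
        return None
    except urllib2.URLError, e:
        if isinstance(e.reason, socket.timeout):
            return None
        raise  # bare raise keeps the original traceback

def fetch(self, url):
    x = None
    while x is None:
        x = self._fetch(url)
        if x is None:
            print "Timeout"  # only report attempts that actually timed out
    return x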

Does ANYONE have a better way?
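
The claim that the socket is released once the helper returns without raising can be checked with a stand-in object instead of a real connection. This is a hedged sketch rather than a test of `urllib2` itself: `FakeSocket`, `failing_request`, `fetch_once` and `socket_released_after_helper` are hypothetical names, and the result relies on CPython's reference counting.

```python
import gc
import weakref


class FakeSocket(object):
    """Hypothetical stand-in for the socket that urlopen() would open."""


def failing_request(refs):
    sock = FakeSocket()
    refs.append(weakref.ref(sock))
    raise IOError("simulated timeout")


def fetch_once(refs):
    # The exception is handled here and the helper returns normally, so
    # nothing keeps the traceback (or the frames it references) alive.
    try:
        failing_request(refs)
    except IOError:
        return None


def socket_released_after_helper():
    refs = []
    fetch_once(refs)
    gc.collect()  # not strictly needed on CPython; added for robustness
    return refs[0]() is None
```

If `socket_released_after_helper()` returns `True`, the stand-in object was freed as soon as the helper returned, which matches the behaviour this answer reports for the real socket.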
