Closing urllib2 connections

Posted 2024-10-26 11:21:34

I'm using urllib2 to load files from FTP and HTTP servers.

Some of the servers support only one connection per IP. The problem is that urllib2 does not close the connection immediately. See the example program:

from urllib2 import urlopen
from time import sleep

url = 'ftp://user:pass@host/big_file.ext'

def load_file(url):
    f = urlopen(url)
    loaded = 0
    while True:
        data = f.read(1024)
        if data == '':
            break
        loaded += len(data)
    f.close()
    #sleep(1)
    print('loaded {0}'.format(loaded))

load_file(url)
load_file(url)

The code loads two files (here both are the same file) from an FTP server that supports only one connection. It prints the following log:

loaded 463675266
Traceback (most recent call last):
  File "conection_test.py", line 20, in <module>
    load_file(url)
  File "conection_test.py", line 7, in load_file
    f = urlopen(url)
  File "/usr/lib/python2.6/urllib2.py", line 126, in urlopen
    return _opener.open(url, data, timeout)
  File "/usr/lib/python2.6/urllib2.py", line 391, in open
    response = self._open(req, data)
  File "/usr/lib/python2.6/urllib2.py", line 409, in _open
    '_open', req)
  File "/usr/lib/python2.6/urllib2.py", line 369, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.6/urllib2.py", line 1331, in ftp_open
    fw = self.connect_ftp(user, passwd, host, port, dirs, req.timeout)
  File "/usr/lib/python2.6/urllib2.py", line 1352, in connect_ftp
    fw = ftpwrapper(user, passwd, host, port, dirs, timeout)
  File "/usr/lib/python2.6/urllib.py", line 854, in __init__
    self.init()
  File "/usr/lib/python2.6/urllib.py", line 860, in init
    self.ftp.connect(self.host, self.port, self.timeout)
  File "/usr/lib/python2.6/ftplib.py", line 134, in connect
    self.welcome = self.getresp()
  File "/usr/lib/python2.6/ftplib.py", line 216, in getresp
    raise error_temp, resp
urllib2.URLError: <urlopen error ftp error: 421 There are too many connections from your internet address.>

So the first file is loaded, and the second fails because the first connection was not closed.

But when I use sleep(1) after f.close(), the error does not occur:

loaded 463675266
loaded 463675266
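
Since a one-second pause is enough here, one possible workaround (not a fix for the underlying problem) is to retry the failing call with a short, growing delay instead of always sleeping. The following is a minimal Python 3 sketch under that assumption; `retry`, `is_retryable` and `flaky_download` are hypothetical names used for illustration, and the fake download only imitates the 421 message from the log above so the example runs without a server.

```python
import time

def retry(action, is_retryable, attempts=5, delay=0.2, sleep=time.sleep):
    """Call action() until it succeeds or the attempts run out."""
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception as exc:
            # Re-raise errors we should not retry, and the final failure.
            if attempt == attempts or not is_retryable(exc):
                raise
            sleep(delay * attempt)  # wait a little longer each time

# Offline demonstration: a fake download that fails twice with a 421 message.
calls = {"n": 0}

def flaky_download():
    calls["n"] += 1
    if calls["n"] < 3:
        raise IOError("ftp error: 421 There are too many connections")
    return "loaded"

waits = []  # record the requested delays instead of really sleeping
result = retry(flaky_download, lambda e: "421" in str(e), sleep=waits.append)
```

In real use, `action` would be something like `lambda: load_file(url)`, and the default `sleep` would actually pause between attempts.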

Is there any way to force the connection closed so that the second download does not fail?

Comments (4)

过潦 2024-11-02 11:21:34

The cause is indeed a file descriptor leak. We also found that the problem is much more obvious with Jython than with CPython.
A colleague proposed this solution:

    # Note: the second positional argument of urllib2.Request is `data`;
    # `header` is passed here as in the original snippet.
    req = urllib2.Request(url, header)
    try:
        fdurl = urllib2.urlopen(req, timeout=self.timeout)
    except urllib2.URLError as e:
        print "urlopen exception", e
    else:
        # Keep a reference to the "real" socket so we can close it explicitly.
        realsock = fdurl.fp._sock.fp._sock
        # ... read from fdurl here ...
        realsock.close()
        fdurl.close()

The fix is ugly, but it does the job: no more "too many open connections".

身边 2024-11-02 11:21:34

Biggie: I think it's because the connection is not shutdown().

Note close() releases the resource
associated with a connection but does
not necessarily close the connection
immediately. If you want to close the
connection in a timely fashion, call
shutdown() before close().

You could try something like this before f.close():

import socket
f.fp._sock.fp._sock.shutdown(socket.SHUT_RDWR)

(And yes.. if that works, it's not Right(tm), but you'll know what the problem is.)
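
The shutdown-before-close idea can be tried in isolation on a plain socket, without reaching into urllib2's private attributes. This is a minimal sketch written for Python 3 (where socket errors are `OSError`); `shutdown_and_close` is a hypothetical helper name, and the connected pair from `socket.socketpair()` stands in for a real server connection.

```python
import socket

def shutdown_and_close(sock):
    """Shut down both directions, then release the descriptor."""
    try:
        sock.shutdown(socket.SHUT_RDWR)
    except OSError:
        # The peer may already be gone; the socket is still closed below.
        pass
    finally:
        sock.close()

# Local demonstration with a connected pair of sockets (no network needed).
left, right = socket.socketpair()
left.sendall(b"bye")
shutdown_and_close(left)

received = right.recv(16)  # data sent before the shutdown
at_eof = right.recv(16)    # empty bytes: the peer sees end-of-stream
right.close()
```

The peer observing end-of-stream right after the call is the behaviour the quoted note describes; whether it is enough to satisfy a particular FTP server's connection limit is something to verify against that server.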

蝶…霜飞 2024-11-02 11:21:34

As of Python 2.7.1, urllib2 does indeed leak a file descriptor:
https://bugs.pypy.org/issue867

霓裳挽歌倾城醉 2024-11-02 11:21:34

Alex Martelli answered a similar question. See: should I call close() after urllib.urlopen()?

In a nutshell:

import contextlib
import urllib

with contextlib.closing(urllib.urlopen(u)) as x:
    # ... work with x here; x.close() is called when the block exits
    data = x.read()
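
Applied to the read loop from the question, the same pattern might look like the sketch below. It is written for Python 3 and takes the stream from a hypothetical zero-argument callable `open_stream` (for example `lambda: urlopen(url)`), so it can be exercised with an in-memory stream. Note that `contextlib.closing` only guarantees that `close()` is called, even if reading raises; by itself it does not address the delayed disconnect discussed in the other answers.

```python
import contextlib
import io

def count_bytes(open_stream, chunk_size=1024):
    """Read a stream to the end and return the number of bytes read.

    `open_stream` is a hypothetical zero-argument callable that returns
    a file-like object with read() and close().
    """
    loaded = 0
    with contextlib.closing(open_stream()) as f:
        while True:
            data = f.read(chunk_size)
            if not data:  # empty result means end of stream
                break
            loaded += len(data)
    return loaded

# Offline demonstration with an in-memory stream instead of a download.
stream = io.BytesIO(b"x" * 2500)
total = count_bytes(lambda: stream)
```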