Python, gevent, urllib2.urlopen.read(), download accelerator
I am attempting to build a download accelerator for Linux. My program uses gevent, os, and urllib2. It takes a URL and attempts to download the file concurrently. All of my code is valid. My only problem is that urllib2.urlopen().read() is preventing me from running the .read() function concurrently.
This is the exception that is thrown at me:
Traceback (most recent call last):
  File "/usr/lib/pymodules/python2.7/gevent/greenlet.py", line 405, in run
    result = self._run(*self.args, **self.kwargs)
  File "gevent_concurrent_downloader.py", line 94, in childTasklet
    _tempRead = handle.read(divisor) # Read/Download part
  File "/usr/lib/python2.7/socket.py", line 380, in read
    data = self._sock.recv(left)
  File "/usr/lib/python2.7/httplib.py", line 561, in read
    s = self.fp.read(amt)
  File "/usr/lib/python2.7/socket.py", line 380, in read
    data = self._sock.recv(left)
  File "/usr/lib/pymodules/python2.7/gevent/socket.py", line 407, in recv
    wait_read(sock.fileno(), timeout=self.timeout, event=self._read_event)
  File "/usr/lib/pymodules/python2.7/gevent/socket.py", line 153, in wait_read
    assert event.arg is None, 'This event is already used by another greenlet: %r' % (event.arg, )
AssertionError: This event is already used by another greenlet: (<Greenlet at 0x2304958: childTasklet(<__main__.NewFile object at 0x22c4390>, 4595517, <addinfourl at 37154616 whose fp = <socket._fileob, 459551, 1)>, timeout('timed out',))
<Greenlet at 0x2304ea8: childTasklet(<__main__.NewFile object at 0x22c4390>,4595517, <addinfourl at 37154616 whose fp = <socket._fileob, 7, -1)failed with AssertionError
My program works by getting the file byte size from the URL by invoking:
urllib2.urlopen(URL).info().get("Content-Length")
and dividing the file size by a divisor, thus breaking the download into parts. In this example I am breaking the download into 10 parts.
Each greenlet runs a command in this fashion:
urllib2.urlopen(URL).read(offset)
Here's a link to my code hosted on pastie: http://pastie.org/3253705
Thank you for the help!
FYI: I am running on Ubuntu 11.10.
Comments (2)
You're trying to read the response to a single request from different greenlets.
If you'd like to download the same file using several concurrent connections, you could use the Range HTTP header, provided the server supports it (you get a 206 status instead of 200 for a request with a Range header). See HTTPRangeHandler.
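To illustrate the Range approach, here is a minimal sketch, not the asker's code. It assumes Python 3, where `urllib.request` replaces `urllib2` (in Python 2 the equivalent calls are `urllib2.Request` and `urllib2.urlopen`). The helper names `split_ranges`, `build_range_request`, and `fetch_range` are hypothetical, chosen only for this example. The idea is that each greenlet opens its own request for its own byte range instead of sharing one response object.

```python
import urllib.request


def split_ranges(total_size, parts):
    """Split total_size bytes into contiguous, inclusive (start, end) ranges."""
    if total_size <= 0 or parts <= 0:
        raise ValueError("total_size and parts must be positive")
    parts = min(parts, total_size)
    chunk = total_size // parts
    ranges = []
    for i in range(parts):
        start = i * chunk
        # The last range absorbs the remainder so no bytes are lost.
        end = total_size - 1 if i == parts - 1 else start + chunk - 1
        ranges.append((start, end))
    return ranges


def build_range_request(url, start, end):
    """Build one request per part; HTTP byte ranges are inclusive on both ends."""
    request = urllib.request.Request(url)
    request.add_header("Range", "bytes=%d-%d" % (start, end))
    return request


def fetch_range(url, start, end, timeout=30):
    """Download one part over its own connection (hypothetical helper)."""
    request = build_range_request(url, start, end)
    response = urllib.request.urlopen(request, timeout=timeout)
    try:
        # 206 Partial Content means the server honoured the Range header;
        # 200 means it ignored the header and is sending the whole file.
        if response.status != 206:
            raise RuntimeError("server did not honour the Range header")
        return response.read()
    finally:
        response.close()
```

With the 4595517-byte file from the question and 10 parts, `split_ranges` gives ten ranges starting at `(0, 459550)`. Each greenlet would then call something like `fetch_range(URL, start, end)` and write the result at offset `start` in the output file.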

The argument to read is a number of bytes, not an offset.
It seems gevent will let you call urllib asynchronously, but will not let you access the same resource from multiple greenlets.
Furthermore, since it is using wait_read, the effect will still be a synchronous, sequential read from the file (the complete opposite of what you wanted to achieve).
I'd suggest you might need to go lower-level than urllib2, or use a different library.
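A small sketch of the first point, using `io.BytesIO` as a stand-in for the response object (an assumption made so the example runs without a network; file-like objects share the same `read(n)` semantics). Each call reads up to `n` bytes from the current position, so passing an "offset" just changes how many bytes come back.

```python
import io

# Stand-in for the object returned by urlopen(); hypothetical 10-byte body.
body = io.BytesIO(b"0123456789")

first = body.read(4)   # reads up to 4 bytes from the current position
second = body.read(4)  # continues where the previous read stopped
rest = body.read()     # no argument: read everything that is left
```

Here `first` holds the first four bytes and `second` the next four, so there is no way to jump to a position with `read` alone.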