Constipated Python urllib2 sockets
I've been scouring the Internet looking for a solution to my problem with Python. I'm trying to use a urllib2 connection to read a potentially endless stream of data from an HTTP server. It's part of some interactive communication, so it's important that I can get the data that's available, even if it's not a whole buffer full. There seems to be no way to have read / readline return the available data; they block forever waiting for the entire (endless) stream before returning.
Even if I set the underlying file descriptor to non-blocking using fcntl, the urllib2 file-object still blocks! In general there seems to be no way to make Python file-objects, upon read, return all available data if there is some and block otherwise.
I've seen a few posts from people seeking help with this, but no solutions. What gives? Am I missing something? This seems like such a normal use-case to break! I'm hoping to take advantage of urllib2's ability to detect configured proxies and handle chunked encoding, but I can't if it won't cooperate.
Edit: Upon request, here is some example code
Client:
connection = urllib2.urlopen(commandpath)
id = connection.readline()
Now suppose that the server is using chunked transfer encoding, writes a single chunk containing that line down the stream, and then waits. The connection is still open, but the client has data waiting in a buffer.
I cannot get read or readline to return the data I know is waiting, because they try to read until the end of the connection. In this case the connection may never close, so they will wait either forever or until an inactivity timeout severs the connection. Once the connection is severed they do return, but that's obviously not the behavior I want.
Comments (1)
urllib2
operates at the HTTP level, which works with complete documents. I don't think there's a way around that without hacking into the urllib2 source code.

What you can do is use plain sockets (you'll have to talk HTTP yourself in this case) and call sock.recv(maxbytes), which reads only the data that is available.

Update: you may want to try calling conn.fp._sock.recv(maxbytes), instead of conn.read(bytes), on an urllib2 connection.
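A minimal sketch of the recv-based approach, again using a local socket pair in place of a real HTTP connection (the pair, the payload, and the select() call are illustrative assumptions; a real client would also have to speak HTTP itself, as noted above):

```python
import select
import socket

# A connected socket pair stands in for the HTTP connection.
server_end, client_end = socket.socketpair()

# The "server" writes one line and then goes quiet, like a chunked
# stream that pauses between chunks.
server_end.sendall(b"id-12345\n")

# select() reports readability without committing to a full read.
ready, _, _ = select.select([client_end], [], [], 1.0)

data = b""
if ready:
    # recv() returns whatever is currently buffered, up to maxbytes,
    # rather than waiting for the connection to close.
    data = client_end.recv(4096)
```

Note that with chunked transfer encoding, the bytes returned by a raw recv() still carry the chunk-size framing that conn.read() would normally strip, so the caller has to undo that framing itself.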