How do I shut down an HTTP POST made with urlopen (urllib2) in Python after it has timed out?
Overview
I am using urlopen from the Python 2.7.1 urllib2 package to do an HTTP POST from a Windows XP machine to a remote Apache webserver (for instance the built-in web sharing of Mac OS X). The sent data contains some identifier, data and a checksum. If all data is sent, the server responds with an acknowledgement. The checksum in the data can be used to check whether everything arrived intact.
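To make the payload description concrete, here is a minimal sketch in Python 3 syntax. The field names (`id`, `data`, `checksum`) and the choice of SHA-256 are assumptions for illustration only; the question does not say which fields or checksum algorithm are actually used.

```python
import hashlib
from urllib.parse import parse_qs, urlencode


def build_payload(identifier, data):
    """Form-encode an identifier, the data, and a checksum of the data.

    Field names and SHA-256 are illustrative assumptions.
    """
    checksum = hashlib.sha256(data.encode("utf-8")).hexdigest()
    body = urlencode({"id": identifier, "data": data, "checksum": checksum})
    return body.encode("ascii")


def verify_payload(body):
    """Receiver-side check: True only if the data matches its checksum."""
    fields = parse_qs(body.decode("ascii"), keep_blank_values=True)
    try:
        data = fields["data"][0]
        checksum = fields["checksum"][0]
    except KeyError:
        return False  # a field is missing, e.g. the body was cut short
    return hashlib.sha256(data.encode("utf-8")).hexdigest() == checksum
```

With a check like this on the receiving side, a body that was cut off mid-transfer fails verification, which is what lets the server withhold the acknowledgement.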
The Problem
Usually this works great. However, sometimes the internet connection is bad, often because the client sending the data uses a wifi or 3G connection. This results in loss of the internet connection for some arbitrary amount of time. urlopen has a timeout option to make sure that this does not block your program, so that it can continue.
This is what I want, but the problem is that urlopen does not stop the socket from continuing to send whatever data it still had to send when the timeout occurred. I have tested this (with the code shown below) by sending a large amount of data to my laptop. I would see network activity on both machines, then switch off the wireless on the laptop, wait until the function timed out, and switch the wireless back on. The data transfer would then continue, but the program was no longer listening for a response. I even tried exiting the Python interpreter and the data was still being sent, so control of the transfer is somehow handed over to Windows.
Causes
The timeout (as I understand it) works like this: it checks for an 'idle response time'.
( [Python-Dev] Adding socket timeout to urllib2 )
If you set the timeout to 3, it will open the connection, start a counter, then try to send the data and wait for a response. If the timer runs out at any point before the response is received, a timeout exception is raised. Note that the sending of the data does not seem to count as 'activity' as far as the timeout timer is concerned.
( urllib2 times out but doesn't close socket connection )
( Close urllib2 connection )
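For reference, the timeout given to urlopen ends up as a timeout on the underlying socket, and Python applies a socket timeout to each blocking socket call separately rather than to the request as a whole. The sketch below (Python 3 syntax) shows this on a local socket pair instead of a real HTTP connection; the names `client` and `idle_peer` are illustrative.

```python
import socket
import time

client, idle_peer = socket.socketpair()
client.settimeout(0.2)  # per-call timeout, the mechanism urlopen's timeout uses

# A small send fits in the kernel's send buffer, so this returns immediately.
client.sendall(b"small request")

start = time.monotonic()
try:
    client.recv(1024)  # the peer never answers, so this call times out
    timed_out = False
except socket.timeout:
    timed_out = True
waited = time.monotonic() - start

print("timed out: %s after about %.1f s" % (timed_out, waited))
client.close()
idle_peer.close()
```

Note that sendall returning only means the bytes were handed to the operating system's send buffer. The operating system may still be transmitting them after the Python call has returned.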
Apparently it is stated somewhere that when a socket is closed, dereferenced or garbage collected, its 'close' function is called, which waits for all data to be sent before closing the socket. However, there is also a shutdown function, which should stop the socket immediately and prevent any more data from being sent.
( socket.shutdown vs socket.close )
( http://docs.python.org/library/socket.html#socket.socket.close )
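A small sketch of the API-level difference, again on a local socket pair (Python 3 syntax) rather than on the connection used by urlopen. It does not show what happens to data already queued in the operating system's send buffer, which depends on the platform.

```python
import socket

sender, receiver = socket.socketpair()
sender.sendall(b"partial upload")

# shutdown() acts on the connection itself: after SHUT_RDWR this endpoint
# can no longer send or receive, although the socket object still exists.
sender.shutdown(socket.SHUT_RDWR)

try:
    sender.sendall(b"more data")
    send_failed = False
except OSError:
    send_failed = True

# The peer still reads what was sent before the shutdown, then sees EOF.
received = receiver.recv(1024)
eof = receiver.recv(1024)

# close() releases the descriptor; it is still needed after shutdown().
sender.close()
receiver.close()
```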
What I Want
I want the connection to be 'shut down' when a timeout occurs. Otherwise my client cannot tell whether the data was received properly, and it might try to send it again. I'd rather just kill the connection and try again later, knowing that the data was (probably) not sent successfully (the server can recognize this if the checksum does not match).
Here is part of the code that I used to test this. The try..except parts do not yet work as I'd expect; any help there is also appreciated. As I said before, I want the program to shut down the socket as soon as the timeout (or any other) exception is raised.
from urllib import urlencode
from urllib2 import urlopen, HTTPError, URLError
import socket
import sys

class Uploader:
    def __init__(self):
        self.URL = "http://.../"
        # Large form-encoded body so that the upload takes a noticeable time.
        self.data = urlencode({'fakerange':range(0,2000000,1)})
        print "Data Generated"

    def upload(self):
        try:
            f = urlopen(self.URL, self.data, timeout=10)
            returncode = f.read()
        except (URLError, HTTPError), msg:
            returncode = str(msg)
        except socket.error:
            returncode = "Socket Timeout!"
        else:
            # Runs whenever no exception was raised, replacing the value read above.
            returncode = 'Im here'
        # There is no return statement, so upload() returns None.

def main():
    upobj = Uploader()
    returncode = upobj.upload()

    if returncode == '100':
        print "Success!"
    else:
        print "Maybe a Fail"
        print returncode

    print "The End"

if __name__ == '__main__':
    main()
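Regarding the try..except part: in the listing above, upload() never returns returncode, and the else branch runs whenever no exception occurred, so it overwrites the response that was just read. One possible restructuring is sketched below in Python 3 syntax, where urllib2 is split into urllib.request and urllib.error. The `opener` parameter is an addition made for illustration so the function can be exercised without a network. This sketch only tidies the error handling; it does not shut down the socket.

```python
import socket
from urllib.error import HTTPError, URLError
from urllib.request import urlopen


def upload(url, data, timeout=10, opener=urlopen):
    """POST data (bytes) to url; return the response text or an error string."""
    try:
        response = opener(url, data, timeout=timeout)
        try:
            return response.read().decode("utf-8", "replace")
        finally:
            response.close()
    except HTTPError as err:
        # The server answered, but with an error status. HTTPError is a
        # subclass of URLError, so it has to be caught first.
        return "HTTP error %d" % err.code
    except URLError as err:
        # Failures while connecting or sending, including timeouts there.
        return "URL error: %s" % err.reason
    except socket.timeout:
        # A timeout while waiting for or reading the response.
        return "Socket timeout"
    except OSError as err:
        return "Socket error: %s" % err
```

A caller can then compare the returned string with the expected acknowledgement (for example '100' in the question's code) and treat anything else as a failed upload.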
Comments (5)
I found some code that might help you on this thread:
You might consider using a different API than urllib2. httplib is a bit less pleasant, but often not too bad. It does, however, make it possible for you to access the underlying socket object. So, you could do something like:
(Disclaimer: untested)
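A minimal sketch of that idea follows, in Python 3 syntax, where httplib is named http.client. It is an illustration rather than the answer's original snippet; the function name `post_or_abort` and its parameters are hypothetical.

```python
import http.client
import socket


def post_or_abort(host, port, path, body, timeout=10):
    """POST body (bytes); on a timeout or socket error, shut the socket down.

    Returns the response body as bytes, or None if the request failed.
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("POST", path, body)
        return conn.getresponse().read()
    except (OSError, http.client.HTTPException):
        # socket.timeout is a subclass of OSError, so timeouts land here.
        # conn.sock is the underlying socket (None if never connected).
        if conn.sock is not None:
            try:
                conn.sock.shutdown(socket.SHUT_RDWR)
            except OSError:
                pass  # the connection may already be gone
        return None
    finally:
        conn.close()
```

Whether shutdown() actually stops data that the operating system has already buffered is a separate question that this sketch does not settle.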
httplib does have various limitations when compared to urllib2 - it won't automatically handle things like redirects, for example. However, if you're using this to access a relatively fixed API rather than download random things from the internet, it should do the job fine.
Honestly, I probably wouldn't bother to do this myself though; I'm generally content to let the operating system deal with TCP buffers however it wants, even if its approach isn't always completely optimal...
You could spawn a secondary process using multiprocessing, then shut it down whenever you detect a timeout (a URLError exception with the message "urlopen error timed out"). Stopping the process should be enough to close the socket.
If calling socket.shutdown really is the only way to cut off the data on timeout, I think you need to resort to some sort of monkey-patching. urllib2 doesn't really offer you the opportunity for that sort of fine-grained socket control. Check out "Source interface with Python and urllib2" for a good approach.
It turns out that calling .sock.shutdown(socket.SHUT_RDWR) and .close() on an HTTPConnection that is uploading does not stop the upload. It will continue running in the background. I am not aware of a more reliable or direct way to kill the connection from Python while using urllib2 or httplib.
In the end we tested the upload using urllib2 without the timeout. This means that on a slow connection the upload (POST) might take a very long time, but at least we will know whether it worked or not. There is a possibility that urlopen might hang because there is no timeout, but we have tested various bad-connection scenarios, and in all cases urlopen either worked or returned an error after some time.
This means that we will at least know, on the client side, whether the upload succeeded or failed, and that it does not continue in the background.