What should I do when socket.setdefaulttimeout() is not working?
I'm writing a multi-threaded script to retrieve contents from a website, and the site's not very stable, so every now and then there's a hanging HTTP request which cannot even be timed out by socket.setdefaulttimeout(). Since I have no control over that website, the only thing I can do is improve my code, but I'm running out of ideas right now.
Sample code:
socket.setdefaulttimeout(150)
MechBrowser = mechanize.Browser()
Header = {'User-Agent': 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.8) Gecko/20100722 Firefox/3.6.8 GTB7.1 (.NET CLR 3.5.30729)'}
Url = "http://example.com"
Data = "Justatest=whatever&letstry=doit"
Request = urllib2.Request(Url, Data, Header)
Response = MechBrowser.open(Request)
Response.close()
What should I do to force the hanging request to quit? Actually, I want to know why socket.setdefaulttimeout(150) is not working in the first place. Can anybody help me out?
Added: (and yes, the problem is still not solved)

OK, I've followed tomasz's suggestion and changed the code to MechBrowser.open(Request, timeout=60), but the same thing happens. I still get hanging requests randomly; sometimes they hang for several hours, other times for several days. What do I do now? Is there a way to force these hanging requests to quit?
4 Answers
While socket.setdefaulttimeout will set the default timeout for new sockets, if you're not using the sockets directly, the setting can easily be overwritten. In particular, if a library calls socket.setblocking on its socket, it will reset the timeout. urllib2.open has a timeout argument; however, there is no timeout in urllib2.Request. As you're using mechanize, you should refer to their documentation: http://wwwsearch.sourceforge.net/mechanize/documentation.html
---EDIT---
If either socket.setdefaulttimeout or passing a timeout to mechanize works with small values but not with higher ones, the source of the problem might be completely different. One thing is that your library may open multiple connections (credit to @Cédric Julien here), so the timeout applies to every single attempt of socket.open, and if it doesn't stop at the first failure, it can take up to timeout * num_of_conn seconds. The other thing is socket.recv: if the connection is really slow and you're unlucky enough, the whole request can take up to timeout * incoming_bytes, as with every socket.recv we could get one byte, and every such call could take timeout seconds. As you're unlikely to suffer from exactly this dark scenario (one byte per timeout seconds?), it's very likely for a request to take ages over a very slow connection with a very high timeout.

The only solution you have is to force a timeout for the whole request, but sockets have nothing to do with it here. If you're on Unix, you can use a simple solution with the ALARM signal. You set the signal to be raised in timeout seconds, and your request will be terminated (don't forget to catch the signal's exception). You might like to use a with statement to make it clean and easy to use, for example:
If you want to be more portable than this, you have to use some bigger guns, for example multiprocessing, so you'll spawn a process to call your request and terminate it if overdue. As this would be a separate process, you have to use something to transfer the result back to your application; it might be multiprocessing.Pipe. You really don't have much choice if you want to force the request to terminate after a fixed number of seconds. Here comes the example:
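The original example is missing from this copy; a sketch of the multiprocessing approach (fetch and fetch_with_timeout are my names, and a sleep stands in for the real request):

```python
import multiprocessing
import time

def fetch(conn):
    # In the real script this would be MechBrowser.open(Request);
    # a long sleep stands in for a request that hangs.
    time.sleep(10)
    conn.send("page contents")
    conn.close()

def fetch_with_timeout(timeout):
    parent_conn, child_conn = multiprocessing.Pipe()
    proc = multiprocessing.Process(target=fetch, args=(child_conn,))
    proc.start()
    proc.join(timeout)          # wait at most `timeout` seconds
    if proc.is_alive():
        proc.terminate()        # hard-kill the overdue request
        proc.join()
        return None             # caller decides whether to retry
    return parent_conn.recv()   # result sent back over the pipe

if __name__ == "__main__":
    print(fetch_with_timeout(1))   # the fake request hangs, so this is None
```

Spawning one process per request is heavier than a thread, but it is the only portable way to kill a request that ignores every socket-level timeout.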
socket.timeout will provide a timeout for a single socket operation (connect/recv/send), but if you have multiple of them, you can suffer from a very long total execution time.
From their documentation:
Perhaps you should try replacing urllib2.Request with mechanize.Request.
You could try to use mechanize with eventlet. It does not solve your timeout problem, but greenlets are non-blocking, so it might solve your performance problem.
I suggest a simple workaround: move the request to a different process, and if it fails to terminate, kill it from the calling process, this way:
Simple, fast and effective.