What should I do if socket.setdefaulttimeout() is not working?


I'm writing a script (multi-threaded) to retrieve contents from a website. The site is not very stable, so every now and then there's a hanging HTTP request that cannot even be timed out by socket.setdefaulttimeout(). Since I have no control over that website, the only thing I can do is improve my code, but I'm running out of ideas right now.

Sample code:

import socket
import urllib2
import mechanize

socket.setdefaulttimeout(150)

MechBrowser = mechanize.Browser()
Header = {'User-Agent': 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.8) Gecko/20100722 Firefox/3.6.8 GTB7.1 (.NET CLR 3.5.30729)'}
Url = "http://example.com"
Data = "Justatest=whatever&letstry=doit"
Request = urllib2.Request(Url, Data, Header)
Response = MechBrowser.open(Request)
Response.close()

What should I do to force the hanging request to quit? Actually, I want to know why socket.setdefaulttimeout(150) is not working in the first place. Can anybody help me out?

Added: (and yes, the problem is still not solved)

OK, I've followed tomasz's suggestion and changed the code to MechBrowser.open(Request, timeout = 60), but the same thing happens. I still get hanging requests randomly; sometimes they take several hours, other times they could take several days. What do I do now? Is there a way to force these hanging requests to quit?


Comments (4)

腻橙味 2024-12-27 15:50:57


While socket.setdefaulttimeout() will set the default timeout for new sockets, the setting can easily be overwritten if you're not using the sockets directly. In particular, if the library calls socket.setblocking on its socket, that resets the timeout.
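
A quick illustration of that override (a minimal Python 2 sketch; per the socket module docs, setblocking(1) is equivalent to settimeout(None)):

import socket

socket.setdefaulttimeout(150)
s = socket.socket()   # a new socket inherits the 150 s default
print s.gettimeout()  # 150.0
s.setblocking(1)      # equivalent to s.settimeout(None)...
print s.gettimeout()  # None -- the default timeout is gone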

urllib2.urlopen has a timeout argument; however, there is no timeout in urllib2.Request. As you're using mechanize, you should refer to its documentation:

Since Python 2.6, urllib2 uses a .timeout attribute on Request objects internally. However, urllib2.Request has no timeout constructor argument, and urllib2.urlopen() ignores this parameter. mechanize.Request has a timeout constructor argument which is used to set the attribute of the same name, and mechanize.urlopen() does not ignore the timeout attribute.

source: http://wwwsearch.sourceforge.net/mechanize/documentation.html

---EDIT---

If either socket.setdefaulttimeout or passing a timeout to mechanize works with small values but not with higher ones, the source of the problem might be completely different. One thing is that your library may open multiple connections (credit to @Cédric Julien here), so the timeout applies to every single socket.open attempt, and if the code doesn't stop at the first failure, the whole thing can take up to timeout * num_of_conn seconds. The other thing is socket.recv: if the connection is really slow and you're unlucky enough, the whole request can take up to timeout * incoming_bytes, since every socket.recv could return a single byte, and every such call could take up to timeout seconds. As you're unlikely to suffer from exactly this dark scenario (one byte per timeout seconds? the server would have to be very rude), it's far more likely that a request simply takes ages on a very slow connection with a very high timeout.
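
To make the recv arithmetic concrete, here is a hypothetical sketch (the function and byte counts are made up for illustration): with a 150-second timeout and a server trickling one byte every ~149 seconds, every recv() returns just in time, so a 1000-byte body could take about 41 hours without a single timeout firing.

def read_response(sock, length):
    """Read length bytes; the timeout bounds each recv(), not the total."""
    data = ''
    while len(data) < length:
        chunk = sock.recv(4096)  # may block up to 150 s on *every* call
        if not chunk:
            break
        data += chunk
    return data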

The only solution you have is to force a timeout for the whole request, and there's nothing sockets can do about that. If you're on Unix, you can use a simple solution based on the ALARM signal: you set the signal to be raised in timeout seconds, and your request will be interrupted (don't forget to catch the resulting exception). Note that in CPython signals are only delivered to the main thread, so in your multi-threaded script this covers only requests issued from the main thread. You might like to use a with statement to make it clean and easy to use, for example:

import signal, time

def request(arg):
  """Your http request"""
  time.sleep(2)
  return arg

class Timeout():
  """Timeout class using ALARM signal"""
  class Timeout(Exception): pass

  def __init__(self, sec):
    self.sec = sec

  def __enter__(self):
    signal.signal(signal.SIGALRM, self.raise_timeout)
    signal.alarm(self.sec)

  def __exit__(self, *args):
    signal.alarm(0) # disable alarm

  def raise_timeout(self, *args):
    raise Timeout.Timeout()

# Run block of code with timeouts
try:
  with Timeout(3):
    print request("Request 1")
  with Timeout(1):
    print request("Request 2")
except Timeout.Timeout:
  print "Timeout"

# Prints "Request 1" and "Timeout"

If you want to be more portable than this, you have to use bigger guns, for example multiprocessing: you spawn a process to call your request and terminate it if it's overdue. As this is a separate process, you have to use something to transfer the result back to your application; multiprocessing.Pipe works. Here is an example:

from multiprocessing import Process, Pipe
import time

def request(sleep, result):
  """Your http request example"""
  time.sleep(sleep)
  return result

class TimeoutWrapper():
  """Timeout wrapper using separate process"""
  def __init__(self, func, timeout):
    self.func = func
    self.timeout = timeout

  def __call__(self, *args, **kargs):
    """Run func with timeout"""
    def pmain(pipe, func, args, kargs):
      """Function to be called in separate process"""
      result = func(*args, **kargs) # call func with passed arguments
      pipe.send(result) # send result to pipe

    parent_pipe, child_pipe = Pipe() # Pipe for retrieving result of func
    p = Process(target=pmain, args=(child_pipe, self.func, args, kargs))
    p.start()
    p.join(self.timeout) # wait for process to end

    if p.is_alive():
      p.terminate() # Timeout, kill
      return None # or raise exception if None is acceptable result
    else:          
      return parent_pipe.recv() # OK, get result

print TimeoutWrapper(request, 3)(1, "OK") # prints OK
print TimeoutWrapper(request, 1)(2, "Timeout") # prints None

You really don't have much choice if you want to force the request to terminate after a fixed number of seconds. socket.timeout will provide a timeout for a single socket operation (connect/recv/send), but with multiple such operations you can still suffer from a very long total execution time.

天暗了我发光 2024-12-27 15:50:57


From their documentation:

Since Python 2.6, urllib2 uses a .timeout attribute on Request objects
internally. However, urllib2.Request has no timeout constructor
argument, and urllib2.urlopen() ignores this parameter.
mechanize.Request has a timeout constructor argument which is used to
set the attribute of the same name, and mechanize.urlopen() does not
ignore the timeout attribute.

Perhaps you should try replacing urllib2.Request with mechanize.Request.
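
A minimal sketch of that change, reusing the names from the question (this assumes mechanize.Request accepts the same url/data/headers arguments as urllib2.Request, plus the timeout keyword the quoted documentation describes):

import mechanize

MechBrowser = mechanize.Browser()
Header = {'User-Agent': 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.8) Gecko/20100722 Firefox/3.6.8 GTB7.1 (.NET CLR 3.5.30729)'}
Url = "http://example.com"
Data = "Justatest=whatever&letstry=doit"

# Unlike urllib2.Request, mechanize.Request takes a timeout
# constructor argument, so this timeout is actually honored.
Request = mechanize.Request(Url, Data, Header, timeout=60)
Response = MechBrowser.open(Request)
Response.close()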

甜味拾荒者 2024-12-27 15:50:57


You could try to use mechanize with eventlet. It does not solve your timeout problem, but greenlets are non-blocking, so it can solve your performance problem.
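
A hedged sketch of that combination, assuming eventlet's monkey_patch() and GreenPool APIs (the URL list is made up for illustration):

import eventlet
eventlet.monkey_patch()  # patch socket & co. so blocking I/O yields to other greenlets
import mechanize

def fetch(url):
    browser = mechanize.Browser()
    response = browser.open(url, timeout=60)
    try:
        return url, len(response.read())
    finally:
        response.close()

urls = ["http://example.com/page%d" % i for i in range(10)]  # made-up URLs
pool = eventlet.GreenPool(5)  # run at most 5 fetches concurrently
for url, size in pool.imap(fetch, urls):
    print url, size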

梨涡 2024-12-27 15:50:57


I suggest a simple workaround: move the request to a different process, and if it fails to terminate, kill it from the calling process, this way:

    from multiprocessing import Process, Queue
    import time

    some_queue = Queue()  # channel for the child to report results back
    checker = Process(target=yourFunction, args=(some_queue,))  # args must be a tuple
    timeout = 150
    checker.start()
    counter = 0
    while checker.is_alive():
        time.sleep(1)
        counter += 1
        if counter > timeout:
            print "Child process consumed too much run-time. Going to kill it!"
            checker.terminate()  # kill the hung child
            break

simple, fast and effective.
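
If the child finishes in time, the result can then be read back; a minimal follow-up sketch, assuming yourFunction puts its result on some_queue (that protocol is implied, not shown, in the answer):

if not some_queue.empty():
    result = some_queue.get()  # whatever yourFunction put on the queue
else:
    result = None              # child was killed or produced nothing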
