In Python, how do I enforce a timeout on a function call that sometimes hangs?
I'm using a Python spider to crawl the internet using a urllib2 OpenerDirector. The problem is that a connection will inevitably hang on an https address, apparently ignoring the timeout value.
One solution would be to run it in a thread and then kill and restart the thread if it hangs. Apparently Python doesn't support killing threads, and doing so is considered a Bad Idea because of garbage collection and other issues. I would still prefer this solution, however, because of its simplicity.
Another idea would be to use an asynchronous library like Twisted, but that doesn't solve the problem.
I need either a way to forcibly interrupt the call or a fix for the way the urllib2 OpenerDirector handles timeouts. Thanks.
Comments (2)
Another StackOverflow question covers a similar problem. When I faced something like this, I found it easier to restructure what I was doing into functions that I define and call, so that a call can return a value when a timeout occurs. This can actually open up more possibilities, since different return values can signal different outcomes.
Another answer to the related question that I linked to above sounds more like what you're looking for (as I understand it): https://stackoverflow.com/a/5817436/1118357
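As a rough illustration of the "function that returns a value on timeout" idea, here is a minimal sketch. It assumes Python 3, and `call_with_timeout` and its `default` parameter are hypothetical names of my own, not something from the linked answers. Note that it does not kill the hung call: the worker runs in a daemon thread that is simply abandoned if it overruns.

```python
import threading
import time


def call_with_timeout(func, args=(), timeout=5.0, default=None):
    """Run func(*args) in a daemon thread.

    Returns the function's result, or `default` if it has not finished
    within `timeout` seconds. (Hypothetical helper, for illustration only.)
    """
    outcome = {}

    def worker():
        try:
            outcome["value"] = func(*args)
        except Exception as exc:  # hand the error back to the caller
            outcome["error"] = exc

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout)

    if thread.is_alive():
        # Still running: give up waiting. The thread itself is NOT stopped.
        return default
    if "error" in outcome:
        raise outcome["error"]
    return outcome["value"]


# A call that overruns its limit yields the fallback value instead.
print(call_with_timeout(time.sleep, args=(1,), timeout=0.1, default="timed out"))
```

Because the abandoned thread keeps whatever socket it was blocked on, this only hides the hang from the caller; it is a convenience wrapper rather than a real interrupt.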
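I suggest using another process instead of a thread: run the call in a child process and kill that process if it is still running when the time limit expires.
This way, whatever happens, the child process is killed after 150 seconds.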
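A minimal sketch of the child-process approach, assuming Python 3.7+ and assuming the work can be expressed as a snippet of Python code passed to a fresh interpreter. `run_in_child` is a hypothetical helper name; the 150-second limit from above would be passed as `timeout=150`.

```python
import subprocess
import sys


def run_in_child(code, timeout):
    """Run a Python snippet in a child interpreter.

    Returns the child's stdout, or None if it did not finish within
    `timeout` seconds. (Hypothetical helper, for illustration only.)
    """
    try:
        completed = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout,  # on expiry the child is killed, then TimeoutExpired is raised
        )
    except subprocess.TimeoutExpired:
        return None
    return completed.stdout


# A child that hangs is killed once the limit is reached.
result = run_in_child("import time; time.sleep(30)", timeout=0.5)
print("timed out" if result is None else result)
```

Unlike a thread, the child process can be killed outright, and the operating system reclaims its sockets and memory, which is why this is more robust for a call that hangs inside a network read.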