Application keeps throwing SocketTimeoutException after running for a while, but a restart fixes it immediately
I have a crawler Java application which is supposed to connect to some HTTP servers, download the HTML content of their pages, then move on to other HTTP servers. For this task, I've used the Apache HTTP library.
During the first few hours of the run, things seem to work rather smoothly (some connection-related exceptions are thrown from time to time, but that's to be expected).
Yet after a while, it seems like I keep getting SocketTimeoutException on every request I send out. The exception does not occur on the HttpClient class's "execute" method, but rather when I try to get the content of the Entity (which I retrieve from the HttpResponse object), or when I try to write that content to a file.
Then, if I stop the application and start it again, things go back to working fine, even though it picks up where it left off, meaning it is talking to the same servers that were giving me SocketTimeoutException before the restart.
I tried looking for all kinds of possible clean-ups that I might be missing and might be essential when using this library, but couldn't find anything.
Any help would be greatly appreciated.
Thanks.
Comments (2)
This sounds like the kind of thing that could be caused by a connection pool whose connections you are not releasing when you're done with them, if the timeout occurs while the client library waits to retrieve a pooled connection. Are you sure you're closing everything properly (in finally blocks)? If you run Wireshark to monitor your traffic, what network traffic occurs while it's "broken"?
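To make "closing everything" concrete, here is a minimal sketch using only the JDK. The class and method names (ResponseBodySaver, saveAndClose) are made up for illustration, and the in-memory stream stands in for a real response body. The assumption is that with Apache HttpClient 4.x you would pass in the stream returned by response.getEntity().getContent(), and that closing that stream is what lets the connection go back to the pool. I have not seen your code, so treat this as a pattern to compare against, not a diagnosis.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not part of any library.
class ResponseBodySaver {

    // Copies the body to a file and closes the stream on every path,
    // including when the copy throws (for example on a read timeout).
    static long saveAndClose(InputStream body, Path target) throws IOException {
        // try-with-resources is equivalent to closing in a finally block.
        try (InputStream in = body) {
            return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a response body; in a crawler this would be the
        // entity's content stream (assumption, see the text above).
        InputStream fakeBody = new ByteArrayInputStream(
                "<html></html>".getBytes(StandardCharsets.UTF_8));
        Path target = Files.createTempFile("page", ".html");
        long written = saveAndClose(fakeBody, target);
        System.out.println(written + " bytes written to " + target);
    }
}
```

The point is that the close happens even when reading or writing fails halfway, which is exactly the case where a leaked connection is easy to miss.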
Make sure that you're not running a lot of HTTP requests at the same time. For example, send 5 HTTP requests and wait for the first response before making another request, and so on. It looks like your HTTP requests open too many sockets.
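One way to cap the number of requests in flight is a counting semaphore. The sketch below uses only the JDK; BoundedFetcher and its methods are hypothetical names, the limit of 5 is just the number from the example above, and the sleeping task stands in for a real HTTP call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper: lets at most maxConcurrent requests run at once.
class BoundedFetcher {
    private final Semaphore permits;
    private final AtomicInteger inFlight = new AtomicInteger();
    private final AtomicInteger peak = new AtomicInteger();

    BoundedFetcher(int maxConcurrent) {
        this.permits = new Semaphore(maxConcurrent);
    }

    // Blocks until a permit is free, runs the request, then always
    // returns the permit, even if the request throws.
    <T> T run(Callable<T> request) throws Exception {
        permits.acquire();
        try {
            int now = inFlight.incrementAndGet();
            peak.accumulateAndGet(now, Math::max); // remember the highest count seen
            return request.call();
        } finally {
            inFlight.decrementAndGet();
            permits.release();
        }
    }

    int peakInFlight() {
        return peak.get();
    }

    public static void main(String[] args) throws Exception {
        BoundedFetcher fetcher = new BoundedFetcher(5);
        ExecutorService pool = Executors.newFixedThreadPool(20);
        List<Future<String>> results = new ArrayList<>();
        for (int i = 0; i < 30; i++) {
            final int id = i;
            Callable<String> task = () -> fetcher.run(() -> {
                Thread.sleep(20); // stand-in for the real HTTP request
                return "page-" + id;
            });
            results.add(pool.submit(task));
        }
        for (Future<String> r : results) {
            r.get(); // wait for every simulated request to finish
        }
        pool.shutdown();
        System.out.println("peak concurrent requests: " + fetcher.peakInFlight());
    }
}
```

Even with 20 worker threads, no more than 5 tasks are inside run at the same moment, so the number of open sockets stays bounded as long as each request also closes its connection when it finishes.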