Troubleshooting connections stuck in CLOSE_WAIT status
I have a Java application running in WebLogic 11g on Windows which becomes unresponsive after several days. One suspicious symptom I've noticed is that a large number of connections (about 3000) show up in netstat with a CLOSE_WAIT status even when the server is idle. Since the application server is managing the client connections, I'm not sure what's causing this. We also make a number of web service calls that loop back to the same server, but I believe those connections get closed properly. What else could cause this, and how does one troubleshoot a problem like this?
Comments (6)
CLOSE_WAIT is the state the local TCP state machine is in when the remote host has sent a FIN (closed its side of the connection) but the local application has not yet done the same and sent its own FIN in reply. It's still possible for the local machine to send data at this point, though the client cannot receive it (unless it did only a half-close on the connection).
When the remote host closes (sends a FIN), your local application will get an event of some sort (it's a "read" event on the socket in the base C library), and reading from that connection will then report end-of-stream to indicate that the connection has closed (a 0-byte read in C, -1 from InputStream.read in Java). At this point the local application should close the connection.
I know little about Java and nothing about WebLogic, but I suppose it's possible that the application is not handling that end-of-stream (or a read error) properly and thus never closes the connection.
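To make the point above concrete, here is a minimal Java sketch of a reader that treats end-of-stream as the signal to close its own side. The class and method names (EofCloseSketch, drainAndClose, demo) are made up for illustration, and the loopback demo stands in for a real client; it is not how WebLogic handles its sockets.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical names throughout; a sketch, not WebLogic's actual socket handling.
final class EofCloseSketch {

    // Reads until the peer's FIN surfaces as end-of-stream (-1), then closes
    // our side so the socket does not linger in CLOSE_WAIT.
    static long drainAndClose(Socket socket) throws IOException {
        long total = 0;
        try (Socket s = socket; InputStream in = s.getInputStream()) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                total += n;
            }
        } // try-with-resources closes the socket even if read() throws
        return total;
    }

    // Loopback demo: the client writes the payload and closes first; the
    // server side then reads to end-of-stream and closes its own end.
    static long demo(byte[] payload) {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            try (Socket client = new Socket(server.getInetAddress(), server.getLocalPort())) {
                client.getOutputStream().write(payload);
            } // closing the client sends the FIN
            return drainAndClose(server.accept());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("bytes read before EOF: " + demo(new byte[] {1, 2, 3}));
    }
}
```

If the close in drainAndClose were skipped, the accepted socket would stay in CLOSE_WAIT for as long as the process held it, which is the symptom described in the question.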
I have been having the same issue, and I have been studying sockets to get rid of it.
Let me say a few words, but first I must say I am not a Java programmer.
I will not explain what close_wait is, as Brian White already said everything that should be said.
To avoid close_wait, you need to make sure connections are not torn down after every response. Whoever disconnects first ends up in time_wait, and the other side sits in close_wait until its application closes the socket. So, if your server is getting stuck in close_wait, it tells me that connections are being dropped after each response and the server is never closing its end.
You should avoid that by doing a few things.
1 - If your client application is not using the HTTP 1.1 protocol, you must set it to use that because of the 'keep-alive' HTTP header option.
2 - If your client is running HTTP 1.1 and that does not work, or if you must use HTTP 1.0, you should set the connection request header property to keep-alive. This tells the server that neither the client nor the server should disconnect after completing a request. By doing that, your server will not disconnect after every request it receives.
3 - In your client, reuse your socket. If you are creating a lot of socket clients in a loop, for example, you should create a socket once and then use it every time you need to send a request. The approach I used in my app is to have a socket pool and get one available socket (which is already connected to the server and has the keep-alive property). Then I use it, and when I am done I put it back in the pool to be reused.
4 - If you really need to disconnect after sending a request, make sure your client does that, and keep the connection: keep-alive header.
And yes, you may have problems when you have a lot of close_waits or time_waits on the server side.
Check out this [link][1], which explains what keep-alive is.
I hope this was helpful. With those things I managed to solve my problem.
[1]: http://www.w3.org/Protocols/HTTP/1.1/draft-ietf-http-v11-spec-01.html#Persistent Connections
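The socket pool described in point 3 could look roughly like the sketch below. Everything here is an assumption for illustration: the class name SocketPoolSketch, the fixed pool size, the 5-second connect timeout, and the loopback demo server. Note also that setKeepAlive turns on TCP-level keep-alive probes (SO_KEEPALIVE), which is a different mechanism from the HTTP Connection: keep-alive header; which one the answer meant by "the keep-alive property" is not stated.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical fixed-size pool of already-connected sockets (illustration only).
final class SocketPoolSketch implements Closeable {
    private final BlockingQueue<Socket> idle;

    SocketPoolSketch(InetAddress host, int port, int size) throws IOException {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            Socket s = new Socket();
            try {
                s.setKeepAlive(true); // TCP SO_KEEPALIVE, not the HTTP header
                s.connect(new InetSocketAddress(host, port), 5000); // assumed timeout
            } catch (IOException e) {
                s.close();
                close(); // do not leak the sockets opened so far
                throw e;
            }
            idle.add(s);
        }
    }

    // Blocks until a connected socket is available.
    Socket borrow() throws InterruptedException {
        return idle.take();
    }

    // Returns a socket for reuse; closed or surplus sockets are discarded.
    void giveBack(Socket s) {
        if (s.isClosed() || !idle.offer(s)) {
            closeQuietly(s);
        }
    }

    int idleCount() {
        return idle.size();
    }

    @Override
    public void close() {
        Socket s;
        while ((s = idle.poll()) != null) {
            closeQuietly(s);
        }
    }

    private static void closeQuietly(Socket s) {
        try {
            s.close();
        } catch (IOException ignored) {
            // nothing useful to do while discarding a socket
        }
    }

    // Demo against a throwaway loopback listener: returns the idle count while
    // one socket is borrowed and again after it has been returned.
    static int[] demo() {
        try (ServerSocket server = new ServerSocket(0, 50, InetAddress.getLoopbackAddress());
             SocketPoolSketch pool =
                 new SocketPoolSketch(server.getInetAddress(), server.getLocalPort(), 2)) {
            Socket s = pool.borrow();
            int whileBorrowed = pool.idleCount();
            pool.giveBack(s);
            return new int[] {whileBorrowed, pool.idleCount()};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        int[] counts = demo();
        System.out.println("idle while borrowed: " + counts[0] + ", after return: " + counts[1]);
    }
}
```

A pool like this only helps if the server keeps the connection open between requests. A real client would also need to detect sockets the server has since closed and replace them, which this sketch does not do.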
The CLOSE_WAIT status means that the other side has initiated a connection close, but the application on the local side has not yet closed the socket.
It sounds like you have a bug in your local application.
I found this quote about CLOSE_WAIT pileups: "Something is either preventing progress to occur in the HTTP session (we are stuck so never end up calling close), or some bug has been introduced that prevents the socket from being closed. There are a number of ways this can happen."
Think: Is there any way your application might be getting stuck while processing a request? Or WebLogic itself?
Examine: Can you do Java thread dumps (kill -SIGQUIT can be used for that on the Oracle JVM for Linux) to try to see if in fact any of your threads ARE getting stuck?
Examine the client side: First, find out the IP address or hostname of the clients that are connected to the CLOSE_WAIT sockets. Then, see if anything suspicious is happening on those clients.
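For the client-side check, one way to find which remote hosts hold the CLOSE_WAIT sockets is to group netstat output by foreign address. The sketch below is only an illustration: the class name CloseWaitByPeer is made up, and it assumes the usual `netstat -an` layout in which the foreign address is the column immediately before the state.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical helper; assumes the foreign address is the token right before the state.
final class CloseWaitByPeer {

    // Counts CLOSE_WAIT sockets per remote host, ignoring the remote port.
    static Map<String, Integer> countByRemoteHost(List<String> netstatLines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : netstatLines) {
            String[] fields = line.trim().split("\\s+");
            for (int i = 1; i < fields.length; i++) {
                if (fields[i].equals("CLOSE_WAIT")) {
                    String remote = fields[i - 1];
                    int colon = remote.lastIndexOf(':');
                    String host = colon > 0 ? remote.substring(0, colon) : remote;
                    counts.merge(host, 1, Integer::sum);
                    break;
                }
            }
        }
        return counts;
    }

    // Reads netstat output from standard input and prints one count per host.
    public static void main(String[] args) throws IOException {
        try (BufferedReader in = new BufferedReader(new InputStreamReader(System.in))) {
            List<String> lines = in.lines().collect(Collectors.toList());
            countByRemoteHost(lines).forEach((host, n) -> System.out.println(n + "\t" + host));
        }
    }
}
```

With a JDK 11 or later this can be run as `netstat -an | java CloseWaitByPeer.java`. If nearly all of the entries point at the server's own address, the loopback web service calls mentioned in the question would be the first place to look.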
The problem was a bug triggered by setting "Use JSSE SSL" to true in WebLogic. Using WebLogic's own SSL implementation instead of JSSE is not a problem for our application, so I simply unchecked that setting and the problem disappeared.
This might mean that you're not calling "close" on a socket returned by your accept() call.