Where does the Linux kernel clean up a process and its TCP connections after the process dies?
I am trying to find the place in the Linux kernel where it cleans up after a process dies. Specifically, I want to see whether and how it handles open TCP connections after a process is killed with `kill -9` (SIGKILL). I am pretty sure it closes all connections, but I want to see the details, and whether there is any chance that connections are not closed properly.
Pointers to the Linux kernel sources are welcome.
Comments (2)
The meat of process termination is handled by `do_exit()` in `kernel/exit.c`. This function calls `exit_files()`, which in turn calls `put_files_struct()`, which calls `close_files()`. `close_files()` loops over all file descriptors the process has open (which includes all sockets), calling `filp_close()` on each one, which calls `fput()` on the `struct file` object. When the last reference to the `struct file` has been put, `fput()` calls the file object's `.release()` method, which for sockets is the `sock_close()` function in `net/socket.c`.
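The effect of this cleanup can be observed from user space. The following is a minimal sketch, not kernel code: it assumes a Unix-like system where `os.fork()` is available and the loopback interface is usable, and the function name `peer_read_after_sigkill` is made up for illustration. A child process opens a TCP connection and never closes it; the parent kills it with SIGKILL and then reads from the accepted socket.

```python
import os
import signal
import socket


def peer_read_after_sigkill() -> bytes:
    """Kill a child that holds an open TCP connection; return what the peer reads."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]

    pid = os.fork()
    if pid == 0:
        # Child: connect, then sleep forever without ever calling close().
        try:
            listener.close()
            client = socket.create_connection(("127.0.0.1", port))
            while True:
                signal.pause()
        finally:
            os._exit(1)

    # Parent: accept() returns only after the child's connect() has completed.
    conn, _ = listener.accept()
    conn.settimeout(5.0)

    os.kill(pid, signal.SIGKILL)  # same effect as `kill -9 <pid>`
    os.waitpid(pid, 0)            # reap the child

    # The child never closed its socket, so anything seen here comes from
    # the kernel releasing the child's file descriptors on exit.
    data = conn.recv(1024)
    conn.close()
    listener.close()
    return data
```

An empty result from `recv()` means the peer saw an orderly end of stream, i.e. the kernel closed the dead process's socket on its behalf. Note that this only happens once the last reference to the socket is dropped; if another process had inherited the same descriptor, the connection would stay open.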
I'm pretty sure the socket cleanup is more of a side effect of releasing all the file descriptors after the process dies, and not directly done by the process cleanup.
I'm going to go out on a limb though, and assume you're hitting a common pitfall in network programming. If I am correct in guessing that your problem is an "Address in use" error (EADDRINUSE) when trying to bind to an address after a process is killed, then you are running into the socket's TIME_WAIT state.
If this is the case, you can either wait for the timeout (usually 60 seconds) or configure the socket to allow immediate reuse of the address, which is commonly done by setting the `SO_REUSEADDR` option before calling `bind()`.
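As a rough sketch of the address-reuse approach, the snippet below sets `SO_REUSEADDR` on a listening socket before binding. The helper name `make_reusable_listener` and its default arguments are assumptions for illustration only.

```python
import socket


def make_reusable_listener(host: str = "127.0.0.1", port: int = 0) -> socket.socket:
    """Create a listening TCP socket that may bind while old connections are in TIME_WAIT."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Must be set before bind(); it has no effect on an already bound socket.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))
    sock.listen(16)
    return sock
```

A restarted server would call `make_reusable_listener("0.0.0.0", 8080)` (port chosen arbitrarily here) instead of a plain `bind()`, so it does not have to wait for the TIME_WAIT timeout.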
[EDIT]
From your comments, it sounds like you are having issues with half-open connections and don't fully understand how TCP works. TCP has no way of knowing whether a client is dead or just idle. If you `kill -9` a client process, the four-way closing handshake may never complete. This shouldn't leave open connections on your server though, so you may still need to capture a network dump to be sure of what's going on. I can't say for sure how you should handle this without knowing exactly what you are doing, but you can read about TCP keepalive. A couple of other options are sending empty or null messages to the client periodically (which may require modifying your protocol), or setting hard timers on idle connections (which may drop valid connections).
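For the keepalive option, a minimal sketch might look like the following. The helper name `enable_keepalive` and the timing values are assumptions, not recommendations; the `TCP_KEEPIDLE`, `TCP_KEEPINTVL`, and `TCP_KEEPCNT` options are Linux-specific, so they are applied only when the `socket` module exposes them.

```python
import socket


def enable_keepalive(sock: socket.socket, idle: int = 60,
                     interval: int = 10, count: int = 5) -> None:
    """Ask the kernel to probe an idle TCP connection and drop it if the peer is gone."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Per-socket tuning; without it the system-wide defaults apply.
    if hasattr(socket, "TCP_KEEPIDLE"):
        # Seconds of idleness before the first probe is sent.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    if hasattr(socket, "TCP_KEEPINTVL"):
        # Seconds between successive probes.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    if hasattr(socket, "TCP_KEEPCNT"):
        # Unanswered probes before the connection is reported dead.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)
```

A server would typically call `enable_keepalive(conn)` on each socket returned by `accept()`. With the values above, a dead peer would be detected after roughly `idle + interval * count` seconds of silence.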