How can I debug unclean socket shutdowns in C?
I have a network daemon (poll()/accept()/fork() style) which is leaking socket file descriptors, one per client in the TIME_WAIT state.
As far as I can see, I am shutdown()ing and then close()ing every socket that is definitely no longer needed. Other sockets (for example the server socket in the client side of the fork) are just close()d. All sockets have SO_REUSEADDR set and SO_LINGER is off. I am using _exit() to exit the program, and I am using non-blocking polling socket operations so that my signal handler only has to set a "dying" flag. This lets me pick up the flag later and call free(), shutdown() and close() from the main loop, which would otherwise be dangerous in a signal handler.
But there is still an fd leak. What is the best way to debug this kind of problem? It would help to know which socket is loitering at exit, as there are many fds involved in the process.
Cheers!
Comments (2)
Sockets in TIME_WAIT state are NOT leaking. TIME_WAIT means that the application has finished with the socket, closed it and cleaned it up, but the kernel still remembers the connection so that it can respond properly to late, orphaned or duplicate packets that might still be floating around in the network. After a little while the kernel automatically deletes the TIME_WAIT sockets, but until then they remain as a reminder to the kernel not to reuse the port unless an application specifically asks for it with SO_REUSEADDR.
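As an illustration of the SO_REUSEADDR point, here is a minimal sketch of a listening socket that asks for the port even while old connections sit in TIME_WAIT. The function name `make_listener`, the use of IPv4 and binding to INADDR_ANY are assumptions made for the example, not details taken from the question.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Create a TCP listening socket on the given port (0 lets the kernel
 * pick one). Returns the descriptor, or -1 on error. */
int make_listener(unsigned short port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    /* Must be set before bind() to have any effect on the bind. */
    int on = 1;
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof on) < 0) {
        close(fd);
        return -1;
    }

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(fd, SOMAXCONN) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

With this in place, restarting the daemon should not fail with EADDRINUSE merely because earlier client connections are still in TIME_WAIT.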
I figured this out.
In fact I had already fixed the bug by closing cli_fd in the server side of the fork; however, I did not notice that the bug was fixed because I was wrongly using netstat to count open fds.
For the record, the output of
netstat -n | grep TIME_WAIT | wc -l
should not be used to count file descriptors for sockets which are hanging around. That is what I was doing wrong: a socket in TIME_WAIT has already been closed by the application and no longer holds a descriptor. Use lsof or fstat instead.
Anyway, the server is no longer running out of fds under considerable load.
Cheers