How to detect an invalid fd/handle

Posted 2024-12-17 09:26:59

I have a server application that handles network clients using asynchronous I/O. Client connections are accepted and then added to a descriptor set that can be monitored with poll/epoll/select/etc. I'm using the Apache APR library call apr_pollset_poll() to check for descriptors that can be read from or written to. Internally it uses epoll/poll/select/etc., depending on the platform.

The problem is that one of the socket descriptors somehow becomes corrupt, and apr_pollset_poll() fails with error 10038, which is WSAENOTSOCK: "An operation was attempted on something that is not a socket." Unfortunately, this stops my whole application from working, when all I want is to kick that particular client connection. If I could somehow ignore this socket or remove it from the descriptor set, the server could keep running and read/write the other sockets properly. I know I should find the root cause of the corruption, but I need a failsafe workaround.

Once the descriptors are added to the pollset, they are handled by the OS/kernel, and I see no way of retrieving them so that I can iterate over them. Keeping them in my own list as well would probably create other problems further down the line, because on socket close I would have to clean up that list myself, whereas the in-kernel pollset is cleaned up automatically.

Any suggestions?


Comments (2)

我的影子我的梦 2024-12-24 09:26:59

It sounds dire, but this is an emergency situation when it occurs. So I suggest going through all the descriptors in your working pollset and, for each one, trying an operation that will trigger the error if the descriptor is bogus. For example, you could create a new, temporary pollset containing just that descriptor, do a non-blocking, zero-timeout poll on it, and see whether you get the error.
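The one-at-a-time probe can be sketched as follows. This is a minimal illustration that assumes a POSIX system and uses plain poll() rather than an APR pollset; the helper name fd_is_open is hypothetical. It relies on poll() flagging a descriptor that is not open with POLLNVAL in revents. On Winsock the corresponding calls and error reporting differ (the question's error code is WSAENOTSOCK), so treat this as a sketch of the idea, not a drop-in fix.

```c
#include <errno.h>
#include <poll.h>
#include <stdbool.h>

/* Hypothetical helper: returns true if fd still refers to an open
 * descriptor, false if the kernel reports it as invalid.
 * Assumes POSIX poll(); never blocks because the timeout is zero. */
static bool fd_is_open(int fd)
{
    struct pollfd p = { .fd = fd, .events = 0, .revents = 0 };
    int rc;

    /* poll() silently ignores negative descriptors, so reject them here. */
    if (fd < 0)
        return false;

    /* Retry if a signal interrupts the call. */
    do {
        rc = poll(&p, 1, 0);
    } while (rc < 0 && errno == EINTR);

    /* If the probe itself failed, report the descriptor as unusable. */
    if (rc < 0)
        return false;

    /* POLLNVAL is reported even when no events were requested. */
    return (p.revents & POLLNVAL) == 0;
}
```

Calling fd_is_open() on each descriptor you have added to the pollset identifies the ones to remove. This only works if the application keeps its own record of those descriptors.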

If you've got more than, say, a dozen descriptors in your pollset, you might consider a binary search instead of a one-at-a-time approach. Put half of your descriptors into the temporary pollset and do the operation. If it fails, you know there is a bogus descriptor in the half you tried, so split that half in two and try again. If it does not fail, you can presume the bogus descriptor is in the other half; you can either confirm that the other half fails or assume it will, then split it in two and try again. Keep going until you've isolated the one failing descriptor. If you have several bogus descriptors and not just one, you may have to repeat the process a few times.
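The bisection can be sketched like this. It assumes a POSIX system and uses select() directly, not APR, because select() fails the whole call with EBADF when any descriptor in the set is invalid, which is the situation where bisecting pays off. The names range_has_bad_fd and find_bad_fd are hypothetical, and the descriptors are assumed to live in an application-maintained array.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>

/* Returns nonzero if select() rejects some descriptor in fds[lo..hi).
 * Descriptors that cannot be placed in an fd_set count as bad. */
static int range_has_bad_fd(const int *fds, size_t lo, size_t hi)
{
    fd_set rset;
    int maxfd = -1;
    int rc;

    FD_ZERO(&rset);
    for (size_t i = lo; i < hi; i++) {
        if (fds[i] < 0 || fds[i] >= FD_SETSIZE)
            return 1;
        FD_SET(fds[i], &rset);
        if (fds[i] > maxfd)
            maxfd = fds[i];
    }

    do {
        struct timeval zero = { 0, 0 };   /* zero timeout: never blocks */
        rc = select(maxfd + 1, &rset, NULL, NULL, &zero);
    } while (rc < 0 && errno == EINTR);

    return rc < 0 && errno == EBADF;
}

/* Returns the index of one bad descriptor in fds[0..n),
 * or -1 if select() accepts the whole array. */
static long find_bad_fd(const int *fds, size_t n)
{
    if (n == 0 || !range_has_bad_fd(fds, 0, n))
        return -1;

    size_t lo = 0, hi = n;   /* invariant: fds[lo..hi) holds a bad fd */
    while (hi - lo > 1) {
        size_t mid = lo + (hi - lo) / 2;
        assert(lo < mid && mid < hi);
        if (range_has_bad_fd(fds, lo, mid))
            hi = mid;        /* bad descriptor is in the first half */
        else
            lo = mid;        /* first half is clean, so it is in the second */
    }
    return (long)lo;
}
```

To handle several bogus descriptors, remove the one returned from the array and call find_bad_fd again until it returns -1.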

With the one descriptor isolated, you can decide what to do about it and how. If and when the problem recurs, you can repeat the isolation process. You would only run this after detecting the problem in the first place. But when things are going wrong, you need to isolate the problem, and this should achieve that.

滥情空心 2024-12-24 09:26:59

It turned out that I was calling close() on a socket descriptor that was being polled in another thread, and the select()-based pollset implementation does not tolerate this.
On the other hand, it would be possible to modify the APR library code to return the offending descriptor when select() detects an invalid socket, or even to remove it automatically.
