What can cause spontaneous EPIPE errors without either end calling close() or crashing?

Posted 2024-08-20 17:44:48

I have an application that consists of two processes (let's call them A and B), connected to each other through Unix domain sockets. Most of the time it works fine, but some users report the following behavior:

  1. A sends a request to B. This works. A now starts reading the reply from B.
  2. B sends a reply to A. The corresponding write() call returns an EPIPE error, and as a result B closes the socket. However, A did not close() the socket, nor did it crash.
  3. A's read() call returns 0, indicating end-of-file. A thinks that B prematurely closed the connection.

Users have also reported variations of this behavior, e.g.:

  1. A sends a request to B. This works partially, but before the entire request is sent, A's write() call returns EPIPE, and as a result A closes the socket. However, B did not close() the socket, nor did it crash.
  2. B reads a partial request and then suddenly gets an EOF.
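
For context, here is a minimal sketch of how such an EPIPE typically surfaces in the sending process. This is illustrative code, not the actual application: it assumes SIGPIPE is ignored so that write() returns -1 with errno == EPIPE instead of killing the process.

    /* Illustrative sketch only: a write loop that reports the EPIPE behavior
     * described above. Assumes SIGPIPE has been ignored at startup, e.g.
     * signal(SIGPIPE, SIG_IGN), so write() fails with EPIPE instead of
     * terminating the process. */
    #include <errno.h>
    #include <stdio.h>
    #include <unistd.h>

    ssize_t send_all(int fd, const char *buf, size_t len)
    {
        size_t off = 0;
        while (off < len) {
            ssize_t n = write(fd, buf + off, len - off);
            if (n < 0) {
                if (errno == EINTR)
                    continue;               /* interrupted by a signal: retry */
                if (errno == EPIPE)
                    fprintf(stderr, "write: EPIPE (peer end is gone)\n");
                return -1;
            }
            off += (size_t)n;
        }
        return (ssize_t)off;
    }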

The problem is I cannot reproduce this behavior locally at all. I've tried OS X and Linux. The users are on a variety of systems, mostly OS X and Linux.

Things that I've already tried and considered:

  • Double close() bugs (close() called twice on the same file descriptor): probably not, as that would result in EBADF errors, and I haven't seen any.
  • Increasing the maximum file descriptor limit. One user reported that this worked for him, the rest reported that it did not.
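
As a rough illustration of that last item, one way to raise the per-process descriptor limit at startup is with getrlimit()/setrlimit(). This is a sketch of the general technique, not code from the application in question:

    /* Sketch: bump the soft RLIMIT_NOFILE limit up to the hard limit at
     * process start. (One user reported that a higher fd limit helped.) */
    #include <stdio.h>
    #include <sys/resource.h>

    static void raise_fd_limit(void)
    {
        struct rlimit rl;

        if (getrlimit(RLIMIT_NOFILE, &rl) != 0) {
            perror("getrlimit");
            return;
        }
        rl.rlim_cur = rl.rlim_max;      /* raise soft limit to the hard limit */
        if (setrlimit(RLIMIT_NOFILE, &rl) != 0)
            perror("setrlimit");
    }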

What else can possibly cause behavior like this? I know for certain that neither A nor B close()s the socket prematurely, and I know for certain that neither of them has crashed, because both A and B were able to report the error. It is as if the kernel suddenly decided to pull the plug on the socket for some reason.

Comments (3)

唱一曲作罢 2024-08-27 17:44:48

Perhaps you could try strace as described in: http://modperlbook.org/html/6-9-1-Detecting-Aborted-Connections.html

I assume that your problem is related to the one described here: http://blog.netherlabs.nl/articles/2009/01/18/the-ultimate-so_linger-page-or-why-is-my-tcp-not-reliable

Unfortunately I'm having a similar problem myself but couldn't manage to get it fixed with the given advice. However, perhaps that SO_LINGER thing works for you.
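
If you want to experiment with the suggestions from that article, the two techniques it discusses are a "lingering close" (shut down the write side, drain the peer, then close) and the SO_LINGER socket option. The sketch below only illustrates those ideas, with an arbitrary buffer size and timeout; note that the article is about TCP, so the SO_LINGER part may not change anything for Unix domain sockets:

    /* Illustrative sketch of the techniques from the linked SO_LINGER article. */
    #include <sys/socket.h>
    #include <unistd.h>

    /* "Lingering close": signal end-of-output, then drain until the peer
     * closes its side, so no data is lost when we finally close(). */
    static void lingering_close(int fd)
    {
        char buf[4096];

        shutdown(fd, SHUT_WR);                  /* we will not write any more */
        while (read(fd, buf, sizeof buf) > 0)   /* drain whatever the peer still sends */
            ;
        close(fd);
    }

    /* SO_LINGER: make close() block until queued data is sent or the
     * timeout (here 5 seconds) expires. */
    static void enable_so_linger(int fd)
    {
        struct linger lin = { .l_onoff = 1, .l_linger = 5 };

        setsockopt(fd, SOL_SOCKET, SO_LINGER, &lin, sizeof lin);
    }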

奢欲 2024-08-27 17:44:48

  • shutdown() may have been called on one of the socket endpoints.

  • If either side may fork and execute a child process, ensure that the FD_CLOEXEC (close-on-exec) flag is set on the socket file descriptor if you did not intend for it to be inherited by the child. Otherwise the child process could (accidentally or otherwise) be manipulating your socket connection.
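
A short sketch of that second point: the flag can be set after the fact with fcntl(), or (on Linux) requested at socket creation time. The descriptor name here is illustrative:

    /* Sketch: mark a socket descriptor close-on-exec so that a fork()+exec()'d
     * child does not inherit it and cannot interfere with the connection. */
    #include <fcntl.h>
    #include <sys/socket.h>

    static int set_cloexec(int fd)
    {
        int flags = fcntl(fd, F_GETFD);

        if (flags < 0)
            return -1;
        return fcntl(fd, F_SETFD, flags | FD_CLOEXEC);
    }

    /* On Linux the flag can also be requested up front, e.g.:
     *   int sv[2];
     *   socketpair(AF_UNIX, SOCK_STREAM | SOCK_CLOEXEC, 0, sv);
     */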

远山浅 2024-08-27 17:44:48

I would also check that there's no sneaky firewall in the middle. It's possible an intermediate forwarding node on the route sends an RST. The best way to track that down is of course the packet sniffer (or its GUI cousin.)
