104, 'Connection reset by peer' socket error, or: when does closing a socket result in a RST rather than a FIN?

Posted 2024-07-10 19:44:20


We're developing a Python web service and a client web site in parallel. When we make an HTTP request from the client to the service, one call consistently raises a socket.error in socket.py, in read:

(104, 'Connection reset by peer')

When I listen in with Wireshark, the "good" and "bad" responses look very similar:

1. Because of the size of the OAuth header, the request is split into two packets. The service responds to both with ACK.
2. The service sends the response, one packet per header (HTTP/1.0 200 OK, then the Date header, etc.). The client responds to each with ACK.
3. (Good request) The server sends a FIN, ACK. The client responds with a FIN, ACK. The server responds with ACK.
4. (Bad request) The server sends a RST, ACK. The client doesn't send a TCP response, and the socket.error is raised on the client side.
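To make the FIN versus RST distinction in the list above concrete, here is a minimal sketch of one well-known way a close() ends in a RST: an abortive close requested through SO_LINGER with a zero timeout. It is not a diagnosis of what wsgiref does. The function name `close_and_observe` is hypothetical, the code assumes Python 3 on Linux or OS X (where `struct linger` is two C ints), and it runs entirely over loopback. Another commonly cited trigger on Linux is closing a socket while unread data is still sitting in its receive buffer.

```python
import socket
import struct
import threading


def close_and_observe(abortive):
    """Return 'fin' or 'reset' depending on how the server side closed."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]

    def serve_once():
        conn, _ = listener.accept()
        if abortive:
            # l_onoff=1, l_linger=0: close() aborts the connection with a RST
            # instead of the normal FIN handshake.
            conn.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                            struct.pack("ii", 1, 0))
        conn.close()

    server = threading.Thread(target=serve_once)
    server.start()

    client = socket.create_connection(("127.0.0.1", port))
    server.join()  # the server has closed before the client tries to read
    try:
        data = client.recv(1024)
        # An orderly close (FIN) shows up as an empty read.
        result = "fin" if data == b"" else "data"
    except ConnectionResetError:
        # errno 104 (ECONNRESET) on Linux, 54 on OS X
        result = "reset"
    finally:
        client.close()
        listener.close()
    return result
```

Calling `close_and_observe(False)` exercises the orderly path from step 3, and `close_and_observe(True)` exercises the reset path from step 4, with the client seeing the same kind of error as in the question.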

Both the web service and the client are running on a Gentoo Linux x86-64 box running glibc-2.6.1. We're using Python 2.5.2 inside the same virtual_env.

The client is a Django 1.0.2 app that is calling httplib2 0.4.0 to make requests. We're signing requests with the OAuth signing algorithm, with the OAuth token always set to an empty string.

The service is running Werkzeug 0.3.1, which is using Python's wsgiref.simple_server. I ran the WSGI app through wsgiref.validator with no issues.

It seems like this should be easy to debug, but when I trace through a good request on the service side, it looks just like the bad request: both end up in the socket._socketobject.close() function, which turns the delegate methods into dummy methods. When the send or sendto method (I can't remember which) is switched off, the FIN or RST is sent, and the client starts processing.

"Connection reset by peer" seems to place blame on the service, but I don't trust httplib2 either. Can the client be at fault?

** Further debugging - Looks like server on Linux **

I have a MacBook, so I tried running the service on one and the client website on the other. The Linux client calls the OS X server without the bug (FIN ACK). The OS X client calls the Linux service with the bug (RST ACK, and a (54, 'Connection reset by peer')). So, it looks like it's the service running on Linux. Is it x86_64? A bad glibc? wsgiref? Still looking...

** Further testing - wsgiref looks flaky **

We've gone to production with Apache and mod_wsgi, and the connection resets have gone away. See my answer below, but my advice is to log the connection reset and retry. This will let your server run OK in development mode, and solidly in production.


冷默言语 2024-07-17 19:44:20

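I've had this problem. See "The Python 'Connection Reset By Peer' Problem".

You have (most likely) run afoul of small timing issues based on the Python Global Interpreter Lock.

You can (sometimes) correct this with a strategically placed time.sleep(0.01).

"Where?" you ask. Beats me. The idea is to provide better thread concurrency in and around the client requests. Try putting it just before you make the request, so that the GIL is released and the Python interpreter can clear out any pending threads.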


固执像三岁 2024-07-17 19:44:20

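Don't use wsgiref for production. Use Apache and mod_wsgi, or something else.

We continue to see these connection resets, sometimes frequently, with wsgiref (the backend used by the Werkzeug test server, and possibly by others such as the Django test server). Our solution was to log the error, retry the call in a loop, and give up after ten failures. httplib2 tries twice, but we needed a few more attempts. The resets also seem to come in bunches - adding a 1 second sleep might clear the issue.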
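The log-and-retry approach described above could look roughly like the following. This is a sketch under assumptions rather than the code we actually ran: `call_with_retries`, `make_request`, and the logger name are hypothetical, and it is written for Python 3, where socket.error is an alias of OSError. Only ECONNRESET is retried; any other error is re-raised immediately.

```python
import errno
import logging
import time

log = logging.getLogger("service_client")  # hypothetical logger name


def call_with_retries(make_request, max_attempts=10, pause=1.0):
    """Call make_request(), retrying only on 'Connection reset by peer'.

    make_request is any zero-argument callable that performs the HTTP call,
    for example a lambda wrapping an httplib2 request.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return make_request()
        except OSError as exc:
            if exc.errno != errno.ECONNRESET:
                raise  # not a reset: don't hide other failures
            log.warning("connection reset (attempt %d of %d)",
                        attempt, max_attempts)
            if attempt == max_attempts:
                raise  # give up after the last allowed attempt
            time.sleep(pause)  # resets come in bunches, so wait a moment
```

Retrying blindly is only safe when the request is idempotent; for calls with side effects, the caller would need some way to tell whether the service already processed the request before the reset.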

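We've never seen a connection reset when running through Apache and mod_wsgi. I don't know what they do differently (maybe they just mask the resets), but they don't appear.

When we asked the local dev community for help, someone confirmed that they see a lot of connection resets with wsgiref that go away on the production server. There's a bug there, but it is going to be hard to find.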
