104, 'Connection reset by peer' socket error, or When does closing a socket result in a RST rather than FIN?

Posted on 2024-07-10 19:44:20

We're developing a Python web service and a client web site in parallel. When we make an HTTP request from the client to the service, one call consistently raises a socket.error in socket.py, in read:

(104, 'Connection reset by peer')

When I listen in with Wireshark, the "good" and "bad" responses look very similar:

  • Because of the size of the OAuth header, the request is split into two packets. The service responds to both with ACK.
  • The service sends the response, one packet per header (HTTP/1.0 200 OK, then the Date header, etc.). The client responds to each with ACK.
  • (Good request) The server sends a FIN, ACK. The client responds with a FIN, ACK. The server responds with ACK.
  • (Bad request) The server sends a RST, ACK; the client doesn't send a TCP response, and the socket.error is raised on the client side.

Both the web service and the client are running on a Gentoo Linux x86-64 box running glibc-2.6.1. We're using Python 2.5.2 inside the same virtual_env.

The client is a Django 1.0.2 app that is calling httplib2 0.4.0 to make requests. We're signing requests with the OAuth signing algorithm, with the OAuth token always set to an empty string.

The service is running Werkzeug 0.3.1, which is using Python's wsgiref.simple_server. I ran the WSGI app through wsgiref.validator with no issues.

It seems like this should be easy to debug, but when I trace a good request on the service side, it looks just like a bad one: both end up in the socket._socketobject.close() function, which turns the delegate methods into dummy methods. When the send or sendto method (I can't remember which) is switched off, the FIN or RST is sent, and the client starts processing.
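
For reference, one case where close() is known to produce a RST rather than a FIN on Linux is closing a socket that still has unread data in its receive buffer. The sketch below reproduces that case over loopback. It is written for Python 3 (the setup above uses Python 2.5), `provoke_reset` is a hypothetical helper name, and whether this is what happens inside wsgiref here is an assumption that the traces above do not confirm.

```python
import errno
import socket


def provoke_reset():
    """Close a server-side socket with unread data and report the client's errno."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)

    client = socket.create_connection(listener.getsockname())
    client.settimeout(5)  # never hang forever if no reset arrives
    client.sendall(b"xy")  # two bytes of "request"

    conn, _ = listener.accept()
    conn.recv(1)  # read only one byte, leaving one unread
    conn.close()  # closing with unread data pending: Linux answers with RST

    try:
        client.recv(1)
        return None  # a clean FIN would land here (recv returns b"")
    except OSError as exc:
        return exc.errno
    finally:
        client.close()
        listener.close()


if __name__ == "__main__":
    code = provoke_reset()
    print(code, errno.errorcode.get(code))
```

On Linux the returned value should correspond to errno.ECONNRESET, the same 104 seen in the traceback; other systems may behave differently.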

"Connection reset by peer" seems to place blame on the service, but I don't trust httplib2 either. Can the client be at fault?

** Further debugging - Looks like server on Linux **

I have a MacBook, so I tried running the service on one machine and the client website on the other. The Linux client calls the OS X server without the bug (FIN ACK). The OS X client calls the Linux service with the bug (RST ACK, and a (54, 'Connection reset by peer')). So, it looks like the problem is the service running on Linux. Is it x86_64? A bad glibc? wsgiref? Still looking...

** Further testing - wsgiref looks flaky **

We've gone to production with Apache and mod_wsgi, and the connection resets have gone away. See my answer below, but my advice is to log the connection reset and retry. This will let your server run OK in development mode, and solidly in production.

Comments (4)

冷默言语 2024-07-17 19:44:20

I've had this problem. See The Python "Connection Reset By Peer" Problem.

You have (most likely) run afoul of small timing issues based on the Python Global Interpreter Lock.

You can (sometimes) correct this with a time.sleep(0.01) placed strategically.

"Where?" you ask. Beats me. The idea is to provide some better thread concurrency in and around the client requests. Try putting it just before you make the request so that the GIL is reset and the Python interpreter can clear out any pending threads.

固执像三岁 2024-07-17 19:44:20

Don't use wsgiref for production. Use Apache and mod_wsgi, or something else.

We continue to see these connection resets, sometimes frequently, with wsgiref (the backend used by the werkzeug test server, and possibly others like the Django test server). Our solution was to log the error, retry the call in a loop, and give up after ten failures. httplib2 tries twice, but we needed a few more. They seem to come in bunches as well - adding a 1 second sleep might clear the issue.
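
A minimal sketch of that log-and-retry loop, assuming Python 3 and assuming the reset surfaces as an OSError with errno ECONNRESET. `call_with_retries` and its parameters are hypothetical names, not part of httplib2, and `request` stands for any zero-argument callable that performs the HTTP call.

```python
import errno
import logging
import time

log = logging.getLogger("client.retry")


def call_with_retries(request, max_attempts=10, delay=1.0, sleep=time.sleep):
    """Call request(); on a connection reset, log it, pause, and try again."""
    for attempt in range(1, max_attempts + 1):
        try:
            return request()
        except OSError as exc:
            # Re-raise anything that is not a reset, and the final failed attempt.
            if exc.errno != errno.ECONNRESET or attempt == max_attempts:
                raise
            log.warning("connection reset (attempt %d of %d), retrying",
                        attempt, max_attempts)
            sleep(delay)  # resets seem to come in bunches, so wait a moment
```

Usage would look like `call_with_retries(lambda: http.request(url))`, where `http` and `url` are whatever the client already has. Retrying blindly is only safe for requests that can be repeated without side effects.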

We've never seen a connection reset when running through Apache and mod_wsgi. I don't know what they do differently (maybe they just mask the resets), but the resets don't appear.

When we asked the local dev community for help, someone confirmed that they see a lot of connection resets with wsgiref that go away on the production server. There's a bug there, but it is going to be hard to find it.

从﹋此江山别 2024-07-17 19:44:20

Normally, you'd get an RST if you do a close which doesn't linger (i.e. in which data can be discarded by the stack if it hasn't been sent and ACK'd) and a normal FIN if you allow the close to linger (i.e. the close waits for the data in transit to be ACK'd).

Perhaps all you need to do is set your socket to linger so that you remove the race condition between a non lingering close done on the socket and the ACKs arriving?
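
Setting a socket to linger might look like the sketch below, assuming Python 3 on Linux, where the SO_LINGER option value is a `struct linger` of two C ints (on/off flag, timeout in seconds). `set_linger` and `get_linger` are hypothetical helper names, and where such a call would go inside wsgiref is not something the thread establishes. Note that a linger timeout of zero does the opposite of what is wanted here: it makes close() abort the connection with a RST.

```python
import socket
import struct

# struct linger { int l_onoff; int l_linger; } as laid out on Linux
LINGER_FORMAT = "ii"


def set_linger(sock, seconds):
    """Make close() wait up to `seconds` for unsent data to be delivered."""
    value = struct.pack(LINGER_FORMAT, 1, seconds)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, value)


def get_linger(sock):
    """Return the (l_onoff, l_linger) pair currently set on the socket."""
    raw = sock.getsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                          struct.calcsize(LINGER_FORMAT))
    return struct.unpack(LINGER_FORMAT, raw)


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    set_linger(s, 5)
    print(get_linger(s))
    s.close()
```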

奢华的一滴泪 2024-07-17 19:44:20

I had the same issue, but when uploading a very large file using a python-requests client posting to an nginx+uwsgi backend.

The cause turned out to be that the backend had a cap on the maximum upload size that was lower than what the client was trying to send.

The error never showed up in our uwsgi logs since this limit was actually one imposed by nginx.

Upping the limit in nginx removed the error.
