Winsock TCP/IP socket is listening but connections are refused: race condition?

Posted on 2024-08-30 07:38:57 · 810 words · 8 views · 0 comments

This involves two automated unit tests, each of which starts a TCP/IP server that creates a non-blocking socket, calls bind() and listen(), and then loops on select() waiting for a client that connects and downloads some data.

The catch is that they work perfectly when run separately, but when run as a test suite, the second test's client fails to connect with WSAECONNREFUSED...

UNLESS

there is a Thread.Sleep() of several seconds between them?!

Interestingly, there is a retry loop that attempts to connect again every 1 second after any failure. So the second test loops for a while until it times out after 10 minutes.
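The retry behaviour described above can be sketched roughly as follows. This is a minimal illustration in Python's `socket` module rather than the original Winsock code; the function name `connect_with_retry` and its parameters are hypothetical, with the 1-second interval and 10-minute deadline taken from the description.

```python
import socket
import time


def connect_with_retry(host, port, retry_interval=1.0, deadline=600.0):
    """Try to connect until it succeeds or `deadline` seconds have elapsed.

    Returns (connected_socket, number_of_attempts).
    The name and parameters are illustrative assumptions, not the poster's code.
    """
    give_up_at = time.monotonic() + deadline
    attempts = 0
    while True:
        attempts += 1
        try:
            sock = socket.create_connection((host, port), timeout=retry_interval)
        except OSError as exc:
            # A refused connection (RST in reply to SYN) lands here as
            # ConnectionRefusedError, which is a subclass of OSError.
            if time.monotonic() + retry_interval > give_up_at:
                raise TimeoutError(f"gave up after {attempts} attempts") from exc
            time.sleep(retry_interval)
            continue
        sock.settimeout(None)  # back to blocking mode for the caller
        return sock, attempts
```

Logging the exception type on each failed attempt would show whether every retry is actually refused or whether some of them time out instead.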

During that time, netstat -na shows the correct port number in the LISTEN state for the server socket. So if it is in the LISTEN state, why won't it accept the connection?

In the code, there are log messages showing that select() NEVER even reports the socket as ready to read (which, for a listening socket, means ready to accept a connection).
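For reference, the pattern in question (a non-blocking listening socket polled with select(), where "readable" means a connection is waiting to be accepted) looks roughly like this. It is a sketch in Python rather than Winsock, and `make_listener` and `serve_once` are hypothetical names, not the code under test.

```python
import select
import socket


def make_listener(port=0):
    """Create a non-blocking listening socket on loopback (port 0 = any free port)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind(("127.0.0.1", port))
    sock.listen(5)
    sock.setblocking(False)
    return sock


def serve_once(listener, timeout=5.0):
    """Wait for one client with select(), accept it, and return its first bytes.

    Returns None if select() times out without the listener becoming readable.
    """
    readable, _, _ = select.select([listener], [], [], timeout)
    if not readable:
        return None  # no pending connection was reported
    conn, _addr = listener.accept()
    with conn:
        # Whether the accepted socket inherits non-blocking mode is
        # OS-dependent, so set it explicitly.
        conn.setblocking(True)
        return conn.recv(1024)
```

If select() never reports the listener as readable while clients are being refused, the refusals are happening before the application ever sees the connection.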

Obviously the problem must be related to some race condition between finishing one test, which means shutdown() and close() on each end of the socket, and the startup of the next.

This wouldn't be so bad if the retry logic allowed it to connect eventually after a couple of seconds. However, it seems to get "gummed up" and won't even retry.

Yet for some strange reason the listening socket SAYS it's in the LISTEN state even though it keeps refusing connections.

So that means it's the Windoze O/S that is actually catching the SYN packet and returning an RST packet (which means "Connection Refused").

The only other time I ever saw this error was when the code had a problem that caused hundreds of sockets to get stuck in TIME_WAIT state. But that's not the case here. netstat shows only about a dozen sockets with only 1 or 2 in TIME_WAIT at any given moment.

Please help.

Comments (3)

心舞飞扬 2024-09-06 07:38:57

The fundamental problem was that, when closing the socket, a thread was trying to read any remaining bytes. That was done in a separate thread which held the read end of the socket open for a fixed number of milliseconds while repeatedly trying to read any data.

That logic has been replaced with code that reads any remaining data more intelligently and closes properly as soon as the read returns 0. So it now closes much more rapidly.
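A close sequence along those lines might look like the sketch below: half-close the sending side, drain until the read returns zero bytes, then close. This is an illustration in Python, not the actual fix; `graceful_close` and the `drain_timeout` value are assumptions.

```python
import socket


def graceful_close(sock, drain_timeout=2.0):
    """Half-close, drain until the peer closes, then close the socket.

    Returns the number of leftover bytes that were drained.
    `drain_timeout` is an assumed upper bound on the wait, not a value from the post.
    """
    drained = 0
    try:
        sock.shutdown(socket.SHUT_WR)   # send FIN: we are done writing
        sock.settimeout(drain_timeout)  # never wait longer than this per read
        while True:
            chunk = sock.recv(4096)
            if not chunk:               # zero bytes: the peer has closed its side
                break
            drained += len(chunk)
    except OSError:
        pass  # timeout or reset: stop draining and close anyway
    finally:
        sock.close()
    return drained
```

The point of the design is that the wait ends as soon as the peer's close is observed, instead of holding the socket open for a fixed interval regardless of what the peer does.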

So it turned out to be improper closing of the socket in my own code.

Thanks for all the help!

素染倾城色 2024-09-06 07:38:57

I run lots of tests like this across build machines with various Windows operating systems (XP through Windows 7) with various numbers of cores and I've never seen it be a problem.

I don't believe that the listen socket transitioning to TIME_WAIT is likely to be your problem; I've certainly never seen it and I regularly run client server tests with the same port where I start and stop servers within the TIME_WAIT delay period.

If you were starting your second server before your first had closed its socket (or if the socket were in TIME_WAIT), then I'd expect your second server to get an error when it attempted to bind().
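One way to check that expectation is to probe the port with a second bind() while the first server is still listening. The sketch below uses Python and a hypothetical helper `port_is_bindable`; it deliberately does not set SO_REUSEADDR, whose semantics differ between Windows and other systems.

```python
import socket


def port_is_bindable(port, host="127.0.0.1"):
    """Return (True, None) if bind() succeeds, else (False, errno).

    Hypothetical diagnostic helper; SO_REUSEADDR is intentionally left unset.
    """
    probe = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        probe.bind((host, port))
        return True, None
    except OSError as exc:
        # Typically "address already in use" (WSAEADDRINUSE on Windows).
        return False, exc.errno
    finally:
        probe.close()
```

Calling this between the two tests and logging the result would show whether the first server's socket is still holding the port when the second server starts.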

Personally I think it's more likely that there's an issue in the code that you have that's accepting connections - that is your test might have found a bug ;)

Can we have a look at the code between your listen and the accept loop?

Do you have the problem if you reverse the order of the tests?

Are the client and server running on the same machine? If so, does it change things if they aren't?

Etc.

I have some TCP test tools at http://www.lenholgate.com/blog/2005/11/windows-tcpip-server-performance.html. If you set up your test system to run the test client from that link against an example server from http://www.lenholgate.com/blog/2005/11/simple-echo-servers.html, do you still see your problem? (That is, run my server with my client in your test system, so that it runs them the same way it runs your stuff. Does my stuff work?)

不即不离 2024-09-06 07:38:57

From this MSDN site:

The TIME_WAIT state determines the time that must elapse before TCP can release a closed connection and reuse its resources. This interval between closure and release is known as the TIME_WAIT state or 2MSL state. During this time, the connection can be reopened at much less cost to the client and server than establishing a new connection. The TIME_WAIT behavior is specified in RFC 793 which requires that TCP maintains a closed connection for an interval at least equal to twice the maximum segment lifetime (MSL) of the network. When a connection is released, its socket pair and internal resources used for the socket can be used to support another connection.

Windows TCP reverts to a TIME_WAIT state subsequent to the closing of a connection. While in the TIME_WAIT state, a socket pair cannot be re-used. The TIME_WAIT period is configurable by modifying the following DWORD registry setting that represents the TIME_WAIT period in seconds.

HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\TCPIP\Parameters\TcpTimedWaitDelay

By default, the MSL is defined to be 120 seconds. The TcpTimedWaitDelay registry setting defaults to a value of 240 seconds (4 minutes), which represents 2 times the maximum segment lifetime of 120 seconds. However, you can use this entry to customize the interval. Reducing the value of this entry allows TCP to release closed connections faster, providing more resources for new connections. However, if the value is too low, TCP might release connection resources before the connection is complete, requiring the server to use additional resources to re-establish the connection. This registry setting can be set from 0 to 300 seconds.

I think the minimum you can set the value to is 30 (try smaller, but it might not work).
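To see what a given test machine is currently configured with, the value can be read from the registry key quoted above. The following is a small sketch using Python's standard `winreg` module; the function name `read_time_wait_delay` is hypothetical, and it returns None when the value is not set (so the system default applies) or when not running on Windows.

```python
import sys

# Key and value name as quoted from the MSDN text above.
KEY_PATH = r"System\CurrentControlSet\Services\Tcpip\Parameters"
VALUE_NAME = "TcpTimedWaitDelay"


def read_time_wait_delay():
    """Return the configured TcpTimedWaitDelay in seconds, or None if unavailable."""
    if sys.platform != "win32":
        return None  # the registry only exists on Windows
    import winreg
    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, KEY_PATH) as key:
            value, _kind = winreg.QueryValueEx(key, VALUE_NAME)
            return int(value)
    except FileNotFoundError:
        return None  # value not present: the system default applies
```

This only reads the setting; changing it requires administrator rights and affects every TCP connection on the machine, so it is worth confirming first that TIME_WAIT is really involved.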

You can look at Winsock Programmer's FAQ for a more detailed explanation.
