I am developing a tool to load-test a UDP server (using C#/.NET 4.0 running on NT 6.x, although that is less relevant). The server talks to tens of thousands of clients, and the traffic with each client is very light and infrequent. All communication follows a request-reply pattern: one side initiates and the other side replies. When the server needs to send something to a client, it looks up the client's last known endpoint (IP + port), sends a UDP packet, and listens for the reply on the single well-known port it uses to receive communication from all clients. When a client initiates communication, it already knows the server's endpoint, so it simply sends a packet from an ephemeral port and waits for a reply on the same port. The same ephemeral port is used for the lifetime of the client.
The design of the load testing tool is pretty simple: emulate the behavior, state and decision making of each client, to a low yet sufficient complexity. Since communication with each client is only occasional (every few seconds) and the processing required for each exchange is minimal, the best approach I can think of is to use a single thread with a single socket to handle all the communication for a large number of simulated clients. Even that most likely won't keep the thread fully busy or saturate the socket. Unfortunately, I have run into two problems with that approach, both arising from the fact that each client sends and receives on its own port:
- A socket will only allow sending a UDP packet from either a system-allocated ephemeral port or a specific port the socket is bound to.
- A socket will only receive UDP packets addressed to the port it is bound to.
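The two constraints can be observed on the loopback interface. The following is a minimal sketch in Python rather than C#, chosen only for brevity; `demo_source_port` is a hypothetical name, and the "server" here is just a second local socket standing in for the real one. It shows that a datagram leaves from the port the sending socket is bound to, and that the reply comes back to that same port.

```python
import socket

def demo_source_port():
    # "Server": one socket bound to a single known port (OS-chosen for this demo).
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind(("127.0.0.1", 0))
    server.settimeout(2.0)

    # "Client": binding to port 0 asks the OS for an ephemeral port,
    # which the socket then keeps until it is closed.
    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    client.bind(("127.0.0.1", 0))
    client.settimeout(2.0)
    client_port = client.getsockname()[1]

    try:
        client.sendto(b"request", server.getsockname())
        data, sender = server.recvfrom(2048)
        # The server replies to whatever endpoint the request came from.
        server.sendto(b"reply:" + data, sender)
        reply, _ = client.recvfrom(2048)
        return client_port, sender[1], reply
    finally:
        client.close()
        server.close()

if __name__ == "__main__":
    bound_port, seen_port, reply = demo_source_port()
    print(bound_port == seen_port, reply)
```

The port the server sees as the sender is the port the client socket was bound to, which is why one ordinary socket cannot impersonate many clients.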
One socket per client
These two constraints seem to mean that I must create a socket for each client, since each UDP packet must originate from a particular port and the reply will be sent to that port. So the first possible solution is to do just that: create one socket per simulated client. Let's say we're simulating 30,000 clients on a single machine:
- Is creating 30,000 sockets even feasible? Is it a best practice? Is it performant? Will Windows even let you bind 30,000 sockets to 30,000 different ports?
- How do I check across 30,000 client sockets whether any data was sent by the server? Do I poll all the sockets periodically to see if any data was received? Is there a way to wait on all 30,000 sockets and get the first packet that arrives at any of them?
- What resources does the operating system allocate for each socket? What are the limits of each of those resources, and what are the implications of reaching them?
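On the question of waiting on many sockets at once, here is a minimal sketch of the general idea using Python's `selectors` module: register every client socket once, then block in a single call until any of them is readable. The helper names (`make_clients`, `wait_for_packets`, `demo`) and the count of 50 are assumptions made for illustration. Readiness-style APIs of this kind often have a per-call limit on the number of sockets, so this shows the shape of the loop and is not a recipe for 30,000 sockets.

```python
import selectors
import socket

def make_clients(count):
    """Create `count` UDP sockets, each bound to its own OS-assigned port."""
    clients = []
    for client_id in range(count):
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.bind(("127.0.0.1", 0))
        sock.setblocking(False)
        clients.append((client_id, sock))
    return clients

def wait_for_packets(selector, timeout):
    """Block until a registered socket is readable; return (client_id, data) pairs."""
    received = []
    for key, _events in selector.select(timeout):
        data, _sender = key.fileobj.recvfrom(2048)
        received.append((key.data, data))
    return received

def demo(count=50):
    selector = selectors.DefaultSelector()
    clients = make_clients(count)
    for client_id, sock in clients:
        # The client id travels with the registration, so no lookup is needed later.
        selector.register(sock, selectors.EVENT_READ, data=client_id)

    # Stand-in for the server: send one datagram to client 7 only.
    _target_id, target_sock = clients[7]
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.sendto(b"ping", target_sock.getsockname())

    try:
        return wait_for_packets(selector, timeout=2.0)
    finally:
        server.close()
        for _id, sock in clients:
            selector.unregister(sock)
            sock.close()
        selector.close()

if __name__ == "__main__":
    print(demo())
```

A single thread services all sockets here, and no socket is polled individually; only the one that actually received data is read.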
A single socket for all clients
A different approach would be to use a single socket, but the two problems I mentioned earlier must first be solved somehow:
- The first problem, having the UDP packet originate from a port other than the socket's own port, is solvable. It involves creating a raw socket and constructing the UDP header yourself, which means you can specify any source and destination port you like. The only difficulty is calculating the optional yet important UDP checksum, which requires not only the UDP header and payload but also the source and destination IP addresses. The source address is the problematic one, since obtaining it requires calling Win32 APIs (GetBestInterface and GetAdaptersInfo) that involve several native structures and lots of unmanaged memory allocation. That is a potential reliability pitfall from a .NET perspective, but it can be done.
- The second problem, using a single socket to receive UDP packets sent to a list (or range) of ports, is one I have not solved. Even with a raw socket, the OS demands that I bind the socket to a single specific port before it lets me perform receive operations. Is there a way around that? There are always packet sniffing techniques, but I'd rather avoid them unless they can be used from managed code in a reliable and reasonably straightforward way (in which case, I'm open to suggestions).
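For reference, the checksum step on its own is small. The sketch below, in Python and for IPv4 only, builds a UDP header plus payload with the checksum computed over the pseudo-header (source IP, destination IP, a zero byte, protocol 17, UDP length) as described in RFC 768. It only builds bytes and does not open a raw socket. The function names are hypothetical. `local_ip_for` is an assumption offered as a possible alternative to the Win32 calls: connecting a UDP socket sends no packet, but it makes the OS pick a route, after which the socket's local address can be read back.

```python
import socket
import struct

def ones_complement_sum(data):
    """16-bit one's-complement sum of `data`, padded with a zero byte if odd-length."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return total

def pseudo_header(src_ip, dst_ip, udp_length):
    """IPv4 pseudo-header that is summed together with the UDP segment."""
    return struct.pack("!4s4sBBH", socket.inet_aton(src_ip),
                       socket.inet_aton(dst_ip), 0, socket.IPPROTO_UDP, udp_length)

def build_udp_segment(src_ip, dst_ip, src_port, dst_port, payload):
    """Return UDP header + payload with the checksum field filled in."""
    length = 8 + len(payload)
    header = struct.pack("!HHHH", src_port, dst_port, length, 0)
    summed = ones_complement_sum(pseudo_header(src_ip, dst_ip, length) + header + payload)
    checksum = ~summed & 0xFFFF
    if checksum == 0:
        checksum = 0xFFFF  # 0 on the wire means "no checksum was computed"
    return struct.pack("!HHHH", src_port, dst_port, length, checksum) + payload

def verify_udp_segment(src_ip, dst_ip, segment):
    """Receiver-side check: summing everything, checksum included, gives 0xFFFF."""
    return ones_complement_sum(pseudo_header(src_ip, dst_ip, len(segment)) + segment) == 0xFFFF

def local_ip_for(dst_ip, dst_port=9):
    """Ask the OS which local address it would use to reach dst_ip (assumed approach)."""
    probe = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        probe.connect((dst_ip, dst_port))  # UDP connect sends no packet
        return probe.getsockname()[0]
    finally:
        probe.close()

if __name__ == "__main__":
    src = local_ip_for("127.0.0.1")
    segment = build_udp_segment(src, "127.0.0.1", 40000, 5000, b"hello")
    print(segment.hex(), verify_udp_segment(src, "127.0.0.1", segment))
```

The same arithmetic carries over directly to C#; whether the connect-and-read-back trick is acceptable in place of GetBestInterface would need to be checked against the target machines' routing setup.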
Other approaches?
Is there another approach I haven't thought of? Am I on the wrong track? Can you think of a better solution? I'd love to hear any suggestions you might have.
Comments (1)
Yes, you can easily create more than 30,000 sockets on a Windows machine, but you may need to tune MAXUSERPORT (see here). Use I/O completion ports or async I/O and you don't need to worry about 'polling'; the scalability 'just works'.
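To illustrate the async style, here is a minimal sketch using Python's `asyncio`, not the C++ framework mentioned in this answer. Each simulated client gets its own datagram endpoint bound to an OS-assigned port, and the event loop delivers replies through callbacks, so nothing is polled. `ClientProtocol`, `EchoServer`, `run_clients` and the local echo server are all assumptions made so the example is self-contained; a real tool would point the clients at the server under test.

```python
import asyncio

class ClientProtocol(asyncio.DatagramProtocol):
    """One simulated client: sends a request and records the reply."""
    def __init__(self, client_id, replies):
        self.client_id = client_id
        self.replies = replies

    def connection_made(self, transport):
        # remote_addr was given at creation, so sendto() needs no address.
        transport.sendto(b"hello %d" % self.client_id)

    def datagram_received(self, data, addr):
        self.replies.put_nowait((self.client_id, data))

class EchoServer(asyncio.DatagramProtocol):
    """Stand-in for the server under test: echoes each datagram to its sender."""
    def connection_made(self, transport):
        self.transport = transport

    def datagram_received(self, data, addr):
        self.transport.sendto(data, addr)

async def run_clients(count):
    loop = asyncio.get_running_loop()
    replies = asyncio.Queue()
    server_transport, _ = await loop.create_datagram_endpoint(
        EchoServer, local_addr=("127.0.0.1", 0))
    server_addr = server_transport.get_extra_info("sockname")

    transports = []
    for client_id in range(count):
        # Each client is bound to its own ephemeral port for its whole lifetime.
        transport, _ = await loop.create_datagram_endpoint(
            lambda cid=client_id: ClientProtocol(cid, replies),
            local_addr=("127.0.0.1", 0), remote_addr=server_addr)
        transports.append(transport)

    results = [await asyncio.wait_for(replies.get(), timeout=5) for _ in range(count)]
    for transport in transports:
        transport.close()
    server_transport.close()
    return sorted(results)

if __name__ == "__main__":
    print(len(asyncio.run(run_clients(100))))
```

The per-client state lives in the protocol object that the loop calls back into, which is the same shape as keeping per-client state alongside each pending asynchronous receive.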
The main resource issue is non-paged pool, but this has become far less of an issue on Vista or later (see here). The next issue is the I/O locked pages limit, which may matter if you are posting very large buffers for your reads. Given that this is UDP, I assume you'll have 'sensible' sized datagrams, in which case the locked pages limit is unlikely to be a problem.
I've blogged about some of the scalability issues here: http://www.serverframework.com/asynchronousevents/2010/12/one-million-tcp-connections.html
I've written tools similar to what you're trying to do using my C++ socket server framework. A UDP test tool example ships as part of the framework; it sends a configurable number of datagrams from unique client ports and waits for replies, and it is easy to adjust to your specific datagram format and response requirements (see here).