.NET sockets vs. C++ sockets at high performance

Posted on 2024-12-21 03:27:11

My question is to settle an argument with my co-workers on C++ vs C#.

We have implemented a server that receives a large number of UDP streams. The server was developed in C++ using asynchronous sockets and overlapped I/O with completion ports. We use 5 completion ports with 5 threads. It can easily handle 500 Mbps of throughput on a gigabit network without any packet loss or errors (we didn't push our tests further than 500 Mbps).

We tried to re-implement the same kind of server in C# and have not been able to reach the same incoming throughput. We receive asynchronously using the ReceiveAsync method and a pool of SocketAsyncEventArgs to avoid the overhead of creating a new object for every receive call. Each SocketAsyncEventArgs has a buffer assigned to it, so we do not need to allocate memory for every receive. The pool is very large, so we can queue more than 100 receive requests. This server cannot handle more than 240 Mbps of incoming throughput; above that limit, we lose some packets in our UDP streams.

My question is this: should I expect the same performance using C++ sockets and C# sockets? My opinion is that it should be the same performance if memory is managed correctly in .NET.

Side question: would anybody know a good article/reference explaining how .NET sockets use I/O completion ports under the hood?

Comments (3)

夏有森光若流苏 2024-12-28 03:27:11

"would anybody know a good article/reference explaining how .NET sockets use I/O completion ports under the hood?"

I suspect the only reference would be the implementation (i.e. Reflector or another assembly decompiler). With that you will find that all asynchronous IO goes through an IO completion port, with callbacks being processed in the IO thread pool (which is separate from the normal thread pool).

"use 5 completion ports"

I would expect to use a single completion port processing all the IO into a single pool of threads with one thread per pool servicing completions (assuming you are doing any other IO, including disk, asynchronously as well).

Multiple completion ports would make sense if you have some form of prioritisation going on.

"My question is this: should I expect the same performance using C++ sockets and C# sockets?"

Yes or no, depending on how narrowly you define the "using ... sockets" part. In terms of the operations from the start of the asynchronous operation until the completion is posted to the completion port I would expect no significant difference (all the processing is in the Win32 API or Windows kernel).

However, the safety that the .NET runtime provides will add some overhead: buffer lengths will be checked, delegates validated, and so on. If the application is CPU-bound then this is likely to make a difference, and at the extreme a small per-operation difference can easily add up.

Also, the .NET version will occasionally pause for GC (.NET 4.5 does asynchronous collection, so this will get better in the future). There are techniques to minimise the garbage that accumulates (e.g. reuse objects rather than creating them, and make use of structures while avoiding boxing).

In the end, if the C++ version works and is meeting your performance needs, why port?

寂寞陪衬 2024-12-28 03:27:11

You can't do a straight port of the code from C++ to C# and expect the same performance. .NET does a lot more than C++ when it comes to memory management (GC) and making sure that your code is safe (boundary checks etc).

I would allocate one large buffer for all IO operations (for instance 65535 x 500 = 32767500 bytes) and then assign a chunk to each SocketAsyncEventArgs (for send operations as well). Memory is cheaper than CPU. Use a buffer manager / factory to provide chunks for all connections and IO operations (Flyweight pattern). Microsoft does this in their Async example.
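To make the buffer-manager idea concrete, here is a minimal C++ sketch of one large slab carved into fixed-size chunks. It is not taken from the Microsoft sample: `ChunkPool` and all of its members are hypothetical names used only for illustration, and the sketch assumes a single thread (a real server would guard `acquire`/`release` with a lock or use a lock-free free list). In C#, the equivalent step would be to pass the shared array plus each chunk's offset and length to `SocketAsyncEventArgs.SetBuffer(buffer, offset, count)`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical illustration: one contiguous slab split into equal chunks.
// Not thread-safe; add synchronization before sharing between IO threads.
class ChunkPool {
public:
    ChunkPool(std::size_t chunk_size, std::size_t chunk_count)
        : chunk_size_(chunk_size),
          slab_(chunk_size * chunk_count) {  // single up-front allocation
        free_.reserve(chunk_count);
        // Push indices in reverse so that chunk 0 is handed out first.
        for (std::size_t i = chunk_count; i > 0; --i) {
            free_.push_back(i - 1);
        }
    }

    // Returns a pointer to a free chunk, or nullptr if none is left.
    std::uint8_t* acquire() {
        if (free_.empty()) return nullptr;
        const std::size_t index = free_.back();
        free_.pop_back();
        return slab_.data() + index * chunk_size_;
    }

    // Gives back a chunk previously obtained from acquire().
    void release(std::uint8_t* chunk) {
        const std::size_t offset =
            static_cast<std::size_t>(chunk - slab_.data());
        assert(offset % chunk_size_ == 0 && offset < slab_.size());
        free_.push_back(offset / chunk_size_);
    }

    std::size_t total_bytes() const { return slab_.size(); }
    std::size_t available() const { return free_.size(); }

private:
    std::size_t chunk_size_;
    std::vector<std::uint8_t> slab_;   // the one large buffer
    std::vector<std::size_t> free_;    // indices of unused chunks
};
```

With the figures from the answer, `ChunkPool pool(65535, 500);` reserves 32767500 bytes once; each pending receive then takes a chunk with `pool.acquire()` and hands it back with `pool.release(ptr)` when its completion has been processed, so no per-receive allocation takes place.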

Both the Begin/End and the Async methods use IO completion ports in the background. The latter don't need to allocate objects for each operation, which boosts performance.

沐歌 2024-12-28 03:27:11

My guess is that you're not seeing the same performance because .NET and C++ are actually doing different things. Your C++ code may not be as safe, or may not check boundaries. Also, are you simply measuring the ability to receive the packets without any processing, or does your throughput include packet processing time? If it does, then the code you have written to process the packets may not be as efficient.

I'd suggest using a profiler to check where the most time is being spent and trying to optimize that. The actual socket code should be quite performant.
