Writing a super UDP server that uses as many CPU cores as possible
I have run into serious limitations with the code I am writing.
What I am trying to do is make my code run as efficiently as possible on an SMP Xeon machine with 24 hardware threads.
For this task I am using the commoncpp wrappers around native POSIX threads and sockets, plus the libev library to detect read events on socket file descriptors.
The goal is to have no data loss on the UDP socket connections, each of which should carry around 600 Mbit/s of data.
I found that as soon as I establish more than two connections, data gets lost.
I also discovered that the five threads (one per connection) are not well balanced across the CPU cores: only two cores are doing any work, while the other 22 sit unused.
I admit I am a novice SMP developer who really needs some help with setting up "hardware threads".
I would be glad to learn whether there is some POSIX capability or feature for binding threads to hardware threads, or some how-to guide (for dummies like me :) ) that explains how to dedicate CPU cores to specific work.
As you may have guessed, I would like to have one dedicated CPU core per connection.
Thank you all!
Comments (2)
I can recommend an approach that is easy to implement and should provide quite good performance: use Boost.Asio with Boost.Thread. Boost.Asio provides asynchronous networking and can be used in a multithreaded environment with little additional effort (it is a good example of tamed multithreading). Investigate these links:
The first time, Asio can scare you. But then you become addicted to it.
I once heard that the performance of Asio's internal dispatcher is not optimal. I cannot comment on that. So far, after using it in many projects with tough performance requirements, I have been satisfied with its performance.
To do this sort of high-speed networking, you may need to dig into the hardware and OS settings.
Check whether the network card supports multiple input queues and whether it can use MSI (message-signaled interrupts) instead of regular interrupts. See if you can set up one input queue per CPU core, and check what options there are for splitting incoming packets across the queues.
Check the OS input buffer sizes. You may need to make them a lot bigger to avoid dropping UDP datagrams.
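A per-core queue layout helps most when each receiving thread also stays on its own core. POSIX itself does not define thread affinity, but on Linux the glibc extension `pthread_setaffinity_np` does the job (the `_np` suffix means non-portable). The following is a minimal sketch that assumes Linux with glibc; the function name `pin_current_thread_to_cpu` is hypothetical.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // needed for the affinity extensions; g++ usually defines it already
#endif
#include <pthread.h>
#include <sched.h>

// Restricts the calling thread to a single logical CPU.
// Returns 0 on success, or an error number (for example EINVAL if the
// CPU index is not available to this process).
int pin_current_thread_to_cpu(int cpu) {
    cpu_set_t set;
    CPU_ZERO(&set);      // start with an empty CPU mask
    CPU_SET(cpu, &set);  // allow exactly one logical CPU
    return pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}
```

Each connection's thread would call this once at start-up with its own CPU index, for example thread 0 on CPU 0 and thread 1 on CPU 1. Which index belongs to which physical core or hyperthread sibling depends on the machine, so check the topology before picking the numbers.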
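For the buffer sizes, the per-socket knob is `SO_RCVBUF`. As a rough guide, 600 Mbit/s is 75 MB/s, so a 4 MiB buffer absorbs only about 56 ms of traffic if the reader stalls. Below is a minimal sketch, assuming a POSIX sockets API; the helper name `grow_receive_buffer` is hypothetical.

```cpp
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>

// Asks the kernel for a larger receive buffer on `fd` and returns the size
// the kernel reports afterwards, or -1 on error.
// On Linux the request is capped by the net.core.rmem_max sysctl, and the
// reported value is double the accepted request because the kernel reserves
// room for bookkeeping overhead.
int grow_receive_buffer(int fd, int requested_bytes) {
    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF,
                   &requested_bytes, sizeof(requested_bytes)) != 0) {
        return -1;
    }
    int granted = 0;
    socklen_t len = sizeof(granted);
    if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &granted, &len) != 0) {
        return -1;
    }
    return granted;
}
```

Call it right after creating each UDP socket and compare the returned value with what was requested. If it comes back much smaller, the system-wide limit has to be raised first (on Linux, `net.core.rmem_max`).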