Fastest way to produce UDP packets
We're building a test harness to push binary messages out on a UDP multicast.
The prototype uses the Twisted reactor loop to push out messages, and it is only just reaching the level of traffic we require: about 120,000 messages per second.
We have 16 cores on our test machine, and obviously I'd like to spread the work over those cores to really make the harness fly.
Does anyone have any ideas about how we might architect the application (using either an event loop approach or a CSP-style approach) to increase this output?
Also, most of the time in the prototype is spent writing to UDP. Since that is I/O I shouldn't be surprised, but am I missing anything?
Any ideas welcome.
Comments (2)
Use multiple NICs; the hardware or the kernel interface is the limit. I can only reach 69,000 packets per second with a Broadcom Corporation NetXtreme BCM5704S Gigabit Ethernet adapter. Try a quad Intel Gigabit Server Adapter with all four NICs on the same subnet.
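As a rough cross-check on where the link itself tops out, the theoretical packet rate of a gigabit link can be worked out from the per-packet overhead. The sketch below assumes IPv4 without options and standard Ethernet framing; the payload sizes are hypothetical, since the thread does not give the message size.

```python
# Fixed per-packet costs on the wire, in bytes.
ETH_WIRE_OVERHEAD = 8 + 14 + 4 + 12  # preamble/SFD, MAC header, FCS, inter-frame gap
IP_UDP_HEADERS = 20 + 8              # IPv4 header (no options) + UDP header
MIN_ETH_PAYLOAD = 46                 # shorter frames are padded up to this size


def max_packets_per_second(udp_payload_bytes, link_bits_per_second=1_000_000_000):
    """Upper bound on packets per second for one link, ignoring host limits."""
    eth_payload = max(MIN_ETH_PAYLOAD, udp_payload_bytes + IP_UDP_HEADERS)
    wire_bytes = eth_payload + ETH_WIRE_OVERHEAD
    return link_bits_per_second / (8 * wire_bytes)


# Hypothetical payload sizes, chosen only for illustration.
for size in (18, 100, 1472):
    print(size, int(max_packets_per_second(size)))
```

On these assumptions the ceiling is roughly 1.49 million packets per second for minimum-size frames and about 81,000 for 1472-byte payloads. A rate of 120,000 messages per second would fill a single gigabit link once payloads approach 1,000 bytes, so the measured 69,000 figure reflects the adapter or kernel path rather than the wire when messages are small.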
The obvious answer when the question of exploiting multiple cores in a Python application comes up is to use multiple processes. With Twisted, you can use reactor.spawnProcess to launch a child process. You could also just start 16 instances of your application some other way (like a shell script). This requires that your application can operate sensibly with multiple instances running at once, of course. Exactly how you might divide the work so that each process can take on some of it depends on the nature of the work.

I would expect a single GigE link to be saturated long before you have all 16 cores running full tilt, though. Make sure you're focusing on the bottleneck in the system. As Steve-o said, you may want multiple NICs in the machine as well.
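One way to picture the multiple-process suggestion is the minimal sketch below. It uses the standard-library multiprocessing module rather than reactor.spawnProcess, purely to stay self-contained. The multicast group, port, and the even split of messages are assumptions made for illustration, not details from the question.

```python
import multiprocessing
import socket

# Assumed values for illustration only; not taken from the original post.
MCAST_GROUP = "239.1.1.1"
MCAST_PORT = 5007


def split_counts(total, workers):
    """Divide `total` messages as evenly as possible among `workers`."""
    base, extra = divmod(total, workers)
    return [base + (1 if i < extra else 0) for i in range(workers)]


def send_messages(count, payload, group=MCAST_GROUP, port=MCAST_PORT):
    """Send `count` copies of `payload` to the multicast group from one process."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    # A TTL of 1 keeps the datagrams on the local network segment.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)
    try:
        for _ in range(count):
            sock.sendto(payload, (group, port))
    finally:
        sock.close()


def run_senders(total, workers, payload):
    """Start one sender process per share of the work and wait for all of them."""
    procs = [
        multiprocessing.Process(target=send_messages, args=(share, payload))
        for share in split_counts(total, workers)
    ]
    for proc in procs:
        proc.start()
    for proc in procs:
        proc.join()
```

A call such as run_senders(120000, 16, b"x" * 64) would go under an `if __name__ == "__main__":` guard so that child processes can import the module safely. Each process owns its socket, so nothing is shared between workers. Whether this raises throughput at all depends on whether the CPU or the NIC is the bottleneck.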