Quickly creating thousands of threads and executing them nearly simultaneously

Posted on 2024-09-27 04:23:25

I have a C#.NET application that needs to inform anywhere from 4000 to 40,000 connected devices to perform a task all at once (or as close to simultaneous as possible).

The application works well; however, I am not satisfied with the performance. In a perfect world, as soon as I send the command I would like to see all of the devices respond simultaneously. Yet, there seems to be a delay as all the threads I have created spin up and perform the task.

I have used the .NET 4.0 ThreadPool, built my own solution using custom threads, and even tweaked the existing ThreadPool to allow more threads to execute at once.

I still want better performance, and that is why I am here. Any ideas? Comments? Suggestions? Thank you.

-Shaun

Let me add that the application notifies these 'connected devices' that they need to go listen for audio on a multicast address.

Comments (7)

薆情海 2024-10-04 04:23:25

A dual-core hyperthreaded processor MAY be able to execute 4 threads simultaneously, depending on what the threads are doing (no contention on IO or memory access, etc.). A hyperthreaded quad-core, perhaps 8. But 40K just can't physically happen.

If you want near simultaneous, you're better off spinning up just as many threads as the computer has free cores and having each thread fire off notifications then end. You'll get rid of a bunch of context switching this way.

Or, look elsewhere. As SB recommended in the comments, use a UDP multicast to notify listening machines that they should do something.
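The "one thread per free core" idea above can be sketched roughly as follows. This is only an illustration under assumptions: the `CoreBoundNotifier` class, the `Partition` helper, and the `notify` callback are hypothetical names, not part of the asker's application, and the sketch uses APIs (`IReadOnlyList<T>`) newer than the .NET 4.0 mentioned in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical helper: one batch of devices per core instead of one thread per device.
public static class CoreBoundNotifier
{
    // Splits items into at most workerCount contiguous batches of near-equal size.
    public static List<List<T>> Partition<T>(IReadOnlyList<T> items, int workerCount)
    {
        if (workerCount < 1) throw new ArgumentOutOfRangeException(nameof(workerCount));

        var batches = new List<List<T>>();
        int baseSize = items.Count / workerCount;
        int remainder = items.Count % workerCount;
        int index = 0;

        for (int w = 0; w < workerCount && index < items.Count; w++)
        {
            // The first 'remainder' batches take one extra item.
            int size = baseSize + (w < remainder ? 1 : 0);
            batches.Add(items.Skip(index).Take(size).ToList());
            index += size;
        }
        return batches;
    }

    // Runs notify once per device, with parallelism capped at the core count.
    // 'notify' stands in for whatever call actually sends the command (an assumption).
    public static void NotifyAll<T>(IReadOnlyList<T> devices, Action<T> notify)
    {
        int workers = Environment.ProcessorCount;
        var options = new ParallelOptions { MaxDegreeOfParallelism = workers };

        Parallel.ForEach(Partition(devices, workers), options, batch =>
        {
            foreach (var device in batch) notify(device);
        });
    }
}
```

Calling `CoreBoundNotifier.NotifyAll(devices, d => Send(d))`, where `Send` is a placeholder for the real notification call, keeps the number of running workers near the number of cores, which is the context-switching saving this answer describes.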

土豪我们做朋友吧 2024-10-04 04:23:25

You cannot execute 4000 threads simultaneously, let alone 40k. At best on a desktop box with hyperthreading, you might get up to 8 simultaneous processes going (this assumes quad core). Threads are pseudo-parallel, and that's not even digging into the issues of bus contention.

If you absolutely need simultaneity for 40k devices, you want some form of hardware synchronization.

尤怨 2024-10-04 04:23:25

It sounds like you have some control over what software runs on each device. In which case, you could look to HPC usage and architect your devices (nodes) hierarchically and/or use MPI to execute your remote processes.

For the hierarchy example: designate, say, 8 nodes as primary masters, each with 8 slave nodes; each slave can in turn act as a master with 8 slaves of its own (you might need an automated subscription algorithm to assign them). A hierarchy 6 levels deep covers 40,000 nodes. Each master keeps a small piece of code running continually, waiting for instructions to pass on to its slaves.

All you then do is pass the instruction to the 8 primary masters, and the masters propagate it to the 'cluster' on the wire asynchronously. The instruction only has to be passed on a maximum of 5 times, so it propagates very quickly.

Alternatively (or in conjunction) you could look at MPI, which is an off-the-shelf solution. There are some established C# implementations.
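The depth figure in this answer can be checked with a little arithmetic. The sketch below counts how many levels a fan-out-8 tree needs before it holds 40,000 devices, assuming level 1 is the 8 primary masters and every node has exactly 8 children; `FanOutTree` and `LevelsNeeded` are names made up for the illustration.

```csharp
using System;

public static class FanOutTree
{
    // Smallest depth such that fanOut^1 + fanOut^2 + ... + fanOut^depth >= nodeCount.
    public static int LevelsNeeded(long nodeCount, int fanOut)
    {
        if (fanOut < 2) throw new ArgumentOutOfRangeException(nameof(fanOut));

        long covered = 0;   // devices reachable so far
        long levelSize = 1; // devices on the current level
        int depth = 0;

        while (covered < nodeCount)
        {
            levelSize *= fanOut;
            covered += levelSize;
            depth++;
        }
        return depth;
    }
}
```

Five levels hold 8 + 64 + 512 + 4,096 + 32,768 = 37,448 devices, just short of 40,000, so `LevelsNeeded(40000, 8)` returns 6. With six levels, a command crosses five master-to-slave hops after the initial send to the primary masters, which matches the "maximum of 5 times" above.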

荒路情人 2024-10-04 04:23:25

The overhead of creating thousands of threads is (very) significant; I would seek an alternative solution. This sounds like a job for asynchronous IO: your computer presumably only has one network connection, so no more than one message can be sent at a time - threads cannot improve on this!

背叛残局 2024-10-04 04:23:25

Am I correct in guessing that you're using a synchronous API call on your device, which is why it must be executed in a thread? Does the API have an asynchronous version of the call? If the device API can really support 40k+ devices, then it should. It should also have internal handling of whatever wait handles (or equivalent) are required to synchronize the return data for callback. This isn't something you can handle at the client application side; you don't have enough visibility of the underlying implementation of the device API to know how to parallelize the tasks. As you've discovered, creating 40k threads with blocking calls doesn't cut it.

暖心男生 2024-10-04 04:23:25

You should do async IO to the devices. This is very efficient and uses a different (larger) set of threads to handle some of the work. The devices will certainly receive the commands much faster. The IO thread pool will handle the replies (if any).
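As a rough sketch of what async IO could look like here, the following starts one asynchronous UDP send per device on a single socket and awaits them all, with no thread per device. It rests on assumptions the thread does not confirm: that the devices accept a UDP datagram as the command, and that `async`/`await` (.NET 4.5 or later) is available. `AsyncNotifier` and `NotifyAllAsync` are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Net.Sockets;
using System.Threading.Tasks;

public static class AsyncNotifier
{
    // Sends the same command to every device endpoint and returns how many
    // datagrams were handed to the network stack in full.
    public static async Task<int> NotifyAllAsync(IEnumerable<IPEndPoint> devices, byte[] command)
    {
        using (var client = new UdpClient(AddressFamily.InterNetwork))
        {
            // Start every send without waiting for the previous one to complete.
            List<Task<int>> sends = devices
                .Select(device => client.SendAsync(command, command.Length, device))
                .ToList();

            int[] bytesSent = await Task.WhenAll(sends);
            return bytesSent.Count(n => n == command.Length);
        }
    }
}
```

A completed send only means the datagram left the application; it says nothing about when, or whether, a device received it. If the device protocol is TCP or a vendor API, the same shape applies with that API's asynchronous calls in place of `SendAsync`.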

无风消散 2024-10-04 04:23:25

Always fun with these old ones.

At 1 MB per thread you need a minimum of 4-40 GB of RAM, plus 4k-40k cores, and you still have a single network to send it all over.

That means the traffic gets serialized somewhere along the way: at the nearest switch or router, and most of it probably already on your network card (if you could even get all the packets there at the same time, and the card managed to send them without buffering them or dying on you). So all that multithreading work is for nothing, because the messages will not reach the endpoints simultaneously.

Think of it as taking a 40,000-lane road and placing 40,000 cars on it. Sure, everyone gets to the same point on the road at the same time, but then they leave the road and go home. Everyone gets home at a different time, even though they all started driving on the 40k-lane road at the same point and time.

You just cannot beat the physical realm (yet...).
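The memory estimate in this answer is simple multiplication. The sketch below takes the answer's figure of 1 MB of stack per thread at face value (an assumption, since the actual stack size depends on the platform and on how the threads are created); `ThreadStackBudget` and `StackGiB` are illustrative names.

```csharp
public static class ThreadStackBudget
{
    // Total stack space in GiB for threadCount threads at stackMiBPerThread each.
    public static double StackGiB(int threadCount, double stackMiBPerThread = 1.0)
    {
        return threadCount * stackMiBPerThread / 1024.0;
    }
}
```

`StackGiB(4000)` comes to about 3.9 GiB and `StackGiB(40000)` to about 39 GiB, which is where the 4-40 GB range comes from.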
