How can I make sure N threads run at roughly the same speed?
I'm toying with the idea of writing a physics simulation software in which each physical element would be simulated in its own thread.
There would be several advantages to this approach. It would be conceptually very close to how the real world works. It would be much easier to scale the system to multiple machines.
However, for this to work I need to make sure that all threads run at the same speed, with a rather liberal interpretation of 'same'. Say within 1% of each other.
That's why I don't necessarily need a Thread.join()-like solution. I don't want some uber-controlling schoolmistress that makes all threads regularly synchronize with each other. I just need to be able to ask the runtime (whichever it is; it could be Java, Erlang, or whatever is most appropriate for this problem) to run the threads at a more or less equal speed.
Any suggestions would be extremely appreciated.
UPDATE 2009-03-16
I wanted to thank everyone who answered this question, in particular all those whose answer was essentially "DON'T DO THIS". I understand my problem much better now thanks to everybody's comments and I am less sure I should continue as I originally planned. Nevertheless I felt that Peter's answer was the best answer to the question itself, which is why I accepted it.
Comments (14)
You can't really do this without coordination. What if one element ended up needing cheaper calculations than another (in a potentially non-obvious way)?
You don't necessarily need an uber-controller - you could just keep some sort of step counter per thread, and have a global counter indicating the "slowest" thread. (When each thread has done some work, it would have to check whether it had fallen behind the others, and update the counter if so.) If a thread notices it's a long way ahead of the slowest thread, it could just wait briefly (potentially on a monitor).
Just do this every so often to avoid having too much overhead due to shared data contention and I think it could work reasonably well.
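A minimal Java sketch of the step-counter idea above, under some assumptions of mine: the names `PacedWorkers`, `MAX_LEAD` and `CHECK_EVERY` are hypothetical, and instead of maintaining a separate global "slowest" counter the sketch simply scans the per-thread counters, which is simpler but does the same job for a small number of threads. A thread that is too far ahead parks briefly rather than waiting on a monitor.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicLongArray;
import java.util.concurrent.locks.LockSupport;

/** Sketch: every worker counts its steps and pauses when it is too far ahead of the slowest one. */
class PacedWorkers {
    static final int MAX_LEAD = 10;   // assumed tolerance, in steps
    static final int CHECK_EVERY = 5; // touch shared state only every few steps

    /** Runs the workers to completion; returns the largest lead seen right after a pacing check. */
    static long run(int workers, int steps) throws InterruptedException {
        AtomicLongArray progress = new AtomicLongArray(workers);
        AtomicLong maxLeadSeen = new AtomicLong();
        Thread[] threads = new Thread[workers];
        for (int w = 0; w < workers; w++) {
            final int id = w;
            threads[w] = new Thread(() -> {
                for (int step = 1; step <= steps; step++) {
                    // ... simulate one step of element `id` here ...
                    progress.set(id, step);
                    if (step % CHECK_EVERY != 0) continue;
                    long lead = step - slowest(progress);
                    while (lead > MAX_LEAD) {          // too far ahead: wait briefly
                        LockSupport.parkNanos(50_000);
                        lead = step - slowest(progress);
                    }
                    maxLeadSeen.accumulateAndGet(lead, Math::max);
                }
            });
            threads[w].start();
        }
        for (Thread t : threads) t.join();
        return maxLeadSeen.get();
    }

    /** Step count of the thread that has made the least progress. */
    static long slowest(AtomicLongArray progress) {
        long min = Long.MAX_VALUE;
        for (int i = 0; i < progress.length(); i++) min = Math.min(min, progress.get(i));
        return min;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("largest lead after a check: " + run(4, 1_000));
    }
}
```

The slowest thread never waits (its own lead is zero), so the group always makes progress. Between checks a thread can still drift up to `CHECK_EVERY - 1` extra steps ahead, which is the price of checking less often.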
You'll need some kind of synchronization. The CyclicBarrier class has what you need:
After each 'tick', you can make all your threads wait for the slower ones. Once the remaining threads reach the barrier, they all continue.
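A small sketch of how this could look with `java.util.concurrent.CyclicBarrier`. The class name `BarrierTicks` and the tick counter are my own illustrative additions; the physics update is left as a comment.

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.atomic.AtomicInteger;

/** Sketch: worker threads advance in lock-step, one barrier crossing per tick. */
class BarrierTicks {
    /** Runs the workers for the given number of ticks; returns how many ticks completed. */
    static int run(int workers, int ticks) throws InterruptedException {
        AtomicInteger completedTicks = new AtomicInteger();
        // The barrier action runs once per tick, after the last thread has arrived.
        CyclicBarrier barrier = new CyclicBarrier(workers, completedTicks::incrementAndGet);
        Thread[] threads = new Thread[workers];
        for (int w = 0; w < workers; w++) {
            threads[w] = new Thread(() -> {
                try {
                    for (int t = 0; t < ticks; t++) {
                        // ... update this thread's element for tick t ...
                        barrier.await(); // wait for the slower threads
                    }
                } catch (InterruptedException | BrokenBarrierException e) {
                    Thread.currentThread().interrupt();
                }
            });
            threads[w].start();
        }
        for (Thread th : threads) th.join();
        return completedTicks.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("ticks completed: " + run(4, 100));
    }
}
```

With a barrier no thread can ever be more than one tick ahead of another, which is stricter than the 1% tolerance in the question but easy to reason about.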
Threads are meant to run completely independent of each other, which means synchronizing them in any way is always a pain. In your case, you need a central "clock" because there is no way to tell the VM that each thread should get the same amount of ... uh ... what should it get? The same amount of RAM? Probably doesn't matter. The same amount of CPU? Are all your objects so similar that each needs the same number of assembler instructions?
So my suggestion is to use a central clock which broadcasts clock ticks to every process. All threads within each process read the ticks (which should be absolute), calculate the difference to the last tick they saw and then update their internal model accordingly.
When a thread is done updating, it must put itself to sleep; waiting for the next tick. In Java, use wait() on the "tick received" lock and wake all threads with "notifyAll()".
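One way the clock described above might be sketched in Java with `wait()` and `notifyAll()`. The names `TickClock`, `awaitTickAfter` and `ClockDemo` are hypothetical, and the sketch covers a single process only; broadcasting ticks across processes or machines is not shown.

```java
/** Sketch: a central clock that publishes an absolute tick number. */
class TickClock {
    private long tick = 0;

    /** Advances the clock by one tick and wakes every waiting thread. */
    synchronized void advance() {
        tick++;
        notifyAll();
    }

    /** Blocks until the clock has moved past lastSeen, then returns the current tick. */
    synchronized long awaitTickAfter(long lastSeen) throws InterruptedException {
        while (tick <= lastSeen) wait(); // loop guards against spurious wake-ups
        return tick;
    }
}

class ClockDemo {
    /** Returns the model time each worker reached after totalTicks clock ticks. */
    static long[] run(int workers, int totalTicks) throws InterruptedException {
        TickClock clock = new TickClock();
        long[] modelTime = new long[workers];
        Thread[] threads = new Thread[workers];
        for (int w = 0; w < workers; w++) {
            final int id = w;
            threads[w] = new Thread(() -> {
                long lastSeen = 0;
                try {
                    while (lastSeen < totalTicks) {
                        long now = clock.awaitTickAfter(lastSeen);
                        // A worker may have missed ticks, so it advances by the difference.
                        modelTime[id] += now - lastSeen;
                        lastSeen = now;
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            threads[w].start();
        }
        for (int t = 0; t < totalTicks; t++) {
            Thread.sleep(1); // assumed tick period of about 1 ms
            clock.advance();
        }
        for (Thread th : threads) th.join();
        return modelTime;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(java.util.Arrays.toString(run(3, 50)));
    }
}
```

Because the tick is absolute, a worker that sleeps through several ticks still ends up at the same model time as the others once it catches up.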
I'd recommend not using threads wherever possible because they just add problems later if you're not careful. When doing physics simulations you could use hundreds of thousands of discrete objects for larger simulations. You can't possibly create this many threads on any OS that I know of, and even if you could it would perform like shit!
In your case you could create a number of threads, and put an event loop in each thread. A 'master' thread could sequence the execution and post a 'process' event to each worker thread to wake it up and make it do some work. In that way the threads will sleep until you tell them to work.
You should be able to get the master thread to tick at a rate that allows all your worker threads to complete before the next tick.
I don't think threads are the answer to your problem, with the exception of parallelising into a small number of worker threads (equal to the number of cores in the machine) which each linearly sequence a series of physical objects. You could still use the master/event-driven approach this way, but you would remove a lot of the overhead.
Please don't. Threads are an O/S abstraction permitting the appearance of parallel execution. With multiple and multicore CPUs, the O/S can (but need not) distribute threads among the different cores.
The closest thing to your scalability vision which I see as workable is to use worker threads, dimensioned to roughly match the number of cores you have, and distribute work among them. A rough draft: define a class ActionTick which does the updating for one particle, and let the worker thread pick ActionTicks to process from a shared queue. I see several challenges even with such a solution.
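A rough Java sketch of that draft, with several assumptions of mine: `ActionTick` here updates one particle's position from a constant velocity (a stand-in for real physics), a `CountDownLatch` marks the end of each tick, and a `STOP` sentinel shuts the workers down. None of these details come from the answer itself.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;

/** One unit of work: advance a single particle by one time step. */
class ActionTick {
    static final ActionTick STOP = new ActionTick(null, null, -1, 0.0, null); // shutdown sentinel
    final double[] position;
    final double[] velocity;
    final int index;
    final double dt;
    final CountDownLatch done;

    ActionTick(double[] position, double[] velocity, int index, double dt, CountDownLatch done) {
        this.position = position;
        this.velocity = velocity;
        this.index = index;
        this.dt = dt;
        this.done = done;
    }

    void apply() {
        position[index] += velocity[index] * dt; // placeholder physics
        done.countDown();
    }
}

class TickQueueDemo {
    /** Simulates `ticks` steps for all particles and returns their final positions. */
    static double[] run(double[] velocity, int ticks, double dt) throws InterruptedException {
        double[] position = new double[velocity.length];
        BlockingQueue<ActionTick> queue = new LinkedBlockingQueue<>();
        int workers = Runtime.getRuntime().availableProcessors(); // roughly one thread per core
        Thread[] pool = new Thread[workers];
        for (int w = 0; w < workers; w++) {
            pool[w] = new Thread(() -> {
                try {
                    for (ActionTick a = queue.take(); a != ActionTick.STOP; a = queue.take()) {
                        a.apply();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            pool[w].start();
        }
        for (int t = 0; t < ticks; t++) {
            CountDownLatch done = new CountDownLatch(velocity.length);
            for (int i = 0; i < velocity.length; i++) {
                queue.put(new ActionTick(position, velocity, i, dt, done));
            }
            done.await(); // no particle starts tick t+1 before all finish tick t
        }
        for (int w = 0; w < workers; w++) queue.put(ActionTick.STOP);
        for (Thread th : pool) th.join();
        return position;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(java.util.Arrays.toString(run(new double[] {1.0, 2.0, -0.5}, 4, 0.5)));
    }
}
```

One of the challenges hinted at above shows up immediately: as soon as particles interact, an `ActionTick` needs to read other particles' state, so a real version would probably compute into a second buffer and swap buffers between ticks.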
Caveat: I haven't worked with any massive simulation software, just some hobbyist code.
As you mention, there are many "DON'T DO THIS" answers. Most seem to read threads as OS threads used by Java. Since you mentioned Erlang in your post, I'd like to post a more Erlang-centered answer.
Modeling this kind of simulation with processes (or actors, micro threads, green threads, as they are sometimes called) doesn't necessarily need any synchronization. In essence, we have a couple of (most likely thousands or hundreds of thousands) physics objects that need to be simulated. We want to simulate these objects as realistically as possible, but there is probably also some kind of real time aspect involved (doesn't have to be though, you don't mention this in your question).
A simple solution would be to spawn an Erlang process for each object, send ticks to all of them and collect the results of the simulation before proceeding with the next tick. This is in practice synchronizing everything. It is of course more of a deterministic solution and does not guarantee any real time properties. It is also non-trivial how the processes would talk to each other to get the data they need for the calculations. You probably need to group them in clever ways (collision groups etc), have hibernated processes (which Erlang has neat support for) for sleeping objects, etc to speed things up.
To get real time properties you probably need to restrain the calculations performed by the processes (trading accuracy for speed). This could perhaps be done by sending out ticks without waiting for answers, and letting the object processes reply back to each tick with their current position and other data you need (even though it might only be approximated at the time). As DJClayworth says, this could lead to errors accumulating in the simulation.
I guess in one sense, the question is really about whether it is possible to use the strength of concurrency to gain some kind of advantage here. If you need synchronization, it is quite a strong sign that you do not need concurrency between each physics object, because you essentially throw away a lot of computation time by waiting for other processes. You might use concurrency during calculation but that is another discussion, I think.
Note: none of these ideas take the actual physics calculations into account. This is not Erlang's strong side and could perhaps be performed in a C library or whatever strikes your fancy, depending on the type of characteristics you want.
Note: I do not know of any case where this has been done (especially not by me), so I cannot guarantee that this is sound advice.
Even with perfect software, hardware will prevent you doing this. Hardware threads typically don't have fair performance. Over a short period, you are lucky if threads run within +-10% performance.
There are, of course, outliers. Some chipsets will run some cores in power-saving mode and others not. I believe one of the Blue Gene research machines had software-controlled scheduling of hardware threads instead of locks.
Erlang will by default try and spread its processes evenly over the available threads. It will also by default try to run threads on all available processors. So if you have enough runnable Erlang processes then you will get a relatively even balance.
I'm not a threading expert, but isn't the whole point of threads that they are independent from each other - and non-deterministic?
I think you have a fundamental misconception in your question where you say:
The real world does not work in a thread-like way at all. Threads in most machines are not independent and not actually even simultaneous (the OS will use context-switching instead). They provide the most value when there is a lot of IO or waiting occurring.
Most importantly, the real world does not "consume more resources" as more complex things happen. Think of the difference between two objects falling from a height, one falling smoothly and the other performing some kind of complex tumbling motion...
I would make a kind of "clock generator" - and would register every new object/thread there. The clock will notify all registered objects when the delta-t has passed.
However this does not mean you need a separate thread for every object. Ideally you will have as many threads as processors.
From a design point of view you could separate the execution of the object tasks through an Executor or a thread pool, e.g. when an object receives the tick event, it goes to a thread pool and schedules itself for execution.
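A possible Java sketch of such a clock generator backed by a fixed thread pool. `Tickable` and `ClockGenerator` are hypothetical names, and as a simplification the clock submits the per-object tasks itself instead of each object scheduling itself on receiving the event.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Anything that wants to be told when delta-t has passed. */
interface Tickable {
    void onTick(double dt);
}

/** Sketch: objects register with the clock; each tick runs them on a pool sized to the CPU count. */
class ClockGenerator {
    private final List<Tickable> registered = new CopyOnWriteArrayList<>();
    private final ExecutorService pool =
            Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());

    void register(Tickable object) {
        registered.add(object);
    }

    /** Notifies every registered object and returns once all of them have handled the tick. */
    void tick(double dt) throws InterruptedException {
        List<Callable<Void>> tasks = new ArrayList<>();
        for (Tickable object : registered) {
            tasks.add(() -> {
                object.onTick(dt);
                return null;
            });
        }
        pool.invokeAll(tasks); // blocks until every task has completed
    }

    void shutdown() {
        pool.shutdown();
    }

    public static void main(String[] args) throws InterruptedException {
        double[] height = {100.0};
        ClockGenerator clock = new ClockGenerator();
        clock.register(dt -> height[0] -= 9.81 * dt); // toy object: falls at a constant rate
        for (int i = 0; i < 10; i++) clock.tick(0.1);
        clock.shutdown();
        System.out.println("height after 10 ticks: " + height[0]);
    }
}
```

The number of threads stays tied to the number of processors no matter how many objects are registered, which is the point of separating objects from threads here.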
Two things have to happen in order to achieve this. You have to ensure that you have an equal number of threads per CPU core, and you need some kind of synchronization.
That sync can be rather simple, like checking "cycle-done" variable for each thread while performing computation, but you can't avoid it.
Working on motor control, I have used some math to keep velocity at a stable state.
The system had PID control: proportional, integral and derivative. But that was an analog/digital system. Maybe something similar could be used to determine how much time each thread must run, but the biggest tip I can give you is that all threads will each need clock synchronization.
I'm the first to admit I'm not a threading expert, but this sounds like a very wrong way to approach simulation. As others have already commented, having too many threads is computationally expensive. Furthermore, if you are planning to do what I think you are thinking of doing, your simulation may turn out to produce random results (which may not matter if you are making a game).
I'd go with a few worker threads used to calculate discrete steps of the simulation.