并行程序亚线性加速的原因
并行化程序没有达到理想加速的原因是什么?
例如,我考虑过数据依赖性、线程(或参与者)之间的数据传输成本、访问相同数据结构的同步、任何其他想法(或我提到的原因的子类别)?
我对 erlang actor 模型中发生的问题特别感兴趣,但欢迎任何其他问题。
What are the reasons a parallelized program doesn't achieve the ideal speedup?
For example, I have thought about data dependencies, the cost of data transfer between threads (or actors), synchronisation for access to the same data structures, any other ideas (or subcategories of the reasons i mentioned)?
I'm particularly interested for problems occurring in the erlang actor model but any other issues are welcomed.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
一些没有特定的顺序:
A few in no particular order:
原因之一是并行化程序通常比想象的更困难,并且可能会出现许多微妙的问题。有关此问题的精彩讨论,请参阅阿姆达尔定律。
One reason is that parallelizing a program is often more difficult than one imagines and there are many subtle problems which can occur. For a very good discussion on this see Amdahl's Law.
Erlang Actor 模型中的主要问题是每个进程都有自己的内存堆,并且传递的消息会被复制。与使用共享内存的常用方法相比,您可以在进程之间传递指向结构的指针。
在共享内存环境中,程序员需要确保一次只有一个进程/线程在一块内存上进行操作。也就是说,某个进程被指定为it,并负责在该内存区域上执行正确的操作。在 Erlang 中则不然:一个进程不能通过设计在其他进程的内存区域中进行翻查,并且您必须将值复制到其他进程。当我们考虑程序的鲁棒性时,这是非常强大的,但如果我们考虑程序执行的速度,那么就没有那么强大了。另一方面,如果我们想要一个由多台计算机组成的分布式环境,复制将占据主导地位,并且是在计算机之间传输数据的唯一方法。
阿姆达尔定律发挥作用是因为程序的某些部分可能无法分布在多个内核上。有些问题本质上是连续的:你没有希望加速它们。通常它们是迭代的,其中每个新的迭代都依赖于前一个迭代,并且您无法猜测新的迭代。
The main problem in the Erlang Actor model is that each process has its own heap of memory and messages passed are copied around. Contrast with the usual way of using shared memory where you can pass a pointer to a structure between processes.
In a shared memory environment, it is up to the programmer to ensure that only a single process/thread operates on a piece of memory at a time. That is, some process is designated as it and has responsibility for doing the right thing on that memory area. Not so much in Erlang: One process can't by design rummage in other processes memory areas and you must copy values to other processes. This is tremendously powerful when we consider robustness of programs, but not so much if we consider the speed by which the program executes. On the other hand, if we want a distributed environment of multiple computers, copying reigns king and is the only way to transfer data between machines.
Amdahl's law comes into play because parts of your program may be impossible to spread out over multiple cores. There are some problems which are inherently serial in nature: You have no hope of ever speeding them up. Usually they are iterative where each new iteration is dependent on the former and you can't make a guess at the new one.