When does Erlang's parallelism overcome its weakness at number crunching?



With all the hype around parallel computing lately, I've been thinking a lot about parallelism, number crunching, clusters, etc...

I started reading Learn You Some Erlang. As more people are learning (myself included), Erlang handles concurrency in a very impressive, elegant way.

Then the author asserts that Erlang is not ideal for number crunching. I can understand that a language like Erlang would be slower than C, but the model for concurrency seems ideally suited to things like image handling or matrix multiplication, even though the author specifically says it's not.

Is it really that bad? Is there a tipping point where Erlang's strength overcomes its local speed weakness? What measures, if any, are being taken to deal with speed?

To be clear: I'm not trying to start a debate; I just want to know.


Comments (5)

听风念你 2024-08-09 06:34:55


It's a mistake to think of parallelism as only about raw number crunching power. Erlang is closer to the way a cluster computer works than, say, a GPU or classic supercomputer.

In modern GPUs and old-style supercomputers, performance is all about vectorized arithmetic, special-purpose calculation hardware, and low-latency communication between processing units. Because communication latency is low and each individual computing unit is very fast, the ideal usage pattern is to load the machine's RAM up with data and have it crunch it all at once. This processing might involve lots of data passing among the nodes, as happens in image processing or 3D, where there are lots of CPU-bound tasks to do to transform the data from input form to output form. This type of machine is a poor choice when you frequently have to go to a disk, network, or some other slow I/O channel for data. This idles at least one expensive, specialized processor, and probably also chokes the data processing pipeline so nothing else gets done, either.

If your program requires heavy use of slow I/O channels, a better type of machine is one with many cheap independent processors, like a cluster. You can run Erlang on a single machine, in which case you get something like a cluster within that machine, or you can easily run it on an actual hardware cluster, in which case you have a cluster of clusters. Here, communication overhead still idles processing units, but because you have many processing units running on each bit of computing hardware, Erlang can switch to one of the other processes instantaneously. If it happens that an entire machine is sitting there waiting on I/O, you still have the other nodes in the hardware cluster that can operate independently. This model only breaks down when the communication overhead is so high that every node is waiting on some other node, or for general I/O, in which case you either need faster I/O or more nodes, both of which Erlang naturally takes advantage of.
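As a concrete aside, the "cluster of clusters" part comes built in: distributed Erlang lets nodes on different machines form a mesh, and spawning a process on a remote node looks almost the same as spawning one locally. A minimal sketch, with placeholder host names and cookie:

    %% Start one node per machine with a shared cookie, e.g.
    %%   erl -name left@host1.example.com  -setcookie secret
    %%   erl -name right@host2.example.com -setcookie secret
    %% Then, in the shell of the first node:

    net_adm:ping('right@host2.example.com').   % connects the nodes; returns pong

    %% Spawn an echo process on the remote node; sending it messages
    %% works exactly as for a local pid.
    Pid = spawn('right@host2.example.com',
                fun() -> receive {From, Msg} -> From ! {echo, Msg} end end).
    Pid ! {self(), hello}.
    receive {echo, Word} -> Word end.           % -> hello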

Communication and control systems are ideal applications of Erlang because each individual processing task takes little CPU and only occasionally needs to communicate with other processing nodes. Most of the time, each process is operating independently, each taking a tiny fraction of the CPU power. The most important thing here is the ability to handle many thousands of these efficiently.
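To make the "many thousands of these" point concrete, here is a minimal sketch (module and function names are made up for illustration) that spawns N processes, each of which sits blocked in a receive until the occasional message arrives:

    -module(many_procs).
    -export([start/1, worker/1]).

    %% Spawn N lightweight processes; each costs a few hundred words of
    %% memory and no CPU while it waits.
    start(N) ->
        Pids = [spawn(?MODULE, worker, [I]) || I <- lists:seq(1, N)],
        %% Poke the first three to show they respond independently.
        [Pid ! {ping, self()} || Pid <- lists:sublist(Pids, 3)],
        [receive {pong, Id} -> Id end || _ <- lists:seq(1, 3)].

    %% Each worker blocks in receive; the scheduler simply parks it.
    worker(Id) ->
        receive
            {ping, From} -> From ! {pong, Id}, worker(Id);
            stop         -> ok
        end.

Calling many_procs:start(100000). is routine and stays well within the VM's default process limit.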

The classic case where you absolutely need a classic supercomputer is weather prediction. Here, you divide the atmosphere up into cubes and do physics simulations to find out what happens in each cube, but you can't use a cluster because air moves between each cube, so each cube is constantly communicating with its 6 adjacent neighbors. (Air doesn't go through the edges or corners of a cube, being infinitely fine, so it doesn't talk to the other 20 neighboring cubes.) Run this on a cluster, whether running Erlang on it or some other system, and it instantly becomes I/O bound.

窗影残 2024-08-09 06:34:55


Is there a tipping point where Erlang's strength overcomes its local speed weakness?

Well, of course there is. For example, when trying to find the median of a trillion numbers :) :

http://matpalm.com/median/question.html

Just before you posted, I happened to notice this was the number 1 post on erlang.reddit.com.
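I won't reproduce the linked approach here, but the general shape of that kind of job in Erlang is scatter/gather: fan chunks out to worker processes (locally or on other nodes), then fold the partial results together. A minimal single-node sketch with made-up names, counting how many values across all chunks fall below a pivot, which is one building block of a distributed median search:

    -module(scatter_gather).
    -export([count_below/2]).

    %% Chunks is a list of lists; each chunk gets its own process.
    count_below(Chunks, Pivot) ->
        Parent = self(),
        Refs = [begin
                    Ref = make_ref(),
                    spawn(fun() ->
                              Count = length([X || X <- Chunk, X < Pivot]),
                              Parent ! {Ref, Count}
                          end),
                    Ref
                end || Chunk <- Chunks],
        %% Gather the partial counts; arrival order doesn't matter, the refs do.
        lists:sum([receive {Ref, Count} -> Count end || Ref <- Refs]).

A distributed version would use spawn/4 to place the workers on other nodes, but the message-passing shape stays the same.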

心的位置 2024-08-09 06:34:55


Almost any language can be parallelized. In some languages it's simple, in others it's a pain in the butt, but it can be done. If you want to run a C++ program across 8000 CPUs in a grid, go ahead! You can do that. It's been done before.

Erlang doesn't do anything that's impossible in other languages. If a single CPU running an Erlang program is less efficient than the same CPU running a C++ program, then two hundred CPUs running Erlang will also be slower than two hundred CPUs running C++.

What Erlang does do is make this kind of parallelism easy to work with. It saves developer time and reduces the chance of bugs.

So I'm going to say no, there is no tipping point at which Erlang's parallelism allows it to outperform another language's number-crunching strength.

Where Erlang scores is in making it easier to scale out and do so correctly. But it can still be done in other languages which are better at number-crunching, if you're willing to spend the extra development time.

And of course, let's not forget the good old point that languages don't have a speed.
A sufficiently good Erlang compiler would yield perfectly optimal code. A sufficiently bad C compiler would yield code that runs slower than anything else.

游魂 2024-08-09 06:34:55


There is pressure to make Erlang execute numeric code faster. The HiPE compiler, for example, compiles to native code instead of BEAM bytecode, and its most effective optimization is probably for floating-point code, where it can avoid boxing. That is very beneficial because values can then be stored directly in FPU registers.
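For reference, and assuming an OTP release that still ships HiPE (it was removed in OTP 24), this is roughly what using it looks like; fsum is a made-up module name:

    -module(fsum).
    -export([sum/1]).

    %% Tight floating-point accumulation: the kind of loop where native
    %% compilation can keep the accumulator unboxed instead of allocating
    %% a boxed float on every iteration.
    sum(List) -> sum(List, 0.0).

    sum([X | Rest], Acc) when is_float(X) -> sum(Rest, Acc + X);
    sum([], Acc) -> Acc.

    %% In the shell:
    %%   1> c(fsum, [native]).
    %%   2> fsum:sum(lists:duplicate(1000000, 1.0)).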

For the majority of Erlang usage, Erlang is plenty fast as it is. They use Erlang to write always-up control systems, where the speed measure that matters most is low-latency response. Performance under load tends to be IO-bound. These users tend to stay away from HiPE, since it is not as flexible/malleable when debugging live systems.

Now that servers with 128 GB of RAM are not that uncommon, and there's no reason they won't get even more, some IO-bound problems might shift over to being somewhat CPU-bound. That could be a driver.

You should follow HiPE development.


Your examples of image manipulation and matrix multiplication seem to me like very poor matches for Erlang, though. Those are workloads that benefit from vector/SIMD operations. Erlang is not good at that kind of data parallelism (doing the same thing to many values at once).

Erlang processes are MIMD: multiple instruction, multiple data. Erlang does lots of branching behind pattern matching and recursive loops. That kills CPU instruction pipelining.
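To illustrate what that branching looks like, here is an idiomatic Erlang dot product (a sketch, not a claim about any particular compiler's output): every element costs a pattern match on two cons cells plus a tail call, where a SIMD unit would do one wide multiply-add over a contiguous vector.

    -module(dot).
    -export([dot/2]).

    %% Each step matches two list cells and recurses: branches and
    %% pointer chasing rather than straight-line vector arithmetic.
    dot(Xs, Ys) -> dot(Xs, Ys, 0.0).

    dot([X | Xs], [Y | Ys], Acc) -> dot(Xs, Ys, Acc + X * Y);
    dot([], [], Acc) -> Acc.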

The best architecture for heavily parallelized problems is the GPU. For programming GPUs in a functional language, I see the greatest potential in using Haskell to create programs targeting them. A GPU is basically a pure function from input data to output data. See the Lava project in Haskell for creating FPGA circuits: if circuits can be created that cleanly in Haskell, it can't be much harder to create program data for GPUs.

The Cell architecture is very nice for vectorizable problems as well.

滥情哥ㄟ 2024-08-09 06:34:55


I think the broader point that needs making is that parallelism is not necessarily, or even typically, about speed.

It is about how to express algorithms or programs in which the sequence of activities is partially ordered.
