How/why do functional languages (specifically Erlang) scale well?

Posted 2024-07-12 13:21:36

I have been watching the growing visibility of functional programming languages and features for a while. I looked into them and didn't see the reason for the appeal.

Then, recently I attended Kevin Smith's "Basics of Erlang" presentation at Codemash.

I enjoyed the presentation and learned that a lot of the attributes of functional programming make it much easier to avoid threading/concurrency issues. I understand that the lack of state and mutability makes it impossible for multiple threads to alter the same data, but Kevin said (if I understood correctly) that all communication takes place through messages and the messages are processed synchronously (again avoiding concurrency issues).

But I have read that Erlang is used in highly scalable applications (the whole reason Ericsson created it in the first place). How can it be efficient handling thousands of requests per second if everything is handled as a synchronously processed message? Isn't that why we started moving towards asynchronous processing - so we can take advantage of running multiple threads of operation at the same time and achieve scalability? It seems like this architecture, while safer, is a step backwards in terms of scalability. What am I missing?

I understand the creators of Erlang intentionally avoided supporting threading to avoid concurrency problems, but I thought multi-threading was necessary to achieve scalability.

How can functional programming languages be inherently thread-safe, yet still scale?

Comments (8)

网名女生简单气质 2024-07-19 13:21:36

A functional language doesn't (in general) rely on mutating a variable. Because of this, we don't have to protect the "shared state" of a variable, because the value is fixed. This in turn avoids the majority of the hoop jumping that traditional languages have to go through to implement an algorithm across processors or machines.
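
For example, in an Erlang shell a variable can only be bound once; trying to "change" it is just a failed pattern match, so there is nothing that needs locking:

1> X = 1.
1
2> X = 2.
** exception error: no match of right hand side value 2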

Erlang takes it further than traditional functional languages by baking in a message-passing system that allows everything to operate as an event-based system, where a piece of code only worries about receiving and sending messages, not about the bigger picture.

What this means is that the programmer is (nominally) unconcerned with whether the message will be handled on another processor or machine: simply sending the message is good enough for it to continue. If it cares about a response, it will wait for it as another message.

The end result of this is that each snippet is independent of every other snippet. No shared code, no shared state, and all interactions coming from a message system that can be distributed among many pieces of hardware (or not).
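
To make the message-passing part concrete, here is a minimal sketch of my own (not from the answer): a worker process that receives requests and replies with another message. The send itself returns immediately; waiting for the reply is a separate, explicit receive.

-module(worker_sketch).   % illustrative module name
-export([start/0, ask/2]).

% Start a worker whose only job is to receive requests and reply.
start() ->
    spawn(fun loop/0).

loop() ->
    receive
        {From, Request} ->
            From ! {self(), {ok, Request}},   % the reply is just another message
            loop()
    end.

% Caller side: the send operator (!) does not block; the caller only
% blocks if and when it explicitly waits for the reply message.
ask(Worker, Request) ->
    Worker ! {self(), Request},
    receive
        {Worker, Reply} -> Reply
    end.

So Pid = worker_sketch:start(), worker_sketch:ask(Pid, hello) returns {ok, hello}, while a caller that doesn't care about a reply can just send and move on.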

Contrast this with a traditional system: we have to place mutexes and semaphores around "protected" variables and code execution. We have tight binding in a function call via the stack (waiting for the return to occur). All of this creates bottlenecks that are less of a problem in a shared-nothing system like Erlang.

EDIT: I should also point out that Erlang is asynchronous. You send your message and maybe/someday another message arrives back. Or not.

Spencer's point about out-of-order execution is also important and well answered.

夏有森光若流苏 2024-07-19 13:21:36

The message queue system is cool because it effectively produces a "fire-and-wait-for-result" effect which is the synchronous part you're reading about. What makes this incredibly awesome is that it means lines do not need to be executed sequentially. Consider the following code:

r = methodWithALotOfDiskProcessing();
x = r + 1;
y = methodWithALotOfNetworkProcessing();
w = x * y;

Consider for a moment that methodWithALotOfDiskProcessing() takes about 2 seconds to complete and that methodWithALotOfNetworkProcessing() takes about 1 second to complete. In a procedural language this code would take about 3 seconds to run because the lines would be executed sequentially. We're wasting time waiting for one method to complete that could run concurrently with the other without competing for a single resource. In a functional language lines of code don't dictate when the processor will attempt them. A functional language would try something like the following:

Execute line 1 ... wait.
Execute line 2 ... wait for r value.
Execute line 3 ... wait.
Execute line 4 ... wait for x and y value.
Line 3 returned ... y value set, message line 4.
Line 1 returned ... r value set, message line 2.
Line 2 returned ... x value set, message line 4.
Line 4 returned ... done.

How cool is that? By going ahead with the code and only waiting where necessary we've reduced the waiting time to two seconds automagically! :D So yes, while the code is synchronous it tends to have a different meaning than in procedural languages.
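
In Erlang you would get this overlap by spawning the two slow calls as separate processes and receiving their results as messages. Here is a rough sketch of my own (disk_heavy/0 and network_heavy/0 are hypothetical stand-ins for the slow methods in the example, simulated with timer:sleep/1):

-module(parallel_sketch).   % illustrative module name
-export([run/0]).

% Stand-ins for the slow calls in the example above.
disk_heavy()    -> timer:sleep(2000), 10.   % ~2 seconds of "disk" work
network_heavy() -> timer:sleep(1000), 5.    % ~1 second of "network" work

run() ->
    Parent = self(),
    % Start both computations at once; each posts its result back as a message.
    spawn(fun() -> Parent ! {r, disk_heavy()} end),
    spawn(fun() -> Parent ! {y, network_heavy()} end),
    % Only block where a value is actually needed (selective receive).
    R = receive {r, Rv} -> Rv end,
    X = R + 1,
    Y = receive {y, Yv} -> Yv end,
    X * Y.   % total wall time is roughly max(2 s, 1 s), not the sum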

EDIT:

Once you grasp this concept in conjunction with Godeke's post it's easy to imagine how simple it becomes to take advantage of multiple processors, server farms, redundant data stores and who knows what else.

云裳 2024-07-19 13:21:36

It's likely that you're mixing up synchronous with sequential.

The body of a function in Erlang is processed sequentially.
So what Spencer said about this "automagical effect" doesn't hold true for Erlang. You could model this behaviour with Erlang, though.

For example, you could spawn a process that calculates the number of words in a line.
As we have several lines, we spawn one such process for each line and receive the answers to calculate a sum from them.

That way, we spawn processes that do the "heavy" computations (utilizing additional cores if available) and later we collect the results.

-module(countwords).
-export([count_words_in_lines/1]).

count_words_in_lines(Lines) ->
    % For each line in lines run spawn_summarizer with the process id (pid)
    % and a line to work on as arguments.
    % This is a list comprehension and spawn_summarizer will return the pid
    % of the process that was created. So the variable Pids will hold a list
    % of process ids.
    Pids = [spawn_summarizer(self(), Line) || Line <- Lines], 
    % For each pid receive the answer. This will happen in the same order in
    % which the processes were created, because we saved [pid1, pid2, ...] in
    % the variable Pids and now we consume this list.
    Results = [receive_result(Pid) || Pid <- Pids],
    % Sum up the results.
    WordCount = lists:sum(Results),
    io:format("We've got ~p words, Sir!~n", [WordCount]).

spawn_summarizer(S, Line) ->
    % Create an anonymous function and save it in the variable F.
    F = fun() ->
        % Split line into words.
        ListOfWords = string:tokens(Line, " "),
        Length = length(ListOfWords),
        io:format("process ~p calculated ~p words~n", [self(), Length]),
        % Send a tuple containing our pid and Length to S.
        S ! {self(), Length}
    end,
    % There is no return statement in Erlang; instead, the value of the last
    % expression in a function is returned implicitly.
    % Spawn the anonymous function and return the pid of the new process.
    spawn(F).

% The variable Pid gets bound in the function head.
% In Erlang, a variable can only be bound once.
receive_result(Pid) ->
    receive
        % Pattern-matching: the block behind "->" will execute only if we receive
        % a tuple that matches the one below. The variable Pid is already bound,
        % so we are waiting here for the answer of a specific process.
        % N is unbound so we accept any value.
        {Pid, N} ->
            io:format("Received \"~p\" from process ~p~n", [N, Pid]),
            N
    end.

And this is what it looks like, when we run this in the shell:

Eshell V5.6.5  (abort with ^G)
1> Lines = ["This is a string of text", "and this is another", "and yet another", "it's getting boring now"].
["This is a string of text","and this is another",
 "and yet another","it's getting boring now"]
2> c(countwords).
{ok,countwords}
3> countwords:count_words_in_lines(Lines).
process <0.39.0> calculated 6 words
process <0.40.0> calculated 4 words
process <0.41.0> calculated 3 words
process <0.42.0> calculated 4 words
Received "6" from process <0.39.0>
Received "4" from process <0.40.0>
Received "3" from process <0.41.0>
Received "4" from process <0.42.0>
We've got 17 words, Sir!
ok
4> 

巷雨优美回忆 2024-07-19 13:21:36

The key thing that enables Erlang to scale is related to concurrency.

An operating system provides concurrency by two mechanisms:

  • operating system processes
  • operating system threads

Processes don't share state – one process can't crash another by design.

Threads share state – one thread can crash another by design – that's your problem.

With Erlang, one operating system process is used by the virtual machine, and the VM provides concurrency to Erlang programs not by using operating system threads but by providing Erlang processes – that is, Erlang implements its own timeslicer.

These Erlang processes talk to each other by sending messages (handled by the Erlang VM, not the operating system). Erlang processes address each other using a process ID (PID), which has a three-part address <N3.N2.N1>:

  • process no N1 on
  • VM N2 on
  • physical machine N3

Two processes on the same VM, on different VMs on the same machine, or on two different machines all communicate in the same way – your scaling is therefore independent of the number of physical machines you deploy your application on (to a first approximation).
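
A hedged sketch of what that location transparency looks like in code (the node name 'worker@otherhost' is made up, and the two nodes are assumed to be connected already):

-module(location_sketch).   % illustrative module name
-export([send_both/0]).

send_both() ->
    Echo = fun() -> receive Msg -> io:format("~p got ~p~n", [self(), Msg]) end end,
    Local  = spawn(Echo),                       % process on this VM
    Remote = spawn('worker@otherhost', Echo),   % process on another VM/machine
    % The send looks exactly the same in both cases; the VM handles delivery.
    Local  ! hello,
    Remote ! hello.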

Erlang is only thread-safe in a trivial sense – it doesn't have threads. (The language, that is; the SMP/multi-core VM uses one operating system thread per core.)

左岸枫 2024-07-19 13:21:36

You may have a misunderstanding of how Erlang works. The Erlang runtime minimizes context-switching on a CPU, but if there are multiple CPUs available, then all are used to process messages. You don't have "threads" in the sense that you do in other languages, but you can have a lot of messages being processed concurrently.

南…巷孤猫 2024-07-19 13:21:36

Erlang messages are purely asynchronous; if you want a synchronous reply to your message, you need to code for that explicitly. What was possibly meant is that messages in a process's mailbox are processed sequentially. Any message sent to a process sits in that process's mailbox, and the process picks one message from that box, processes it, and then moves on to the next one, in whatever order it sees fit. This is a very sequential act, and the receive block does exactly that.
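
For example, the usual hand-rolled way to code a synchronous request (my sketch of the common pattern, roughly what gen_server:call does for you) is to tag the request with a unique reference and block in a selective receive until the matching reply arrives:

-module(sync_call_sketch).   % illustrative module name
-export([start/0, call/2]).

start() ->
    spawn(fun() -> loop(0) end).

% Synchronous call built on top of asynchronous sends.
call(Server, Request) ->
    Ref = make_ref(),                         % unique tag for this request
    Server ! {request, self(), Ref, Request},
    receive
        {reply, Ref, Result} -> Result        % block until *this* reply arrives
    end.

loop(State) ->
    receive
        {request, From, Ref, Request} ->
            From ! {reply, Ref, {echoed, Request, State}},   % placeholder "work"
            loop(State + 1)
    end.

Without the receive in call/2, the send would be the usual fire-and-forget.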

It looks like you have mixed up synchronous and sequential, as Chris mentioned.

夜巴黎 2024-07-19 13:21:36

In a purely functional language, order of evaluation doesn't matter - in a function application fn(arg1, .. argn), the n arguments can be evaluated in parallel. That guarantees a high level of (automatic) parallelism.

Erlang uses a process model where a process can run in the same virtual machine or on a different processor -- there is no way to tell. That is only possible because messages are copied between processes; there is no shared (mutable) state. Multi-processor parallelism goes a lot farther than multi-threading: since threads depend upon shared memory, only 8 threads can run in parallel on an 8-core CPU, while multi-processing can scale to thousands of parallel processes.
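
As a small illustration of that last point (my sketch, not part of the answer), spawning tens of thousands of Erlang processes and collecting one message from each is routine and stays well within the VM's default limits:

-module(manyprocs).   % illustrative module name
-export([run/1]).

% Spawn N lightweight processes; each sends one message back to the parent,
% which then collects and sums the replies, e.g. manyprocs:run(100000).
run(N) ->
    Parent = self(),
    Pids = [spawn(fun() -> Parent ! {self(), I} end) || I <- lists:seq(1, N)],
    Sum  = lists:sum([receive {Pid, Val} -> Val end || Pid <- Pids]),
    io:format("collected ~p replies, sum ~p~n", [length(Pids), Sum]).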
