Actor model: Why is Erlang/OTP special? Could you use another language?

Posted 2024-12-15 10:52:39

I've been looking into learning Erlang/OTP, and as a result, have been reading (okay, skimming) about the actor model.

From what I understand, the actor model is simply a set of functions (run within lightweight threads called "processes" in Erlang/OTP), which communicate with each other only via message passing.

This seems fairly trivial to implement in C++, or any other language:

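#include <queue>

// CriticalSection and BaseMessage are placeholder types that this sketch
// assumes exist elsewhere.
class BaseActor {
    std::queue<BaseMessage*> messages;  // this actor's mailbox
    CriticalSection messagecs;          // guards access to the mailbox
    BaseMessage* Pop();
public:
    // The only way other actors communicate with this one.
    void Push(BaseMessage* message)
    {
        auto scopedlock = messagecs.AquireScopedLock();
        messages.push(message);  // push onto the queue, not the lock
    }
    virtual void ActorFn() = 0;
    virtual ~BaseActor() = default;
};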

Each of your processes is an instance of a class derived from BaseActor. Actors communicate with each other only via message passing (namely, Push). Actors register themselves with a central map on initialization, which allows other actors to find them and allows a central function to run through them.
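To make the "central map plus central function" idea concrete, here is a minimal single-threaded sketch. Everything in it is an assumption made for illustration: the names Actor, Registry, Counter, Step, Send and RunOnce are hypothetical, std::mutex stands in for CriticalSection, messages are plain strings, and registration is done explicitly by the caller rather than by the actor itself.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <mutex>
#include <queue>
#include <string>
#include <utility>

// Hypothetical mailbox-owning actor; std::mutex stands in for CriticalSection.
class Actor {
    std::queue<std::string> mailbox_;
    std::mutex lock_;
public:
    virtual ~Actor() = default;

    // The only way other code interacts with an actor: enqueue a message.
    void Push(std::string message) {
        std::lock_guard<std::mutex> guard(lock_);
        mailbox_.push(std::move(message));
    }

    // Handles at most one pending message; returns false if the mailbox was empty.
    bool Step() {
        std::string message;
        {
            std::lock_guard<std::mutex> guard(lock_);
            if (mailbox_.empty()) return false;
            message = std::move(mailbox_.front());
            mailbox_.pop();
        }
        Handle(message);  // runs outside the lock, so a handler may Push freely
        return true;
    }

protected:
    virtual void Handle(const std::string& message) = 0;
};

// The "central map": actors are found by name and run round-robin.
class Registry {
    std::map<std::string, std::unique_ptr<Actor>> actors_;
public:
    void Register(const std::string& name, std::unique_ptr<Actor> actor) {
        assert(actor != nullptr);
        actors_[name] = std::move(actor);
    }

    // Returns false when no actor is registered under that name.
    bool Send(const std::string& name, std::string message) {
        auto it = actors_.find(name);
        if (it == actors_.end()) return false;
        it->second->Push(std::move(message));
        return true;
    }

    // One pass over all actors, one message each; returns how many were handled.
    int RunOnce() {
        int handled = 0;
        for (auto& entry : actors_) {
            if (entry.second->Step()) ++handled;
        }
        return handled;
    }
};

// Example actor that just counts the messages it receives.
class Counter : public Actor {
public:
    int count = 0;
protected:
    void Handle(const std::string&) override { ++count; }
};
```

Handling one message per actor per pass gives a rough kind of turn-taking, but only between handlers that return: a Handle that loops forever still stalls the whole pass.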

Now, I understand I'm missing, or rather glossing over, one important issue here: the lack of yielding means a single actor can unfairly consume excessive time. But are cross-platform coroutines the primary thing that makes this hard in C++? (Windows, for instance, has fibers.)

Is there anything else I'm missing, though, or is the model really this obvious?

Comments (6)

谈下烟灰 2024-12-22 10:52:39

The C++ code does not deal with fairness, isolation, fault detection, or distribution, all of which Erlang brings as part of its actor model.

  • No actor is allowed to starve any other actor (fairness)
  • If one actor crashes, it should only affect that actor (isolation)
  • If one actor crashes, other actors should be able to detect and react to that crash (fault detection)
  • Actors should be able to communicate over a network as if they were on the same machine (distribution)
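As a rough illustration of the isolation and fault-detection points, the following C++ sketch contains an actor's failure and turns it into a notice that another actor can read. The names ExitNotice, Monitor and RunMonitored are hypothetical, and the sketch runs each body synchronously; it assumes that a "crash" is a C++ exception, which is a much weaker guarantee than Erlang's.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record of how an actor ended.
struct ExitNotice {
    std::string actor;
    std::string reason;  // "normal", the exception text, or "unknown"
};

// Collects exit notices so that a supervising actor can react to them.
class Monitor {
    std::vector<ExitNotice> notices_;
public:
    void Report(ExitNotice notice) { notices_.push_back(std::move(notice)); }

    // Hands over all notices collected so far and clears the list.
    std::vector<ExitNotice> Drain() {
        std::vector<ExitNotice> out;
        out.swap(notices_);
        return out;
    }
};

// Runs one actor body; an exception is contained here and reported, not propagated.
void RunMonitored(const std::string& name,
                  const std::function<void()>& body,
                  Monitor& monitor) {
    assert(body != nullptr);
    try {
        body();
        monitor.Report({name, "normal"});
    } catch (const std::exception& e) {
        monitor.Report({name, e.what()});
    } catch (...) {
        monitor.Report({name, "unknown"});
    }
}
```

This only catches C++ exceptions. A segmentation fault, memory corruption, or an abort in one actor still takes down every actor in the same OS process, so real isolation in C++ usually means separate OS processes.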

The BEAM SMP emulator also brings JIT scheduling of the actors: it moves them to whichever core currently has the least utilization, and it hibernates the threads on certain cores when they are no longer needed.

In addition, all the libraries and tools written in Erlang can assume that this is the way the world works and be designed accordingly.

These things are not impossible to do in C++, but they get increasingly hard once you add the fact that Erlang works on almost all of the major hardware and OS configurations.

Edit: I just found a description by Ulf Wiger of what he sees as Erlang-style concurrency.

梦晓ヶ微光ヅ倾城 2024-12-22 10:52:39

I don't like to quote myself, but from Virding's First Rule of Programming:

Any sufficiently complicated concurrent program in another language contains an ad hoc informally-specified bug-ridden slow implementation of half of Erlang.

With respect to Greenspun. Joe (Armstrong) has a similar rule.

The problem is not implementing actors; that's not that difficult. The problem is getting everything to work together: processes, communication, garbage collection, language primitives, error handling, etc. For example, using OS threads scales badly, so you need to do it yourself. It would be like trying to "sell" an OO language where you can only have 1k objects and they are heavy to create and use. From our point of view, concurrency is the basic abstraction for structuring applications.

I'm getting carried away, so I will stop here.

自演自醉 2024-12-22 10:52:39

This is actually an excellent question, and it has received excellent answers that are perhaps not yet convincing.

To add shade and emphasis to the other great answers already here, consider what Erlang takes away (compared to traditional general purpose languages such as C/C++) in order to achieve fault-tolerance and uptime.

First, it takes away locks. Joe Armstrong's book lays out this thought experiment: suppose your process acquires a lock and then immediately crashes (a memory glitch causes the process to crash, or the power fails to part of the system). The next time a process waits for that same lock, the system has just deadlocked. This could be an obvious lock, as in the AquireScopedLock() call in the sample code; or it could be an implicit lock acquired on your behalf by a memory manager, say when calling malloc() or free().

In any case, your process crash has now halted the entire system from making progress. Fini. End of story. Your system is dead. Unless you can guarantee that every library you use in C/C++ never calls malloc and never acquires a lock, your system is not fault tolerant. Erlang systems can and do kill processes at will when under heavy load in order to make progress, so at scale your Erlang processes must be killable (at any single point of execution) in order to maintain throughput.

There is a partial workaround: using leases everywhere instead of locks. But you have no guarantee that all the libraries you use also do this, and the logic and reasoning about correctness get hairy quickly. Moreover, leases recover slowly (only after the timeout expires), so your entire system just got really slow in the face of failure.
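For readers unfamiliar with leases, here is a minimal C++ sketch of the idea: ownership that lapses by itself, so a holder that crashes cannot block others forever. The Lease class and its method names are hypothetical, the caller passes in the current time to keep the example deterministic, and the sketch is single-threaded; a real lease would live in a separate service that all parties can reach.

```cpp
#include <cassert>
#include <chrono>
#include <string>

using Clock = std::chrono::steady_clock;

// Hypothetical lease: ownership that expires `duration` after it was last granted.
class Lease {
    std::string holder_;            // empty when nobody holds the lease
    Clock::time_point expires_{};   // only meaningful while holder_ is non-empty
    Clock::duration duration_;
public:
    explicit Lease(Clock::duration duration) : duration_(duration) {
        assert(duration > Clock::duration::zero());
    }

    // Grants the lease if it is free, expired, or already held by `who` (a renewal).
    bool TryAcquire(const std::string& who, Clock::time_point now) {
        if (!holder_.empty() && holder_ != who && now < expires_) return false;
        holder_ = who;
        expires_ = now + duration_;
        return true;
    }

    // Voluntary release. A crashed holder never calls this; expiry covers that case.
    void Release(const std::string& who) {
        if (holder_ == who) holder_.clear();
    }
};
```

The slow recovery mentioned above is visible in TryAcquire: after a holder crashes, every other party is refused until the full duration has elapsed.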

Second, Erlang takes away static typing, which in turn enables hot code swapping and running two versions of the same code simultaneously. This means you can upgrade your code at runtime without stopping the system. This is how systems stay up for nine 9's or 32 msec of downtime/year. They are simply upgraded in place. Your C++ functions will have to be manually re-linked in order to be upgraded, and running two versions at the same time is not supported. Code upgrades require system downtime, and if you have a large cluster that cannot run more than one version of code at once, you'll need to take the entire cluster down at once. Ouch. And in the telecom world, not tolerable.

In addition, Erlang takes away shared memory and shared garbage collection; each lightweight process is garbage collected independently. This is a simple extension of the first point, but it emphasizes that for true fault tolerance you need processes that are not interlocked in terms of dependencies. It also means that on big systems your GC pauses are tolerable compared to Java (small pauses instead of a half-hour pause for an 8 GB GC to complete).

放赐 2024-12-22 10:52:39

It is a lot less about the actor model and a lot more about how hard it is to properly write something analogous to OTP in C++. Also, different operating systems provide radically different debugging and system tooling, and Erlang's VM and several language constructs support a uniform way of figuring out just what all those processes are up to, which would be very hard to do in a uniform way (or maybe at all) across several platforms. (It is important to remember that Erlang/OTP predates the current buzz over the term "actor model", so in some cases these sorts of discussions are comparing apples and pterodactyls; great ideas are prone to independent invention.)

All this means that while you certainly can write an "actor model" suite of programs in another language (I know; I did this for a long time in Python, C and Guile without realizing it, including a form of monitors and links, before I encountered Erlang and before I'd ever heard the term "actor model"), understanding what processes your code actually spawns and what is happening among them is extremely difficult. Erlang enforces rules that an OS simply can't without major kernel overhauls -- kernel overhauls that would probably not be beneficial overall. These rules manifest themselves as both general restrictions on the programmer (which can always be gotten around if you really need to) and basic promises the system guarantees for the programmer (which can also be deliberately broken if you really need to).

For example, it enforces that two processes cannot share state to protect you from side effects. This does not mean that every function must be "pure" in the sense that everything is referentially transparent (obviously not, though making as much of your program referentially transparent as practical is a clear design goal of most Erlang projects), but rather that two processes aren't constantly creating race conditions related to shared state or contention. (This is more what "side effects" means in the context of Erlang, by the way; knowing that may help you decipher some of the discussion questioning whether Erlang is "really functional or not" when compared with Haskell or toy "pure" languages.)

On the other hand, the Erlang runtime guarantees delivery of messages. This is something sorely missed in an environment where you must communicate purely over unmanaged ports, pipes, shared memory and common files that only the OS kernel manages (and OS kernel management of these resources is necessarily extremely minimal compared to what the Erlang runtime provides). This doesn't mean that Erlang guarantees RPC (anyway, message passing is not RPC, nor is it method invocation!), it doesn't promise that your message is addressed correctly, and it doesn't promise that a process you're trying to send a message to exists or is alive, either. It just guarantees delivery if the thing you're sending to happens to be valid at that moment.

Built on this promise is the promise that monitors and links are accurate. Based on that, the Erlang runtime makes the entire concept of a "network cluster" sort of melt away once you grasp what is going on with the system (and how to use erl_connect...). This lets you hop over a set of tricky concurrency cases, which gives you a big head start on coding for the successful case instead of getting mired in the swamp of defensive techniques required for naked concurrent programming.

So it's not really about needing Erlang, the language; it's about the runtime and OTP already existing, being expressed in a rather clean way, and anything close to them being extremely hard to implement in another language. OTP is just a hard act to follow. In the same vein, we don't really need C++ either; we could just stick to raw binary input and Brainfuck, and consider Assembler our high-level language. We also don't need trains or ships, as we all know how to walk and swim.

All that said, the VM's bytecode is well documented, and a number of alternative languages have emerged that compile to it or work with the Erlang runtime. If we break the question into a language/syntax part ("Do I have to understand Moon Runes to do concurrency?") and a platform part ("Is OTP the most mature way to do concurrency, and will it guide me around the trickiest, most common pitfalls to be found in a concurrent, distributed environment?") then the answer is ("no", "yes").

怕倦 2024-12-22 10:52:39

Casablanca is another new kid on the actor model block. A typical asynchronous accept looks like this:

PID replyTo;
NameQuery request;
accept_request().then([=](std::tuple<NameQuery,PID> request)
{
   // Reply to the sender's PID with the requested part of the name.
   if (std::get<0>(request) == FirstName)
       std::get<1>(request).send("Niklas");
   else
       std::get<1>(request).send("Gustafsson");
});

(Personally, I find that CAF does a better job at hiding the pattern matching behind a nice interface.)
