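How do I design the architecture of an Erlang/OTP-based distributed, fault-tolerant, multicore system?

Posted 2024-12-02 21:18:41

I would like to build an Erlang/OTP-based system that solves an "embarrassingly parallel" problem.

I have already read or skimmed through:

Learn You Some Erlang;
Programming Erlang (Armstrong);
Erlang Programming (Cesarini);
Erlang/OTP in Action.

I have got the gist of processes, message passing, supervisors, gen_servers, logging, and so on.

I do understand that certain architecture choices depend on the application concerned, but I would still like to know some general principles of Erlang/OTP system design.

Should I just start with a few gen_servers with a supervisor and incrementally build on that?

How many supervisors should I have? How do I decide which parts of the system should be process-based? How should I avoid bottlenecks?

Should I add logging later?

What is the general approach to Erlang/OTP distributed fault-tolerant multiprocessor system architecture?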


Answer by 执笔绘流年, 2024-12-09 21:18:41

Should I just start with a few gen_servers with a supervisor and incrementally build on that?

You're missing one key component in Erlang architectures here: applications! (That is, the concept of OTP applications, not software applications).

Think of applications as components. A component in your system solves a particular problem, is responsible for a coherent set of resources, or abstracts something important or complex away from the rest of the system.

The first step when designing an Erlang system is to decide which applications are needed. Some can be pulled from the web as they are; we can refer to these as libraries. Others you'll need to write yourself (otherwise you wouldn't need this particular system). We usually refer to these applications as the business logic. Often you need to write some libraries yourself as well, but it is useful to keep the distinction between the libraries and the core business applications that tie everything together.
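As a rough illustration of what one such component looks like in code, here is a minimal sketch of an OTP application callback module. The names `stats_app` and `stats_sup` are made up for this example, and the module doubles as its own top-level supervisor only to keep the sketch self-contained; a real application would also need an `.app` resource file and would normally put the supervisor in a separate module.

```erlang
%% Hypothetical example: stats_app and stats_sup are illustrative names.
-module(stats_app).
-behaviour(application).
-behaviour(supervisor).

-export([start/2, stop/1]).   % application callbacks
-export([init/1]).            % supervisor callback

%% Called when the application is started. It must return the pid of
%% the application's top-level supervisor.
start(_StartType, _StartArgs) ->
    supervisor:start_link({local, stats_sup}, ?MODULE, []).

%% Called after the application has stopped; nothing to clean up here.
stop(_State) ->
    ok.

%% Top-level supervisor with no children yet. Workers and
%% sub-supervisors would be listed here as the component grows.
init([]) ->
    SupFlags = #{strategy => one_for_one, intensity => 1, period => 5},
    {ok, {SupFlags, []}}.
```

Each application in the system gets its own supervision tree like this one, which is what lets you start, stop, and upgrade components independently.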

How many supervisors should I have?

You should have one supervisor for each kind of process you want to monitor.

A bunch of identical temporary workers? One supervisor to rule them all.
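One way to express "one supervisor for many identical temporary workers" is the `simple_one_for_one` strategy. The sketch below assumes a hypothetical module `job_sup` whose workers are plain processes that each run a fun and exit; real workers would usually be started through `proc_lib` or a behaviour module.

```erlang
%% Hypothetical example: job_sup and its functions are illustrative names.
-module(job_sup).
-behaviour(supervisor).

-export([start_link/0, start_job/1]).
-export([init/1, run/1]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

%% Start one more worker. The list [Fun] is appended to the argument
%% list in the child spec, so the child is started as run(Fun).
start_job(Fun) when is_function(Fun, 0) ->
    supervisor:start_child(?MODULE, [Fun]).

init([]) ->
    SupFlags = #{strategy => simple_one_for_one, intensity => 5, period => 10},
    Worker = #{id => job,
               start => {?MODULE, run, []},
               restart => temporary,      % never restarted, whatever the exit reason
               shutdown => brutal_kill},
    {ok, {SupFlags, [Worker]}}.

%% Child start function: runs in the supervisor, so spawn_link/1 links
%% the new worker to the supervisor as required.
run(Fun) ->
    {ok, spawn_link(Fun)}.
```

Calling `job_sup:start_job(fun() -> do_work() end)` (where `do_work/0` stands for whatever the job is) adds a worker; because the children are `temporary`, a crashing job is not restarted and does not count against the restart intensity.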

Different processes with different responsibilities and restart strategies? A supervisor for each type of process, arranged in the correct hierarchy (depending on when things should restart and which other processes need to go down with them).

Sometimes it is okay to put a bunch of different process types under the same supervisor. This is usually the case when you have a few singleton processes (e.g. one HTTP server supervisor, one ETS table owner process, one statistics collector) that will always run. In that case, it might be too much cruft to have one supervisor for each, so it is common to add them under one supervisor. Just be aware of the implications of the restart strategy you choose when doing this, so that you don't take down your statistics process, for example, when your web server crashes (one_for_one is the most common strategy in cases like this). Be careful not to have any dependencies between processes in a one_for_one supervisor. If a process depends on another process that has crashed, it can crash as well, pushing the supervisor past its restart intensity too quickly and crashing the supervisor itself too soon. This can be avoided by splitting the processes across two supervisors, so that each set of restarts is governed entirely by its own configured intensity and period.
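The singleton case might look roughly like the following. This is a sketch under assumptions: `singletons_sup`, `table_owner`, and `stats_collector` are hypothetical names, the workers are placeholder processes that just sleep, and the intensity and period values are arbitrary.

```erlang
%% Hypothetical example: all names and numbers here are illustrative.
-module(singletons_sup).
-behaviour(supervisor).

-export([start_link/0]).
-export([init/1, start_worker/1]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

init([]) ->
    %% one_for_one: only the crashed child is restarted.
    %% More than 3 restarts within 10 seconds terminates this supervisor.
    SupFlags = #{strategy => one_for_one, intensity => 3, period => 10},
    Children = [child(table_owner), child(stats_collector)],
    {ok, {SupFlags, Children}}.

child(Name) ->
    #{id => Name,
      start => {?MODULE, start_worker, [Name]},
      restart => permanent,
      shutdown => 5000,
      type => worker}.

%% Placeholder worker that stands in for a real singleton process.
start_worker(_Name) ->
    {ok, spawn_link(fun() -> timer:sleep(infinity) end)}.
```

Killing the `stats_collector` child here restarts only that child; the `table_owner` process keeps its pid. If the two processes depended on each other, they would belong under separate supervisors (or under a different strategy) rather than side by side in this one.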

How do I decide which parts of the system should be process-based?

Every concurrent activity in your system should be in its own process. Choosing the wrong abstraction for concurrency is the most common mistake Erlang system designers make in the beginning.

Some people are not used to dealing with concurrency; their systems tend to have too little of it: one process, or a few gigantic ones, that run everything in sequence. These systems are usually full of code smells, and the code is very rigid and hard to refactor. They are also slower, because they may not use all the cores available to Erlang.

Other people immediately grasp the concurrency concepts but fail to apply them optimally; their systems tend to overuse the process concept, leaving many processes idle while they wait for others that are doing the work. These systems tend to be unnecessarily complex and hard to debug.

In essence, both variants have the same problem: you don't use all the concurrency available to you, and you don't get the maximum performance out of the system.

If you stick to the single responsibility principle and abide by the rule to have a process for every truly concurrent activity in your system, you should be okay.

There are valid reasons to have idle processes. Sometimes they keep important state, sometimes you want to keep some data temporarily and later discard the process, sometimes they wait on external events. The bigger pitfall is to pass important messages through a long chain of largely inactive processes, as it will slow down your system with lots of copying and use more memory.

How should I avoid bottlenecks?

Hard to say, depends very much on your system and what it's doing. Generally though, if you have a good division of responsibility between applications you should be able to scale the application that appears to be the bottleneck separately from the rest of the system.

The golden rule here is to measure, measure, measure! Don't think you have something to improve until you've measured.

Erlang is great in that it allows you to hide concurrency behind interfaces (known as implicit concurrency). For example, you use a functional module API, a normal module:function(Arguments) interface, that could in turn spawn thousands of processes without the caller having to know about it. If you get your abstractions and your API right, you can always parallelize or optimize a library after you've started using it.
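A small sketch of that idea: the hypothetical module `par` below exposes `map/2`, which from the caller's side looks like `lists:map/2` but evaluates each element in its own process. The module and function names are assumptions for this example, not part of OTP.

```erlang
%% Hypothetical example: par:map/2 is an illustrative name, not an OTP function.
-module(par).
-export([map/2]).

%% Same contract as lists:map/2, but F runs in one process per element.
%% The workers are linked, so a crash in F also crashes the caller
%% (unless the caller traps exits).
map(F, List) ->
    Parent = self(),
    Refs = [begin
                Ref = make_ref(),
                spawn_link(fun() -> Parent ! {Ref, F(X)} end),
                Ref
            end || X <- List],
    %% Collect results in the original order by matching on each
    %% unique reference in turn.
    [receive {Ref, Result} -> Result end || Ref <- Refs].
```

A caller writes `par:map(fun(X) -> X * X end, [1, 2, 3])` and gets a list back; whether that work happened in one process or thousands is an implementation detail that can change later without touching the call sites.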

That being said, here are some general guidelines:

Try to send messages to the recipient directly, avoid channeling or routing messages through intermediary processes. Otherwise the system just spends time moving messages (data) around without really working.
Don't overuse the OTP design patterns, such as gen_servers. In many cases, you only need to start a process, run some piece of code, and then exit. For this, a gen_server is overkill.
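For the "start a process, run some code, exit" case, something as small as the following can be enough. This is a sketch with a hypothetical module name `oneshot`; it uses `spawn_monitor/1` so the caller learns about both the result and a crash without needing a gen_server.

```erlang
%% Hypothetical example: oneshot is an illustrative module name.
-module(oneshot).
-export([run/1]).

%% Run Fun in a fresh process and wait for it to finish.
%% The result travels back as the worker's exit reason, so there is
%% no separate reply message to match up with the monitor.
run(Fun) when is_function(Fun, 0) ->
    {Pid, MRef} = spawn_monitor(fun() -> exit({done, Fun()}) end),
    receive
        {'DOWN', MRef, process, Pid, {done, Result}} -> {ok, Result};
        {'DOWN', MRef, process, Pid, Reason}         -> {error, Reason}
    end.
```

`oneshot:run(fun() -> 1 + 2 end)` returns `{ok, 3}`, and a fun that exits with `boom` yields `{error, boom}`. Note that this sketch blocks the caller; a non-blocking variant would return the monitor reference and let the caller collect the `'DOWN'` message later.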

And one bonus piece of advice: don't reuse processes. Spawning a process in Erlang is so cheap and quick that it doesn't make sense to reuse a process once its lifetime is over. In some cases it might make sense to reuse state (e.g. the result of a complex file parse), but that state is better stored canonically somewhere else (in an ETS table, a database, etc.).

Should I add logging later?

You should add logging now! There's a great built-in API called Logger that has shipped with Erlang/OTP since version 21:

logger:error("The file does not exist: ~ts",[Filename]),
logger:notice("Something strange happened!"),
logger:debug(#{got => connection_request, id => Id, state => State},
             #{report_cb => fun(R) -> {"~p",[R]} end}),

This new API has several advanced features and should cover most cases where you need logging. There's also the older but still widely used third-party library Lager.
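As a hedged sketch of a couple of those features, the hypothetical module `log_demo` below lowers the primary log level so debug events pass through, attaches metadata to everything the calling process logs, and emits a structured (map) report. It assumes OTP 21 or later; the module name and the report keys are made up for illustration.

```erlang
%% Hypothetical example: log_demo and the report keys are illustrative.
-module(log_demo).
-export([setup/0, report/1]).

setup() ->
    %% Let events of level debug and above through the primary filter.
    ok = logger:set_primary_config(level, debug),
    %% Merge this metadata into every event logged by the calling process.
    ok = logger:update_process_metadata(#{component => log_demo}).

%% Log a structured report instead of a format string, so handlers and
%% filters can work with the fields directly.
report(Filename) ->
    logger:warning(#{what => file_missing, file => Filename}).
```

Calling `log_demo:setup()` once in a process and then `log_demo:report("data.csv")` produces a warning event carrying both the report map and the `component` metadata.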

What is the general approach to Erlang/OTP distributed fault-tolerant multiprocessor system architecture?

To summarize what's been said above:

Divide your system into applications
Put your processes in the correct supervision hierarchy, depending on their needs and dependencies
Have a process for every truly concurrent activity in your system
Maintain a functional API towards the other components in the system. This lets you:
- Refactor your code without changing the code that's using it
- Optimize code afterwards
- Distribute your system when needed (just make a call to another node behind the API! The caller won't notice!)
- Test the code more easily (less work setting up test harnesses, easier to understand how to use it)
Start with the libraries available to you in OTP until you need something different (you'll know when the time comes)
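The point about a functional API can be sketched with a small key-value component. Everything here is hypothetical (the module name `kv`, its functions, and the message formats); the idea is only that callers use `kv:store/2` and `kv:lookup/1` and never see the gen_server, so the implementation can later be refactored, optimized, or moved to another node.

```erlang
%% Hypothetical example: kv and its API are illustrative names.
-module(kv).
-behaviour(gen_server).

-export([start_link/0, store/2, lookup/1, lookup/2]).
-export([init/1, handle_call/3, handle_cast/2]).

%% Public API: callers never see the process or the message format.
start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

store(Key, Value) ->
    gen_server:call(?MODULE, {store, Key, Value}).

lookup(Key) ->
    lookup(node(), Key).

%% Same request, served by the kv process registered on Node.
lookup(Node, Key) ->
    gen_server:call({?MODULE, Node}, {lookup, Key}).

%% gen_server callbacks: the state is a plain map.
init([]) ->
    {ok, #{}}.

handle_call({store, K, V}, _From, Map) ->
    {reply, ok, Map#{K => V}};
handle_call({lookup, K}, _From, Map) ->
    {reply, maps:find(K, Map), Map}.

handle_cast(_Msg, Map) ->
    {noreply, Map}.
```

After `kv:start_link()`, `kv:store(a, 1)` returns `ok` and `kv:lookup(a)` returns `{ok, 1}`. Testing needs nothing beyond these calls, and swapping the map for an ETS table would not change a single caller.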

Common pitfalls:

Too many processes
Too few processes
Too much routing (forwarded messages, chained processes)
Too few applications (I've never seen the opposite case, actually)
Not enough abstraction (makes it hard to refactor and reason about. It also makes it hard to test!)
