How to represent multi-process logical entities in OTP?
Imagine we have the following problem:
- We have HTTP clients that send requests to our software. So we have one process that is always available to them and stores their requests in a queue.
- We need to dispatch these requests to a machine in our internal network (again via HTTP).
- Such a machine is not always available. Our software starts it on demand and stops it when the queue is empty (again via an HTTP request, this time to a "manager" machine).
- We have several (or many) of the above.
So basically, we have one logical entity that, for the sake of argument, we will call a "job queue".
Every job queue consists of several (heterogeneous) processes. One implements the actual queue and is always available (it doesn't block). One manages a worker machine. We also have several workers, spawned on demand, that take entries off the queue, try to send them to the worker machine, and work around errors, for example by returning unsuccessful attempts to the queue to be retried. We may also have a "manager" process that coordinates the work of the above. And we have lots of "job queues", each of which consists of many processes.
NOTE: this may not be the perfect solution to this exact problem, but let's assume that it is. My question is not about how to solve the problem, but how to manage such "groups" of processes that represent logical entities.
So, how do you represent this in OTP? How many supervision trees do you have? Do you share supervisors between "job queue" entities, or do you have one supervisor per logical entity?
Also, how do you manage the whole thing?
I have a guess, but this is quite a tricky problem (I have already tried implementing it in several different ways), so I won't share my (maybe not so bad) idea for now.
Comments (2)
I would use a dedicated supervisor for each logical component (I guess by "logical" you mean http-workers, managers, dispatchers). Each of those classes of processes would get its own supervisor.
I like it because I can benefit from additional tools to control it (count children, see it in i(), etc.), and it nicely separates the system.
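A minimal sketch of that layout, under some assumptions: the module name class_sup, the supervisor names http_workers_sup, managers_sup and dispatchers_sup, and the placeholder child process are all hypothetical and only illustrate "one supervisor per class of process" under a single top-level supervisor.

```erlang
-module(class_sup).
-behaviour(supervisor).

-export([start_link/0, start_proc/1, count/1]).
-export([init/1, start_placeholder/0]).

%% Hypothetical class names; one supervisor is started per class.
-define(CLASSES, [http_workers_sup, managers_sup, dispatchers_sup]).

%% Top-level supervisor that owns the per-class supervisors.
start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, top).

%% Start one more process of the given class on demand.
start_proc(ClassSup) ->
    supervisor:start_child(ClassSup, []).

%% Number of live children of one class, via supervisor:count_children/1.
count(ClassSup) ->
    proplists:get_value(active, supervisor:count_children(ClassSup)).

init(top) ->
    Children = [#{id => Name,
                  start => {supervisor, start_link,
                            [{local, Name}, ?MODULE, {class, Name}]},
                  type => supervisor,
                  modules => [?MODULE]} || Name <- ?CLASSES],
    {ok, {#{strategy => one_for_one, intensity => 5, period => 10}, Children}};
init({class, _Name}) ->
    %% simple_one_for_one: all children share one spec and are added dynamically.
    Child = #{id => proc,
              start => {?MODULE, start_placeholder, []},
              restart => transient},
    {ok, {#{strategy => simple_one_for_one, intensity => 5, period => 10}, [Child]}}.

%% Placeholder child; a real system would start a gen_server here instead.
start_placeholder() ->
    {ok, proc_lib:spawn_link(fun() -> receive stop -> ok end end)}.
```

With this in place, class_sup:count(http_workers_sup) reports how many processes of that class are alive, and each class can be inspected or restarted separately from the others.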
Gproc, mentioned by @MinimeDJ, and the sync/async stuff are completely different things.
I think that if the system you described needs gproc, then it is not the best architecture.
Redesign it to have as many stateless layers as possible. E.g. instead of maintaining dispatchers (a push model), try a pull model: the back-end machines pull tasks themselves. This solution makes the queues stateless, you get rid of the dispatchers, and if anything goes wrong the back-end layer puts the task into some queue again.
Moreover, the managers are reduced to an API to the queues plus some stats collectors. The load of the back-end workers is measured and controlled (locally!) in each of those heterogeneous back-end modules.
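The pull model could be sketched roughly as follows. This is only an illustration under assumptions: the module name pull_queue and the functions push/2, pull/1 and work_once/2 are hypothetical, and the fun passed to work_once/2 stands in for the HTTP call to a back-end machine. The queue process never dispatches anything; it only answers when a worker asks for work.

```erlang
-module(pull_queue).
-behaviour(gen_server).

-export([start_link/0, push/2, pull/1, work_once/2]).
-export([init/1, handle_call/3, handle_cast/2]).

start_link() ->
    gen_server:start_link(?MODULE, [], []).

%% Enqueue a task (asynchronous, so the caller is never blocked).
push(Q, Task) ->
    gen_server:cast(Q, {push, Task}).

%% Ask for the next task; returns {ok, Task} or empty.
pull(Q) ->
    gen_server:call(Q, pull).

%% One iteration of a worker: pull a task, run it, re-queue it on failure.
%% Handler is a hypothetical fun that returns ok on success.
work_once(Q, Handler) ->
    case pull(Q) of
        empty ->
            idle;
        {ok, Task} ->
            try Handler(Task) of
                ok -> done;
                _Other -> push(Q, Task), retried
            catch
                _:_ -> push(Q, Task), retried
            end
    end.

init([]) ->
    {ok, queue:new()}.

handle_call(pull, _From, Q) ->
    case queue:out(Q) of
        {{value, Task}, Q1} -> {reply, {ok, Task}, Q1};
        {empty, Q1} -> {reply, empty, Q1}
    end.

handle_cast({push, Task}, Q) ->
    {noreply, queue:in(Task, Q)}.
```

Because workers decide when to pull, the amount of work in flight is bounded by the number of workers, and no dispatcher process has to track which worker is busy.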
Seen from a high level, we also have a system that consists of many special blocks, and our first architecture was something similar to yours. Instead of HTTP we used RabbitMQ, which I believe is much more convenient for exchanging messages.
But before the final release we understood that it would be a real challenge to maintain the whole system in production.
So we redesigned it again. Now we represent each logical block as a gen_server process. Each process has a unique name and is registered in gproc. Since gproc can run across many nodes, we have a fault-tolerant system that is very easy to manage. So I would say that we have a Manageable Object Model (we call it MOM because we really love it).
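A rough sketch of the "one named gen_server per logical block" idea. To keep it runnable without extra dependencies, it uses OTP's built-in global registry as a stand-in for gproc, which is a substitution and not what the answer describes. With gproc, name/1 would instead return a via tuple such as {via, gproc, {n, l, {block, Id}}}. The module name logical_block and the status/1 call are hypothetical.

```erlang
-module(logical_block).
-behaviour(gen_server).

-export([start_link/1, status/1]).
-export([init/1, handle_call/3, handle_cast/2]).

%% Unique name for a block. Stand-in: OTP's global registry.
%% With gproc this would be {via, gproc, {n, l, {block, Id}}}.
name(Id) ->
    {global, {block, Id}}.

%% Start the process that represents the logical block Id.
start_link(Id) ->
    gen_server:start_link(name(Id), ?MODULE, Id, []).

%% Callers address the block by its logical id, never by pid.
status(Id) ->
    gen_server:call(name(Id), status).

init(Id) ->
    {ok, #{id => Id}}.

handle_call(status, _From, State) ->
    {reply, {ok, maps:get(id, State)}, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.
```

Starting a second process under the same id returns {error, {already_started, Pid}}, so each logical block has at most one process behind its name.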
So, to me, your system seems overcomplicated. I don't know if my answer is useful at all, but sometimes it is worth thinking about your system in a way you never expected at the beginning. I hope you will find an easy way to manage it.