Akka and state among actors in a cluster
I am working on my BSc thesis project, which should be a Minecraft server written in Scala and Akka. The server should be easily deployable in the cloud or onto a cluster (not sure whether I'm using the proper terminology... it should run on multiple nodes). I am, however, a newbie in Akka, and I have been wondering how to implement such a thing. The problem I'm trying to figure out right now is how to share state among actors on different nodes. My first idea was to have a Camel actor that would read the TCP stream from Minecraft clients and send it to a load balancer, which would select a node to process the request and then send some response back to the client via TCP.

Let's say I have an actor implementing an AuthenticationService that checks whether the credentials provided by a user are valid. Every node would have such an actor (or perhaps more of them), and all the actors should have exactly the same database (or state) of users at all times. My question is: what is the best approach to keep this state? I have come up with some solutions I could think of, but I haven't done anything like this before, so please point out the faults:
Solution #1: Keep the state in a database. This would probably work very well for this authentication example, where the state is represented by something as simple as a list of usernames and passwords, but it probably wouldn't work in cases where the state contains objects that can't easily be broken down into integers and strings.
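For illustration, a minimal sketch of what I mean (assuming Akka classic actors; the JDBC URL, the users table, and the message types are all made up, and a PostgreSQL driver is assumed on the classpath):

```scala
import java.sql.DriverManager
import akka.actor.Actor

case class Authenticate(username: String, password: String)
case class AuthResult(ok: Boolean)

// Solution #1: each node runs a stateless AuthenticationService that
// checks credentials against one shared database.
class DbAuthenticationService extends Actor {
  def receive = {
    case Authenticate(user, pass) =>
      // NOTE: blocking JDBC inside an actor should really run on its own dispatcher
      val conn = DriverManager.getConnection("jdbc:postgresql://db-host/minecraft")
      try {
        val stmt = conn.prepareStatement(
          "SELECT 1 FROM users WHERE username = ? AND password = ?")
        stmt.setString(1, user)
        stmt.setString(2, pass)
        sender() ! AuthResult(stmt.executeQuery().next())
      } finally conn.close()
  }
}
```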
Solution #2: Every time there is a request to a certain actor that would change its state, the actor would, after processing the request, broadcast information about the change to all other actors of the same type, which would then update their own state according to the info sent by the original actor. This seems very inefficient and rather clumsy.
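Roughly, I imagine Solution #2 looking something like this (a sketch assuming Akka classic actors and a known, fixed list of peers; all names are made up):

```scala
import akka.actor.{Actor, ActorRef}

case class AddUser(username: String, password: String)
case class ReplicateAdd(username: String, password: String)

// Solution #2: apply a state change locally, then broadcast it to all
// peer actors of the same type so they can apply it too.
class ReplicatedAuthService(peers: Seq[ActorRef]) extends Actor {
  private var users = Map.empty[String, String]

  def receive = {
    case AddUser(u, p) =>
      users += (u -> p)                      // apply locally first
      peers.foreach(_ ! ReplicateAdd(u, p))  // then tell everyone else
    case ReplicateAdd(u, p) =>
      users += (u -> p)                      // apply a peer's change
  }
}
```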
Solution #3: Have a certain node serve as a sort of state node, hosting actors that represent the state of the entire server. Any actor outside that node would hold no state of its own and would ask the actors on the "state node" every time it needed some data. This seems inefficient too, and not exactly fault-tolerant.
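And a sketch of Solution #3, where stateNode would be an ActorRef to an actor on the remote state node (names and messages made up):

```scala
import akka.actor.{Actor, ActorRef}
import akka.pattern.{ask, pipe}
import akka.util.Timeout
import scala.concurrent.duration._

case class Authenticate(username: String, password: String)

// Solution #3: a stateless worker that forwards every lookup to the
// single "state node" actor and pipes the answer back to the caller.
class StatelessAuthWorker(stateNode: ActorRef) extends Actor {
  import context.dispatcher
  implicit val timeout: Timeout = Timeout(5.seconds)

  def receive = {
    case msg: Authenticate =>
      (stateNode ? msg).pipeTo(sender())
  }
}
```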
So there you have it. The only solution I actually like is the first one, but like I said, it probably works for only a very limited subset of problems (when the state can be broken down into Redis-style structures). Any response from more experienced gurus would be very much appreciated.
Regards, Tomas Herman
Comments (1)
Solution #1 could possibly be slow. Also, it is a bottleneck and a single point of failure (meaning the application stops working if the node with the database fails). Solution #3 has similar problems.
Solution #2 is less trivial than it seems. First, it is a single point of failure. Second, there are no atomicity or other ordering guarantees (such as regularity) for reads or writes unless you do a total order broadcast, which is more expensive than a regular broadcast. In fact, most distributed register algorithms do broadcasts under the hood, so, while inefficient, broadcasting may be necessary.
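For a feel of what total order broadcast costs, the simplest implementation funnels every message through one sequencer actor. A sketch (hypothetical names), which also shows how this reintroduces a single point of failure:

```scala
import akka.actor.{Actor, ActorRef}

case class Broadcast(payload: Any)
case class Ordered(seqNr: Long, payload: Any)

// Simplest total order broadcast: all messages go through one sequencer,
// which stamps and forwards them, so every member delivers them in the
// same order. The sequencer itself is, of course, a single point of failure.
class Sequencer(members: Seq[ActorRef]) extends Actor {
  private var seqNr = 0L

  def receive = {
    case Broadcast(payload) =>
      seqNr += 1
      members.foreach(_ ! Ordered(seqNr, payload))
  }
}
```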
From what you've described, you need atomicity for your distributed register. What do I mean by atomicity? Atomicity means that any read or write in a sequence of concurrent reads and writes appears as if it occurs at a single point in time.
Informally, in Solution #2 with a single actor holding a register, atomicity guarantees that if two subsequent writes W1 and then W2 to the register occur (meaning two broadcasts), then no other actor reading values from the register will see them in an order other than first W1, then W2 (it's actually more involved than that). If you work through a couple of examples of subsequent broadcasts whose messages arrive at their destinations at different points in time, you will see that such an ordering property isn't guaranteed at all.
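The usual way register algorithms restore an order is to timestamp writes and let each replica keep only the newest value it has seen. A minimal sketch (hypothetical names; full atomic register algorithms such as ABD additionally do quorum reads and write-backs on top of this):

```scala
import akka.actor.Actor

case class Write(timestamp: Long, value: String)
case object Read

// A register replica that keeps only the highest-timestamped write it
// has seen, so all replicas converge on the same winner even when the
// broadcasts arrive in different orders.
class RegisterReplica extends Actor {
  private var ts    = 0L
  private var value = ""

  def receive = {
    case Write(t, v) if t > ts =>  // newer write wins
      ts = t
      value = v
    case Write(_, _) =>            // stale write: ignore
    case Read =>
      sender() ! (ts, value)
  }
}
```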
If ordering guarantees or atomicity aren't an issue, some sort of gossip-based algorithm might do the trick, slowly propagating changes to all the nodes. This probably wouldn't be very helpful in your example, though.
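Still, for completeness, a sketch of what gossiping the user database could look like (hypothetical names; the naive merge only spreads additions, and removals would need version numbers):

```scala
import akka.actor.{Actor, ActorRef}
import scala.concurrent.duration._
import scala.util.Random

case class Gossip(users: Map[String, String])
case object Tick

// Gossip-based propagation: periodically push the local state to one
// randomly chosen peer and merge whatever arrives from others.
class GossipingAuthService(peers: Vector[ActorRef]) extends Actor {
  import context.dispatcher
  private var users = Map.empty[String, String]

  context.system.scheduler.schedule(1.second, 1.second, self, Tick)

  def receive = {
    case Tick =>
      if (peers.nonEmpty)
        peers(Random.nextInt(peers.size)) ! Gossip(users)
    case Gossip(remote) =>
      users ++= remote  // naive merge: spreads additions, not removals
  }
}
```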
If you want something fully fault-tolerant and atomic, I recommend reading the book Introduction to Reliable Distributed Programming by Rachid Guerraoui and Luís Rodrigues, or at least the parts related to distributed register abstractions. These algorithms are built on top of a message-passing communication layer and maintain a distributed register supporting read and write operations. You can use such an algorithm to store distributed state information. However, they aren't applicable to thousands of nodes or large clusters, because they do not scale; their complexity is typically polynomial in the number of nodes.
On the other hand, you may not need the state of the distributed register replicated across all of the nodes. Replicating it across a subset of the nodes (instead of just one), and reading from and writing to that subset, already gives a certain level of fault tolerance: the register information is lost only if the entire subset fails. You can probably adapt the algorithms in the book to serve this purpose.
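For instance, a write could count as successful once a majority of the replica subset has acknowledged it; a sketch of that quorum rule (hypothetical names and messages):

```scala
import akka.actor.ActorRef
import akka.pattern.ask
import akka.util.Timeout
import scala.concurrent.{ExecutionContext, Future}
import scala.concurrent.duration._

object QuorumWrites {
  case class Write(key: String, value: String)

  // A write succeeds once a majority of the replica subset acknowledges
  // it, so the register survives the failure of any minority of replicas.
  def quorumWrite(replicas: Seq[ActorRef], write: Write)
                 (implicit ec: ExecutionContext): Future[Boolean] = {
    implicit val timeout: Timeout = Timeout(3.seconds)
    val acks = replicas.map { r =>
      (r ? write).map(_ => 1).recover { case _ => 0 }  // a timeout counts as no ack
    }
    Future.sequence(acks).map(_.sum > replicas.size / 2)
  }
}
```

A quorum read would work the same way, querying a majority and keeping the value with the highest timestamp, which is essentially what the register algorithms in the book do.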