Strong consistency and replication factor
I am trying to improve my knowledge of distributed databases and the various levels of consistency they can achieve. First, let me define some terms I will use (please tell me if I am wrong):
strong consistency: as described by Dr. Kleppmann in "Designing Data-Intensive Applications", it is a synonym for "linearizability", a consistency level that makes a replicated data store behave as if there were only a single copy of each data item and every operation on it took place atomically.
replication factor: the number of copies of a data item.
Suppose I have a cluster of 3 nodes configured in strong consistency mode, with a replication factor of 3, and the leader has successfully replicated a write to a data item X to only one of its followers (the other follower is offline). I have the following questions:
1. Will the write be replicated to the second follower when it comes back online?
In my opinion, the database can return "success" to the client application as soon as the leader has replicated the write to at least one of its followers: the leader plus that follower form a majority quorum (2 of 3 nodes), so it does not have to wait for the second follower to acknowledge the operation. However, given the replication factor, the writes (including the one on X) will be applied in the same order on the second follower once it is back online.
2. If a client application tries to read X from the follower that has not yet been updated, will it get a stale value?
Since the database is running in strong consistency mode, it should not be possible to read a stale value. So, if the client cannot connect to the leader or to the updated follower, it should get an error. This should be a consequence of the CAP theorem.
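The behaviour proposed here (refuse to answer rather than risk a stale value) can be sketched as follows. `Node`, `strong_read` and `Unavailable` are hypothetical names, and the sketch assumes the simplest policy of forwarding every strongly consistent read to the leader; an up-to-date follower could also serve the read if it first checked with the leader, which this sketch leaves out.

```python
class Unavailable(Exception):
    """Raised instead of returning a value that might be stale."""


class Node:
    def __init__(self, name, store):
        self.name = name
        self.store = store  # this node's local key-value state


def strong_read(key, node, leader, leader_reachable):
    """Serve a read under a strong-consistency policy (hypothetical helper).

    A follower never answers from its own, possibly lagging, copy: the read
    is forwarded to the leader. If the leader cannot be reached (a network
    partition), the read fails instead: choosing C over A in CAP terms.
    """
    if node is not leader and not leader_reachable:
        raise Unavailable(f"{node.name} cannot reach the leader to read {key!r}")
    # A real system must also confirm the leader is still the leader here.
    return leader.store[key]


leader = Node("leader", {"X": 2})
lagging = Node("follower-2", {"X": 1})  # missed the write X=2 while offline

print(strong_read("X", lagging, leader, leader_reachable=True))  # 2
try:
    strong_read("X", lagging, leader, leader_reachable=False)
except Unavailable as err:
    print("error:", err)  # error: follower-2 cannot reach the leader to read 'X'
```

The client never sees X=1: it either gets the leader's value or an error.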
Could anyone please tell me whether I am right and, if not, why? Thank you!
The way you described the system (a single leader, with a write reported as successful once a majority of nodes has accepted it) implies that you are using single-leader replication based on consensus. (Please ask if this requires explanation.)
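With three replicas, the arithmetic behind "success once a majority has accepted the write" fits in a few lines. This is a minimal sketch: `majority` and `is_committed` are hypothetical helper names, and it assumes, as Raft does, that the leader's own copy counts toward the quorum.

```python
def majority(n_replicas: int) -> int:
    """Smallest number of replicas that constitutes a strict majority."""
    return n_replicas // 2 + 1


def is_committed(acks: int, n_replicas: int) -> bool:
    """A write counts as committed once a majority holds it (leader included)."""
    return acks >= majority(n_replicas)


N = 3         # replication factor: leader + 2 followers
acks = 1 + 1  # the leader's own copy + the one reachable follower

print(majority(N))            # 2
print(is_committed(acks, N))  # True: no need to wait for the offline follower
print(is_committed(1, N))     # False: the leader alone is not a majority
print(N - majority(N))        # 1: replicas that may be down while writes still commit
```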
In a consensus-based model, all nodes are updated in the same order, but there is no guarantee about when a specific node gets updated. What is guaranteed is that if a write is accepted, a majority of nodes has accepted it. The consensus protocol itself guarantees that all nodes will eventually get all writes.
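How a lagging node eventually receives every write, in the same order, can be sketched after the consistency check of Raft's AppendEntries RPC: the leader finds where the follower's log stops agreeing with its own, backing up one entry at a time, and then ships the missing suffix. This is a hypothetical single-process model (no networking, 0-based indexes, entries as `(term, command)` tuples); `append_entries` and `catch_up` are illustrative names, not a library API.

```python
def append_entries(follower_log, prev_index, prev_term, entries):
    """Follower side: accept entries only if the logs agree at prev_index."""
    if prev_index >= 0 and (
        prev_index >= len(follower_log) or follower_log[prev_index][0] != prev_term
    ):
        return False  # gap or conflicting entry: the leader must back up
    for offset, entry in enumerate(entries):
        index = prev_index + 1 + offset
        if index < len(follower_log):
            if follower_log[index][0] == entry[0]:
                continue  # already have this entry
            del follower_log[index:]  # conflict: drop it and everything after
        follower_log.append(entry)
    return True


def catch_up(leader_log, follower_log):
    """Leader side: find the last agreeing entry, then ship the rest in order."""
    next_index = len(leader_log)  # optimistic start, like Raft's nextIndex
    while True:
        prev_index = next_index - 1
        prev_term = leader_log[prev_index][0] if prev_index >= 0 else 0
        if append_entries(follower_log, prev_index, prev_term, leader_log[next_index:]):
            return
        next_index -= 1  # back up one entry and retry


leader_log = [(1, "X=1"), (1, "Y=5"), (2, "X=2")]
offline_follower = [(1, "X=1")]  # missed the last two writes
catch_up(leader_log, offline_follower)
print(offline_follower == leader_log)  # True
```

Because the follower only ever appends after a matching entry, it ends up with exactly the leader's sequence, never a reordered one.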
I recommend reading a paper on the Raft algorithm: it's relatively straightforward and covers all major aspects of consensus.
Two paragraphs above, I said that two statements hold for a given write: a majority of nodes accepted it, and any nodes that are behind will eventually receive it. The question is: how does this eventuality work with strong consistency?
A consensus-based system has two read modes: eventually consistent reads and strongly consistent reads. Many papers call the latter linearizable reads.
Eventually consistent reads are simple: a reader goes to a random node and may or may not see the latest value. There is nothing more to it.
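A minimal sketch of such a read, assuming the hypothetical replica states below (the write `X=2` reached the leader and one follower, but not the follower that was offline): whichever replica happens to be picked answers from its local copy, so the result may be stale. `replicas` and `eventual_read` are illustrative names.

```python
import random

# Hypothetical replica states right after X=2 committed on the leader and
# one follower; follower-2 was offline and still holds the old value.
replicas = {
    "leader": {"X": 2},
    "follower-1": {"X": 2},
    "follower-2": {"X": 1},
}


def eventual_read(key, rng=random):
    """Read from a randomly chosen replica; the value may be stale."""
    name = rng.choice(list(replicas))
    return name, replicas[name][key]


rng = random.Random(0)  # fixed seed so the run is repeatable
for _ in range(5):
    name, value = eventual_read("X", rng)
    print(name, value)  # follower-2 answers 1 (stale); the others answer 2
```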
A linearizable read is more complicated. To understand it, we need to describe what the log is in a consensus-based system. The log is the sequence of all events: every node will eventually have exactly the same log, with the same events in the same order. So when we say that a write has been accepted, we mean that a majority of nodes have appended that write event to their logs.
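To make "a majority of nodes appended the write" concrete: in Raft, the leader tracks, for every node, the highest log index known to be replicated there (`matchIndex` in the Raft paper) and advances its commit index to the highest index held by a majority. The sketch below is a hypothetical simplification of that rule (`majority_commit_index` and `advance_commit` are illustrative names); note that Raft commits this way only entries from the leader's current term.

```python
def majority_commit_index(match_index):
    """Highest log index known to be stored on a majority of nodes.

    match_index has one entry per node, leader included: the highest log
    index (0-based, -1 for an empty log) replicated on that node.
    """
    ranked = sorted(match_index, reverse=True)
    return ranked[len(match_index) // 2]


def advance_commit(log, match_index, commit_index, current_term):
    """Move the commit index forward following Raft's majority rule."""
    candidate = majority_commit_index(match_index)
    # Raft counts replicas only for entries from the leader's current term;
    # older entries become committed indirectly, together with them.
    if candidate > commit_index and log[candidate][0] == current_term:
        return candidate
    return commit_index


log = [(1, "X=1"), (1, "Y=5"), (2, "X=2")]  # entries are (term, command)
# leader and follower-1 hold all three entries; follower-2 only the first
print(advance_commit(log, [2, 2, 0], commit_index=0, current_term=2))  # 2
# only the leader holds the new entries: nothing new is committed
print(advance_commit(log, [2, 0, 0], commit_index=0, current_term=2))  # 0
```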
Here is the algorithm to get a strongly consistent aka linearizable read:
The steps from above guarantee that the client sees the latest update.
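One well-known way to serve a linearizable read in Raft is the approach described in Diego Ongaro's Raft dissertation (often called ReadIndex): the leader records its commit index, confirms it is still the leader by hearing from a majority, waits until its state machine has applied at least that index, and only then answers. The Python sketch below models that idea in a single process and may differ in detail from the steps the answer refers to; `LeaderNode`, `ReadRejected` and `confirm_leadership` (a stand-in for a heartbeat round) are hypothetical names.

```python
class ReadRejected(Exception):
    """Raised when the leader cannot safely serve a linearizable read."""


class LeaderNode:
    """Hypothetical single-process model of a Raft leader's read path."""

    def __init__(self, log, commit_index, current_term, confirm_leadership):
        self.log = log                    # entries are (term, (key, value))
        self.commit_index = commit_index  # highest committed index, 0-based
        self.current_term = current_term
        self.last_applied = -1            # highest index applied to the state
        self.state = {}                   # the replicated key-value state machine
        # Stands in for one heartbeat round: True if a majority acknowledged.
        self.confirm_leadership = confirm_leadership

    def apply_committed(self):
        """Apply committed entries to the state machine, in log order."""
        while self.last_applied < self.commit_index:
            self.last_applied += 1
            _, (key, value) = self.log[self.last_applied]
            self.state[key] = value

    def linearizable_read(self, key):
        # 1. The leader must have committed an entry from its own term;
        #    otherwise its commit index might be behind. (Raft waits here;
        #    this sketch rejects the read instead.)
        if self.commit_index < 0 or self.log[self.commit_index][0] != self.current_term:
            raise ReadRejected("no entry committed in the current term yet")
        # 2. Remember the current commit index as the read index.
        read_index = self.commit_index
        # 3. Confirm it is still the leader by hearing from a majority.
        if not self.confirm_leadership():
            raise ReadRejected("could not confirm leadership with a majority")
        # 4. Apply entries until the state machine has reached read_index.
        self.apply_committed()
        # 5. Answer from the local state machine.
        return self.state.get(key)


log = [(1, ("X", 1)), (2, ("X", 2))]
leader = LeaderNode(log, commit_index=1, current_term=2, confirm_leadership=lambda: True)
print(leader.linearizable_read("X"))  # 2

partitioned = LeaderNode(log, commit_index=1, current_term=2, confirm_leadership=lambda: False)
try:
    partitioned.linearizable_read("X")
except ReadRejected as err:
    print("rejected:", err)  # rejected: could not confirm leadership with a majority
```

Step 3 is what stops a deposed leader, cut off by a partition, from answering with a value that a newer leader has already overwritten.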