Apache ZooKeeper: How do writes work

Posted on 2024-10-25 10:08:06

Apache ZooKeeper is a highly available data store for small objects. A ZooKeeper cluster consists of several nodes, each of which keeps the whole dataset in memory. The dataset is described as "always consistent", so every node should have the same data at all times.

According to the documentation and blog posts, every node in the cluster can answer reads and accept writes.

  • Reads are always answered locally by the node, so no communication with the rest of the cluster is involved.
  • Writes are forwarded to a designated "Leader" node, which forwards the write request to all nodes and waits for their replies. If at least half of the nodes answer, the write is considered successful.

Question: Why is it enough for the leader to wait for only half of the nodes to reply? If somebody connects to one of the nodes that did not receive the update, they get an outdated result (the read is served locally from the local value).
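
The scenario in the question can be reproduced with a small toy model. This is only a sketch under stated assumptions: `Replica`, `Cluster`, and the `reachable` flag are hypothetical names invented for illustration, not ZooKeeper code, and the model skips the real protocol's separate proposal and commit phases. It also assumes the commit rule is a strict majority of acknowledgements (more than half).

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical in-memory replica holding a key/value map."""
    name: str
    reachable: bool = True  # False simulates a slow or partitioned node
    data: dict = field(default_factory=dict)

    def apply(self, key, value):
        """Apply an update and return True as the acknowledgement."""
        if not self.reachable:
            return False
        self.data[key] = value
        return True

    def read(self, key):
        """Reads use local state only, with no cluster round trip."""
        return self.data.get(key)


class Cluster:
    """Hypothetical cluster; the first replica plays the leader."""

    def __init__(self, replicas):
        self.replicas = replicas

    def write(self, key, value):
        """Send the update to every replica; succeed on a strict majority.

        Simplification: replicas apply the update immediately instead of
        waiting for a separate commit message.
        """
        acks = sum(1 for r in self.replicas if r.apply(key, value))
        return acks > len(self.replicas) // 2


replicas = [Replica("a"), Replica("b"), Replica("c")]
cluster = Cluster(replicas)

cluster.write("/config", "v1")       # all three replicas acknowledge
replicas[2].reachable = False         # replica "c" falls behind
committed = cluster.write("/config", "v2")  # 2 of 3 acks is still a majority

fresh = replicas[0].read("/config")   # served by an up-to-date replica
stale = replicas[2].read("/config")   # served by the lagging replica
print(committed, fresh, stale)        # True v2 v1
```

The write succeeds although replica "c" never saw it, and a client connected to "c" still reads `v1`, which is exactly the stale read the question describes.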


断桥再见 2024-11-01 10:08:06

To achieve high read availability, ZooKeeper guarantees only weak consistency across replicas: a read can always be answered locally by the server the client is connected to, and the returned value may be stale (even if a newer version has already been committed through the leader).

It is then the user's responsibility to decide whether a read can tolerate a stale answer, since not all applications require up-to-date information. The following choices are provided:

1) If your application does not need up-to-date values for reads, you can get high read availability by reading directly from the server your client is connected to.

2) If your application requires up-to-date values for reads, you should call the "sync" API before your read request, so that the server you are connected to catches up with the leader first.
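
The difference between the two choices can be illustrated with another toy model. Again, this is a sketch and not the ZooKeeper client API: `Leader`, `Follower`, `replicate`, and the integer `zxid` counter are hypothetical stand-ins, and `sync` here simply pulls every committed entry from the leader before the next read.

```python
class Leader:
    """Hypothetical leader keeping the committed log as (zxid, key, value)."""

    def __init__(self):
        self.log = []

    def commit(self, key, value):
        zxid = len(self.log) + 1  # monotonically increasing transaction id
        self.log.append((zxid, key, value))
        return zxid


class Follower:
    """Hypothetical follower that serves reads from its local copy."""

    def __init__(self, leader):
        self.leader = leader
        self.data = {}
        self.last_zxid = 0

    def replicate(self):
        """Apply every committed entry this follower has not seen yet."""
        for zxid, key, value in self.leader.log[self.last_zxid:]:
            self.data[key] = value
            self.last_zxid = zxid

    def read(self, key):
        """Choice 1: answer from local state, which may be stale."""
        return self.data.get(key)

    def sync(self):
        """Choice 2: catch up with the leader before the next read."""
        self.replicate()


leader = Leader()
follower = Follower(leader)

leader.commit("/config", "v1")
follower.replicate()
leader.commit("/config", "v2")  # the follower has not received this yet

plain_read = follower.read("/config")   # stale: "v1"
follower.sync()
synced_read = follower.read("/config")  # up to date: "v2"
print(plain_read, synced_read)          # v1 v2
```

The plain read is cheap but may lag behind the leader; the read issued after `sync` pays for one extra round trip to the leader in exchange for a fresher value.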

In conclusion, ZooKeeper provides a customizable consistency guarantee, and users can choose the balance between availability and consistency.

If you want to know more about the internals of ZooKeeper, I recommend this paper: "ZooKeeper: Wait-free coordination for Internet-scale systems". The above strategy is described in Section 4.4.
