Consistent hashing as a way to scale out writes

Posted 2024-11-01 01:51:16

I am trying to figure out whether I am on the right track. I am building a (real-time) statistics/analytics service and I use Redis to store some sets and hashes.

Now let's assume I have some success and need to scale out. The hash ring technique looks nice, but I have the impression that it is only suited to caching scenarios.

What if a node goes down? In theory, its keys are now owned by other nodes; in practice, those nodes won't have the data. It is simply lost, right? The same applies when adding or removing nodes.
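To make the concern concrete, here is a minimal sketch of the kind of hash ring I have in mind (node names, key, and hashing scheme are just illustrative, not any particular library): once a node is removed, its keys resolve to the next node on the ring, which never received the data.

```python
import bisect
import hashlib


def ring_hash(key: str) -> int:
    # Place an arbitrary string on the ring (0 .. 2**32 - 1).
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % (2 ** 32)


class HashRing:
    """Bare-bones consistent hash ring: no virtual nodes, no replication."""

    def __init__(self, nodes):
        self._ring = sorted((ring_hash(n), n) for n in nodes)

    def node_for(self, key: str) -> str:
        points = [p for p, _ in self._ring]
        # First ring position clockwise from the key's hash, wrapping around.
        idx = bisect.bisect(points, ring_hash(key)) % len(self._ring)
        return self._ring[idx][1]

    def remove(self, node: str) -> None:
        self._ring = [(p, n) for p, n in self._ring if n != node]


ring = HashRing(["redis-a", "redis-b", "redis-c"])
key = "stats:pageviews:2024-11-01"
owner = ring.node_for(key)       # node that holds the data today
ring.remove(owner)               # simulate that node going down
new_owner = ring.node_for(key)   # a different node that never saw the writes
```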

Am I missing something fundamental? Can this work as a poor man's cluster?

Comments (1)

猫腻 2024-11-08 01:51:16

There are two reasons to use multiple nodes in a cluster:

  • Sharding to limit the amount of data stored on each node
  • Replication to reduce read load and to allow a node to be removed without data loss.

The two are fundamentally different, but you can implement both: use consistent hashing to point to a set of nodes with a standard master/slave setup rather than to a single node.
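A minimal sketch of that combination, with made-up shard names and addresses: the ring resolves a key to a shard, writes go to that shard's master, and reads can be served by a replica.

```python
import bisect
import hashlib

# Hypothetical topology: each ring entry is a shard consisting of one master
# (all writes) and one or more replicas (reads / failover), not a single node.
SHARDS = {
    "shard-a": {"master": ("10.0.0.1", 6379), "replicas": [("10.0.0.2", 6379)]},
    "shard-b": {"master": ("10.0.1.1", 6379), "replicas": [("10.0.1.2", 6379)]},
}


def ring_hash(key: str) -> int:
    return int(hashlib.sha1(key.encode()).hexdigest(), 16) % (2 ** 32)


_RING = sorted((ring_hash(name), name) for name in SHARDS)


def shard_for(key: str) -> dict:
    points = [p for p, _ in _RING]
    idx = bisect.bisect(points, ring_hash(key)) % len(_RING)
    return SHARDS[_RING[idx][1]]


def write_endpoint(key: str) -> tuple:
    # All writes go to the shard's master.
    return shard_for(key)["master"]


def read_endpoint(key: str) -> tuple:
    # Reads can be spread over replicas; fall back to the master if none exist.
    shard = shard_for(key)
    return (shard["replicas"] or [shard["master"]])[0]
```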

If the cluster is your primary data store rather than a cache, you will need a different redistribution strategy that includes copying the data.

My implementation is based on having the client hash each key to one of 64k buckets and keeping a table that maps each bucket to a node. Initially, all buckets map to node #1.
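A sketch of how that client-side routing could look (the bucket count matches the description above; the hash function and hostname are illustrative choices of mine):

```python
import hashlib

NUM_BUCKETS = 64 * 1024                   # 64k buckets, as described above
bucket_table = [1] * NUM_BUCKETS          # bucket -> node id; all start on node #1
NODE_ADDRESSES = {1: ("redis-1.internal", 6379)}   # hypothetical address


def bucket_of(key: str) -> int:
    return int(hashlib.sha1(key.encode()).hexdigest(), 16) % NUM_BUCKETS


def node_for(key: str) -> tuple:
    # Every client-side read or write first resolves the key to a node.
    return NODE_ADDRESSES[bucket_table[bucket_of(key)]]
```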

When node #1 gets too large, its slave is promoted to become master of node #2, and the table is updated to map half of node #1's buckets to node #2. At that point all reads and writes use the new mapping, and you only need to clean up the keys that now sit on the wrong node. Depending on performance requirements, you can check all keys at once or check a random selection of keys, the way the expiry system does.
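A sketch of how that split and cleanup might look, assuming the redis-py client and the bucket table from the previous snippet (hostnames and batch size are illustrative). Because node #2 starts as a full replica, every key already exists on both nodes, so nothing needs to be copied and only the stale copies need deleting.

```python
import hashlib

import redis  # assumes the redis-py client

NUM_BUCKETS = 64 * 1024
bucket_table = [1] * NUM_BUCKETS


def bucket_of(key: str) -> int:
    return int(hashlib.sha1(key.encode()).hexdigest(), 16) % NUM_BUCKETS


# Step 1: the replica of node #1 has been promoted and is now node #2.
# Point half of the buckets at it; reads and writes use the new table at once.
for b in range(NUM_BUCKETS // 2, NUM_BUCKETS):
    bucket_table[b] = 2

NODES = {
    1: redis.Redis(host="redis-1.internal", port=6379),
    2: redis.Redis(host="redis-2.internal", port=6379),  # the promoted replica
}


# Step 2: lazily delete the copies that the table no longer points at.
def cleanup(node_id: int, batch: int = 1000) -> None:
    client = NODES[node_id]
    for key in client.scan_iter(count=batch):
        if bucket_table[bucket_of(key.decode())] != node_id:
            client.delete(key)
```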
