Consistent hashing: what about rehashing?
As you may know, consistent hashing is a great idea when dealing with DHTs. The main idea is to not suffer too much when a node is added or removed.
From the original paper:
"When a machine is added to or removed from the set of caches, the expected fraction of objects that must be moved to a new cache is the minimum needed to maintain a balanced load across the caches."
The solution is great, but it suffers from a poor distribution of keys. To solve that, replicas (virtual nodes) of the original nodes are distributed randomly around the ring. That solution works quite well. Look at this chart if you want to be sure.
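To make the replica idea concrete, here is a minimal sketch of a ring with virtual nodes, assuming MD5 as the ring hash; the names (HashRing, replicas, lookup) are mine, not from the paper or any particular library:

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    # Map a string to a point on the ring; MD5 is just one well-mixed choice.
    return int(hashlib.md5(key.encode()).hexdigest(), 16)


class HashRing:
    def __init__(self, nodes=(), replicas=20):
        self.replicas = replicas      # virtual nodes ("replicas") per physical node
        self._ring = []               # sorted list of (point, node) pairs
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        # Place `replicas` points for this node around the ring.
        for i in range(self.replicas):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def remove(self, node: str) -> None:
        self._ring = [(p, n) for (p, n) in self._ring if n != node]

    def lookup(self, key: str) -> str:
        # The owner of a key is the first ring point clockwise from its hash.
        idx = bisect.bisect_left(self._ring, (_hash(key),))
        if idx == len(self._ring):
            idx = 0                   # wrap around the ring
        return self._ring[idx][1]


ring = HashRing(["node-a", "node-b", "node-c"])
print(ring.lookup("some-key"))        # one of the three nodes, e.g. 'node-a'
```

Each physical node appears `replicas` times on the ring, which is what evens out the key distribution.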
OK, it seems to work well. But there is something I've been thinking about that nobody mentions.
What happens when a node is added (or removed)? Well, every key that falls "before" the newly placed node needs to be rehashed. That seems fine, because those keys will not be "all" the keys. But if we decide to place some replicas, say 20, then 20 nodes will feel the pain of rehashing.
Fewer replicas mean a worse distribution, but more replicas mean more pain when rehashing is needed.
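To put rough numbers on that trade-off, here is a small, self-contained experiment (the ring logic is my own sketch, and all names and counts are made up for illustration): it hashes a batch of keys against a 10-node ring with 20 replicas per node, adds an 11th node, and counts how many keys change owner and how many of the old nodes have to give keys up.

```python
import hashlib


def point(label: str) -> int:
    # Ring position of a label; MD5 here is just one well-mixed choice.
    return int(hashlib.md5(label.encode()).hexdigest(), 16)


def build_ring(nodes, replicas=20):
    # Each node gets `replicas` virtual points on the ring.
    return sorted((point(f"{n}#{i}"), n) for n in nodes for i in range(replicas))


def owner(ring, key: str) -> str:
    # First ring point clockwise from the key's hash owns the key.
    p = point(key)
    for ring_point, node in ring:        # linear scan is fine for a toy example
        if ring_point >= p:
            return node
    return ring[0][1]                    # wrap around the ring


keys = [f"key-{i}" for i in range(20_000)]
before = build_ring([f"node-{i}" for i in range(10)])
after = build_ring([f"node-{i}" for i in range(11)])   # one node joins

moved = [k for k in keys if owner(before, k) != owner(after, k)]
losers = {owner(before, k) for k in moved}

print(len(moved) / len(keys))   # roughly 1/11 of the keys change owner
print(len(losers))              # but most of the 10 old nodes give up some keys
```

The point the numbers make is that the total fraction of keys that moves stays around 1/(nodes+1) no matter how many replicas there are; the replicas only change how many nodes share that work.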
What solution do you know of that would suit this situation? Am I missing something?
Answers (2)
Yeah, I think you're looking at it the wrong way. I'll use the terms "node" for a machine and "object" for a thing to be cached.
On average, almost every node will be affected by a new node being added. This is not bad; it spreads the load of rehashing across all available nodes.
The important part is that most objects are not affected by the rehashing. On average, only 1/nodes of the objects will need to be reassigned, and on average each existing node will only need to transfer away about 1/nodes^2 of the objects, which really cuts down on the impact.
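To put illustrative numbers on that (10 existing nodes and one million cached objects, both made up):

```python
# Illustrative numbers only: 10 existing nodes, 1,000,000 cached objects.
nodes = 10
objects = 1_000_000

moved_total = objects / (nodes + 1)    # share the new node takes over: ~1/nodes of everything
moved_per_node = moved_total / nodes   # shipped away, on average, by each old node: ~1/nodes^2

print(int(moved_total))      # ~90909 objects change owner
print(int(moved_per_node))   # ~9090 objects leave each existing node
```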
It looks like you are trying to solve a distribution issue by increasing the number of replicas, when a 'better' hash function would do the trick. Good hash functions do provide good distributions (see MD5, SHA, etc.).
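For what it's worth, here is a throwaway check (bucket count and key names are made up) showing that an MD5-based placement already spreads keys evenly:

```python
import hashlib
from collections import Counter


def md5_bucket(key: str, buckets: int) -> int:
    # Use the MD5 digest as a big integer and take it modulo the bucket count.
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % buckets


counts = Counter(md5_bucket(f"object-{i}", 10) for i in range(100_000))
print(sorted(counts.values()))   # each of the 10 buckets holds roughly 10,000 keys
```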