What is the relationship between nodeId and key in a distributed hash table?
My understanding of a distributed hash table is that every node is identified uniquely by a nodeId and can store information such as host, port, and values. Every node stores other nodeIds in one or more lookup tables, and finding another node can be made as efficient as log(n) for a system of size n. To retrieve a value from a node, one needs a key. Is the key for a value just a nodeId (i.e. a content identifier or hash of the value)? If so, can every node only store one value? Or can a node store multiple key-value pairs, in which case the question arises of how to retrieve a value without knowing which node holds which keys.
Comments (1)
There are two general approaches to building distributed hash tables: distribute keys to nodes based on the key values themselves, or based on hashes of the keys. Each option has pros and cons.

Either way, we need to find a nodeId for a given key. A naive implementation could use a modulo operation to find the target node. The problem with this approach is that whenever the number of nodes changes, we have to remap absolutely every key, which is slow.
Let's say we have 4 nodes and the key is 10. Since 10 mod 4 is 2, we pick node #2 to store the data.
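A minimal sketch of the naive modulo scheme (the function name `node_for_key` is my own, not from any library). It also shows why changing the node count is painful: the same key suddenly maps elsewhere.

```python
def node_for_key(key: int, num_nodes: int) -> int:
    """Naive placement: pick a node by taking the key modulo the node count."""
    return key % num_nodes

# With 4 nodes, key 10 lands on node #2, as in the example above.
print(node_for_key(10, 4))  # 2

# Add a fifth node and the very same key now maps to a different node,
# so the data would have to be moved (and likewise for most other keys).
print(node_for_key(10, 5))  # 0
```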
If you do need to handle node addition/removal efficiently, the common approach is called consistent hashing, which works similarly to what you described: we divide the key space (or the space of key hashes) into intervals and assign many of those intervals to each node, requiring the number of intervals to be much larger than the number of nodes. Then, if we add or remove a node, only some of those intervals need to be remapped to other nodes.
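A small consistent-hashing sketch to illustrate the point (the `HashRing` class and all names in it are illustrative, not from any particular library). Each node is placed at many "virtual" points on a hash ring, so the number of intervals is much larger than the number of nodes; a key belongs to the first node point at or after the key's hash. When a node is added, only the keys falling into the new node's intervals move.

```python
import bisect
import hashlib

class HashRing:
    """Toy consistent-hash ring with virtual nodes (illustrative sketch)."""

    def __init__(self, nodes, vnodes=100):
        # Sorted list of (ring position, node); each node owns many intervals.
        self._ring = []
        for node in nodes:
            for i in range(vnodes):
                point = self._hash(f"{node}#{i}")
                bisect.insort(self._ring, (point, node))

    @staticmethod
    def _hash(s: str) -> int:
        return int(hashlib.sha1(s.encode()).hexdigest(), 16)

    def node_for(self, key: str) -> str:
        # A key is owned by the first node point clockwise from its hash.
        point = self._hash(key)
        points = [p for p, _ in self._ring]
        i = bisect.bisect(points, point) % len(self._ring)
        return self._ring[i][1]

# Place 100 keys on a 3-node ring, then add a fourth node.
ring = HashRing(["node-a", "node-b", "node-c"])
before = {k: ring.node_for(k) for k in map(str, range(100))}

ring2 = HashRing(["node-a", "node-b", "node-c", "node-d"])
after = {k: ring2.node_for(k) for k in before}

moved = sum(before[k] != after[k] for k in before)
print(f"{moved}/100 keys moved")
```

Note that every key that moves goes to the newly added node; the keys staying on the old nodes are untouched. Compare this with the modulo scheme, where nearly all keys would change owners.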