如何理解Kademlia(KAD)协议
最近,我阅读了Kademlia协议的文档,我试图理解该协议,但我仍然有一些问题: 为什么一个节点知道自己的ID但不知道IP或端口时必须找到另一个节点? 为什么他有ID,却不知道ip和端口,他从哪里得到的ID? 我认为两个不同节点之间的“距离”不是路由距离或真实距离,它只是一个虚拟距离,可以利用算法快速找到节点,是吗?
也许我的英语不是很清楚,因为英语不是我的母语,但如果你需要的话,我会尽力表达清楚。 非常感谢!
Recently, I've read a document of the Kademlia Protocol, I tried to understand the protocol, but I still have some question:
Why a node must find another node when he knows its ID but ip or port?
Why he has the ID while he doesn't know the ip or port, where did he get the ID?
I think the "distance" between two different nodes is not a routing distance or real distance, it's only a virtual distance that can be used the algorithm to find the node quickly, it's that right?
Maybe my English is not very clear because English is not my mother tongue, but I'll try to express myself clear if you need.
Thanks very much!
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
正如 cHao 所说,网络的分布式特性意味着节点需要将其 ID 和联系方式发布给与之通信的其他节点。没有将 ID 映射到联系信息的中心位置,因此每个节点必须在其自己的路由表中保留网络上节点子集的此映射。
Kademlia 路由表的结构使节点能够详细了解靠近它们的网络,并以指数方式减少更远的网络的知识。
使用按位异或作为 ID 之间名义距离的度量具有以下优点:对于给定的目标 ID,没有两个 ID 到目标的距离可以相同。
想象一个简单的例子,其中 ID 的范围为 00 到 63。如果 Kademlia 使用纯数学差异作为距离度量,则 15 和 35 与 25 的距离相同 - 两者的距离均为 10。使用 XOR, 15和25之间的距离是22,25和35之间的距离是58。
这样就可以计算出与目标ID最接近的k个ID的组明确地。
常数 k 在 Kademlia 中有多种用途,但它主要是复制因子。换句话说,一条数据存储在距离该数据的ID最近的k个节点上。
查找过程旨在返回一组 k 个节点(在每个节点上存储数据之前)或返回单个数据块(从查找迭代期间保存该数据的第一个节点开始)。
因此,纯 Kademlia 并不最适合仅查找单个节点,因此我不确定您问题的这一部分是否太相关。如果您确实想使用 Kademlia 查找单个节点,则可能值得修改查找过程,以便在任何节点返回目标节点的联系详细信息时尽早完成(就像如果目标值则查找提前完成)过程中发现)。
As cHao said, the distributed nature of the network means that nodes need to publish their IDs and their contact details to other nodes they talk to. There is no central place where IDs are mapped to contact info, so each node must keep this mapping for a subset of the nodes on the network in its own routing table.
Kademlia routing tables are structured so that nodes will have detailed knowledge of the network close to them, and exponentially decreasing knowledge further away.
The use of bitwise XOR as a measure of notional distance between IDs has the advantage that for a given target ID, no two IDs can have the same distance to the target.
Imagine a simple example where the IDs are in the range 00 to 63. If Kademlia used e.g. pure mathematical difference as a measure of distance, 15 and 35 would be the same distance to 25 - both would have a distance of 10. Using XOR, the distance between 15 and 25 is 22, and between 25 and 35 it's 58.
In this way, the group of k closest IDs to a target ID can be calculated unambiguously.
The constant k has a couple of uses in Kademlia, but it's primarily the replication factor. In other words, a piece of data is stored on the k closest nodes to the data's ID.
The lookup process is designed to return either a group of k nodes (before storing data on each of them) or return a single piece of data (from the first node holding it during the lookup iterations).
Because of this, pure Kademlia isn't best suited to finding just a single node, so I'm not sure that part of your question is too relevant. If you did want to use Kademlia to find a single node, it would probably be worth modifying the lookup process to finish early as soon as any node returns the target node's contact details (in the same way that the lookup finishes early if a target value is found during the process).
由于网络是分布式的,根据定义,不存在一张ID->地址映射的主表。节点不必(通常也不必)了解所有其他节点。 “查找”节点的过程基本上是询问与目标“最接近”的已知节点,而不是直接询问目标节点,而是询问哪些节点更接近目标。该查询的结果为您提供了要查询的下一组节点,并且重复该过程 - 因为节点会返回比实际更接近的结果,所以每次迭代都会找到越来越接近目标的节点,直到您最终到达一个可以说“哦,节点 X?他就在那儿”的节点。
至少我是这么理解的。
Since the network is distributed, by definition, there's no one master table of ID->address mappings. Nodes don't have to (and usually don't) know about all the other nodes. The process of "finding" a node is basically to ask known nodes "closest" to the target not so much about the target node directly, but about what nodes are closer to the target. The result of that query gives you the next group of nodes to query, and the process repeats -- and because a node would return results that are closer than it is, each iteration tends to find nodes closer and closer to the target til you finally reach a node that can say "Oh, node X? He's right over there."
At least that's what i'm understanding of it.