How do BitTorrent magnet links work?
I used a magnet link for the first time. Curious about how it works, I looked up the specs and didn't find any answers. The wiki says xt means "exact topic" and is followed by the format (btih in this case) with a SHA-1 hash. I saw base32 mentioned; since base32 carries 5 bits per character and the hash is 32 characters long, it holds exactly 160 bits, which is exactly the size of a SHA-1 digest.
There's no room for an IP address or anything, it's just a SHA1. So how does the BitTorrent client find the actual file? I turned on URL Snooper to see if it visits a page (using TCP) or does a lookup or the like, but nothing happened. I have no idea how the client finds peers. How does this work?
Also, what is the hash of? Is it a hash of an array of all the file hashes together? Maybe it's a hash of the actual torrent file required (stripping certain information)?
In a VM, I tried a magnet link with uTorrent (which was freshly installed) and it managed to find peers. Where did the first peer come from? It was fresh and there were no other torrents.
A BitTorrent magnet link identifies a torrent using a SHA-1 or truncated SHA-256 hash value known as the "infohash" [1]. This is the same value that peers (clients) use to identify torrents when communicating with trackers or other peers. A traditional .torrent file contains a data structure with two top-level keys: announce, identifying the tracker(s) to use for the download, and info, containing the filenames and hashes for the torrent. The "infohash" is the hash of the encoded info data.
Some magnet links include trackers or web seeds, but they often don't. Your client may know nothing about the torrent except for its infohash. The first thing it needs to do is find other peers who are downloading the torrent. It does this using a separate peer-to-peer network [2] operating a "distributed hash table" (DHT). A DHT is a big distributed index which maps torrents (identified by infohashes) to lists of peers (identified by IP address and port) who are participating in a swarm for that torrent (uploading/downloading data or metadata).
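To make the contents of a magnet link concrete, here is a minimal sketch that pulls out the pieces a client cares about. The function name `parse_magnet` and the example link are made up for illustration, and only the v1 `urn:btih:` form is handled.

```python
from urllib.parse import urlparse, parse_qs

def parse_magnet(uri):
    """Split a magnet URI into its v1 infohash, display name, and trackers."""
    parsed = urlparse(uri)
    if parsed.scheme != "magnet":
        raise ValueError("not a magnet URI")
    params = parse_qs(parsed.query)
    infohash = None
    for xt in params.get("xt", []):
        # Only the v1 form is handled in this sketch.
        if xt.startswith("urn:btih:"):
            infohash = xt[len("urn:btih:"):]
    return {
        "infohash": infohash,
        "name": params.get("dn", [None])[0],   # optional display name
        "trackers": params.get("tr", []),      # often empty
    }

# Made-up link: the "hash" is just 20 placeholder bytes written in hex.
link = "magnet:?xt=urn:btih:" + "ab" * 20 + "&dn=example"
info = parse_magnet(link)
```

For this link the tracker list comes back empty, which is the situation described above: the client has nothing but the infohash and has to go to the DHT.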
The first time a client joins the DHT network it generates a random 160-bit ID from the same space as infohashes. It then bootstraps its connection to the DHT network using either hard-coded addresses of clients controlled by the client developer, or DHT-supporting clients previously encountered in a torrent swarm. When it wants to participate in a swarm for a given torrent, it searches the DHT network for several other clients whose IDs are as close [3] as possible to the infohash. It notifies these clients that it would like to participate in the swarm, and asks them for the connection information of any peers they already know of who are participating in the swarm.
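As a rough illustration of what "close" means here, the sketch below treats IDs as 160-bit integers and ranks nodes by XOR distance to the infohash. The node IDs and the infohash are fabricated from SHA-1 of placeholder strings, and `k=8` is an assumed cut-off, not something stated above.

```python
import hashlib

def xor_distance(a: bytes, b: bytes) -> int:
    """XOR distance between two equal-length IDs, read as big-endian integers."""
    return int.from_bytes(a, "big") ^ int.from_bytes(b, "big")

def closest_nodes(target: bytes, node_ids, k: int = 8):
    """Return the k node IDs with the smallest XOR distance to target."""
    return sorted(node_ids, key=lambda n: xor_distance(n, target))[:k]

# Hypothetical data: 100 fake node IDs and one fake infohash, all 20 bytes.
infohash = hashlib.sha1(b"example info dictionary").digest()
nodes = [hashlib.sha1(f"node-{i}".encode()).digest() for i in range(100)]
nearest = closest_nodes(infohash, nodes, k=8)
```

A real client does not have the full node list to sort; it asks the nodes it already knows for nodes closer to the target and repeats until it stops getting closer.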
When peers are uploading/downloading a particular torrent, they try to tell each other about all of the other peers they know of that are participating in the same torrent swarm. This lets peers know of each other quickly, without subjecting a tracker or DHT to constant requests. Once you've learned of a few peers from the DHT, your client will be able to ask those peers for the connection information of yet more peers in the torrent swarm, until you have all of the peers you need.
Finally, we can ask these peers for the torrent's info metadata, containing the filenames and hash list. Once we've downloaded this information and verified that it's correct using the known infohash, we're in practically the same position as a client that started with a regular .torrent file and got a list of peers from the included tracker.
The download may begin.
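The verification step can be sketched in a few lines, assuming a v1 (SHA-1) infohash given as hex. The function name `metadata_matches` and the sample bytes are illustrative only.

```python
import hashlib

def metadata_matches(info_bytes: bytes, infohash_hex: str) -> bool:
    """True if the SHA-1 of the received info bytes equals the known infohash."""
    return hashlib.sha1(info_bytes).hexdigest() == infohash_hex.lower()

# Placeholder bencoded info dictionary, standing in for what peers would send.
info_bytes = b"d4:name7:example12:piece lengthi16384e6:pieces0:e"
known = hashlib.sha1(info_bytes).hexdigest()
ok = metadata_matches(info_bytes, known)
tampered = metadata_matches(info_bytes + b"x", known)
```

Because the check uses only the hash from the magnet link, it does not matter that the metadata came from untrusted peers: anything that does not hash to the infohash is discarded.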
[1] The infohash is typically hex-encoded, but some old clients used base 32 instead. v1 (urn:btih:) uses the SHA-1 digest directly, while v2 (urn:btmh:) adds a multihash prefix to identify the hash algorithm and digest length.
[2] There are two primary DHT networks: the simpler "mainline" DHT, and a more complicated protocol used by Azureus/Vuze/BiglyBT.
[3] The distance is measured by XOR.
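The two encodings are easy to compare: a 160-bit digest is 40 characters in hex (4 bits each) and 32 characters in base 32 (5 bits each). A small sketch, with a hypothetical helper `btih_to_hex` that accepts either form:

```python
import base64
import hashlib

def btih_to_hex(btih: str) -> str:
    """Normalise a v1 infohash (40 hex or 32 base32 characters) to lowercase hex."""
    if len(btih) == 40:
        return bytes.fromhex(btih).hex()
    if len(btih) == 32:
        return base64.b32decode(btih.upper()).hex()
    raise ValueError("expected 40 hex or 32 base32 characters")

digest = hashlib.sha1(b"placeholder").digest()       # 20 bytes = 160 bits
as_hex = digest.hex()                                # 40 characters
as_b32 = base64.b32encode(digest).decode("ascii")    # 32 characters, no padding
```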
Peer discovery and resource discovery (files in your case) are two different things.
I am more familiar with JXTA, but all peer-to-peer networks work on the same basic principles.
The first thing that needs to happen is peer discovery.
Peer Discovery
Most p2p networks are "seeded" networks: when first starting, a peer will connect to a well-known (hard-coded) address to retrieve a list of running peers. It can be direct seeding, like connecting to dht.transmissionbt.com as mentioned in another post, or indirect seeding, as usually done with JXTA, where the peer connects to an address that only delivers a plain-text list of other peers' network addresses.
Once a connection is established with the first (few) peer(s), the connecting peer performs a discovery of other peers (by sending requests out) and maintains a table of them. Since the number of other peers can be huge, the connecting peer only maintains part of a Distributed Hash Table (DHT) of the peers. The algorithm to determine which part of the table the connecting peer should maintain varies depending on the network. BitTorrent uses Kademlia with 160-bit identifiers/keys.
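One way to picture "only maintains part of the table": in the classic Kademlia layout, contacts are grouped into buckets by the position of the highest bit in which their ID differs from yours. The sketch below assumes that classic layout (real clients may organise buckets differently), and `bucket_index` is a name invented for this example.

```python
def bucket_index(own_id: int, other_id: int) -> int:
    """Bucket i holds contacts whose XOR distance d satisfies 2**i <= d < 2**(i+1)."""
    distance = own_id ^ other_id
    if distance == 0:
        raise ValueError("a node does not store itself")
    return distance.bit_length() - 1

own = 0
far = bucket_index(own, 2**159)         # differs in the top bit of a 160-bit ID
near = bucket_index(own, 1)             # differs only in the lowest bit
mid = bucket_index(0b1010, 0b1000)      # distance 0b0010
```

Half of the whole ID space lands in the top bucket, and each bucket keeps only a handful of contacts, so a peer ends up knowing its own neighbourhood in detail and the rest of the network only sparsely.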
Resource Discovery
Once a few peers have been discovered by the connecting peer, the latter sends them a few requests for discovery of resources. Magnet links identify those resources and are built in such a way that they are a "signature" for a resource and guarantee that they uniquely identify the requested content among all the peers.
The connecting peer will then send a discovery request for the magnet link/resource to peers around it. The DHT is built in such a way that it helps determine which peers should be asked first for the resource (read on Kademlia in Wikipedia for more).
If the requested peer does not hold the requested resource it will usually "pass on" the query to additional peers fetched from its own DHT.
The number of "hops" the query can be passed on is usually limited; 4 is a usual number with JXTA-type networks.
When a peer holds the resource, it replies with its full details. The connecting peer can then connect to the peer holding the resource (directly or via a relay - I won't go into details here) and start fetching it.
Resources/Services in P2P networks are not directly attached to network addresses: they are distributed and that is the beauty of these highly scalable networks.
I was curious about the same question myself. Reading the code for Transmission, I found in libtransmission/tr-dht.c that it bootstraps from a hard-coded host, dht.transmissionbt.com. It tries that 6 times, waiting 40(!) seconds between tries. I guess you can test it by deleting the config files (~/.config/transmission on Unix) and blocking all communication to dht.transmissionbt.com, then seeing what happens (wait at least 240 seconds).
So it appears the client has a bootstrap node built in to start with. Of course, once it has gotten into the network, it doesn't need that bootstrap node anymore.
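This is not Transmission's code, just a sketch of the retry pattern described above: resolve a bootstrap host, and on failure wait and try again a fixed number of times. The function name, the `resolve` and `sleep` parameters (there so it can be exercised without a network), and the port number used in the note below are assumptions.

```python
import socket
import time

def bootstrap(host, port, attempts=6, delay=40.0,
              resolve=socket.getaddrinfo, sleep=time.sleep):
    """Try to resolve a bootstrap node, retrying with a fixed delay.

    Returns a sorted list of (ip, port) pairs, or [] if every attempt fails.
    """
    for attempt in range(attempts):
        try:
            results = resolve(host, port, type=socket.SOCK_DGRAM)
            # Each result's fifth element is the socket address tuple.
            return sorted({(r[4][0], r[4][1]) for r in results})
        except OSError:
            # No point sleeping after the final failed attempt.
            if attempt < attempts - 1:
                sleep(delay)
    return []
```

Calling something like `bootstrap("dht.transmissionbt.com", 6881)` would then give the first addresses to send DHT queries to; with the host blocked, it would return an empty list after the retries run out.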
I finally found the specification. For the first time, Google didn't help. (The wiki linked to bittorrent.com, which is the main site. I clicked the developers link, noticed the bittorrent.org tab on the right, and it was easy from there. It's hard finding links when you have no idea what they are labeled and they are many clicks away.)
It seems like all torrents have a network of peers. You find peers from trackers and you keep them between sessions. The network allows you to find peers and other things. I haven't read how it's used with magnet links, but it seems to be undefined how a fresh client finds peers. Perhaps some are baked in, or they use their home server or known trackers embedded into the client to get the first peer in the network.
When I started answering your question, I didn't realize you were asking how the magnet scheme works. I just thought you wanted to know how the parts relevant to the BitTorrent protocol were generated.
The hash listed in the magnet URI is the torrent's info hash encoded in base32. The info hash is the SHA-1 hash of the bencoded info block of the torrent.
This python code demonstrates how it can be calculated.
I wrote a (very naive) C# implementation to test this out since I didn't have a bencoder on hand and it matches what is expected from the client.
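As a rough sketch of the calculation described above (not the code originally linked), here is a tiny bencode decoder and encoder plus an `infohash` function. It decodes the torrent and re-encodes the info dictionary, which assumes the file was canonically encoded; hashing the original byte slice would be more robust. The sample torrent is fabricated.

```python
import hashlib

def bdecode(data: bytes, i: int = 0):
    """Decode one bencoded value starting at offset i; return (value, next offset)."""
    c = data[i:i + 1]
    if c == b"i":                        # integer: i<digits>e
        end = data.index(b"e", i)
        return int(data[i + 1:end]), end + 1
    if c == b"l":                        # list: l<items>e
        i += 1
        items = []
        while data[i:i + 1] != b"e":
            item, i = bdecode(data, i)
            items.append(item)
        return items, i + 1
    if c == b"d":                        # dictionary: d<key><value>...e
        i += 1
        result = {}
        while data[i:i + 1] != b"e":
            key, i = bdecode(data, i)
            result[key], i = bdecode(data, i)
        return result, i + 1
    colon = data.index(b":", i)          # byte string: <length>:<bytes>
    start = colon + 1
    length = int(data[i:colon])
    return data[start:start + length], start + length

def bencode(value) -> bytes:
    """Encode ints, bytes, lists, and dicts (keys sorted) as bencode."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        items = sorted(value.items())
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")

def infohash(torrent_bytes: bytes) -> str:
    """SHA-1 (hex) of the bencoded info dictionary of a v1 torrent."""
    meta, _ = bdecode(torrent_bytes)
    return hashlib.sha1(bencode(meta[b"info"])).hexdigest()

# Fabricated single-file torrent for illustration.
info = {b"name": b"example.txt", b"length": 5, b"piece length": 16384,
        b"pieces": hashlib.sha1(b"hello").digest()}
torrent = bencode({b"announce": b"http://tracker.invalid/announce", b"info": info})
digest = infohash(torrent)
```

Note that the announce URL sits outside the info dictionary, so changing the tracker does not change the hash.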
As I understand it, this hash does not include any information on how to locate the tracker, the client needs to find this out through other means (the announce url provided). This is just what distinguishes one torrent from another on the tracker.
Everything related to the BitTorrent protocol still revolves around the tracker. It is still the primary means of communication among the swarm. The magnet URI scheme was not designed specifically for use by BitTorrent. It's used by any P2P protocol as an alternative form of communicating. BitTorrent clients adapted to accept magnet links as another way to identify torrents, so that you don't need to download .torrent files anymore. The magnet URI still needs to specify the tracker (tr) in order to locate it so the client may participate. It can contain information about other protocols, but that is irrelevant to the BitTorrent protocol. The BitTorrent protocol ultimately will not work without the trackers.
The list of peers is probably populated from the torrent that upgrades the client (e.g. there's a torrent for uTorrent that upgrades it). As long as everyone's using the same client, it should be good because you have no choice but to share the upgrade.