Can BitTorrent peers handle seeding huge numbers of idle torrents?

Posted 2024-11-25 15:49:01

I'm considering using BitTorrent for a large data dissemination problem where the data source is petascale and users will want up to several terabytes. Some details:

  • Number of torrents potentially in the millions
  • Torrent sizes ranging from 100 MB to 100 GB
  • A stable set of clusters around the world capable of acting as seeders, each holding a large subset of the total torrents (say 60% on average)
  • A relatively small number of simultaneous users (less than 100) wanting to download on average a few terabytes of data.

I expect the number of active torrents to be small compared to the total available, but quality of service is important, so there must be several seeders for each torrent or some mechanism for launching new seeders.

My question is: can BitTorrent clients handle seeding huge numbers of torrents, most of which are idle? Would I need to stripe torrents across the seeders in a cluster, or could each node seed all the torrents it has access to? Which client would do the best job? Are there any tools for managing clusters of seeders?

I am assuming that trackers can be made to scale to this level.
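If torrents are striped across the seeders in a cluster, one way to give every torrent several seeders without a central assignment table is rendezvous (highest-random-weight) hashing. This is a swapped-in technique for illustration, not something the question or answers prescribe; `seeders_for`, the node names, and the default of 3 replicas are hypothetical.

```python
import hashlib


def seeders_for(info_hash, nodes, replicas=3):
    """Choose `replicas` distinct seed nodes for a torrent via rendezvous hashing.

    Every node gets a pseudo-random score from SHA-1(node name + info-hash);
    the highest-scoring nodes are the ones responsible for seeding this torrent.
    """
    def score(node):
        return hashlib.sha1(node.encode("utf-8") + info_hash).digest()

    return sorted(nodes, key=score, reverse=True)[:replicas]


nodes = [f"seed{i:02d}.cluster-a.example" for i in range(12)]  # hypothetical node names
print(seeders_for(bytes(20), nodes))
```

Because a node's score depends only on its own name and the info-hash, adding or removing a node only reassigns the torrents whose top-`replicas` set included that node.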

Answers (4)

白馒头 2024-12-02 15:49:01

There are 2 main problems:

  1. Each torrent (typically) needs to announce to a tracker periodically; this can add up to a significant amount of bandwidth.
  2. The BitTorrent client itself needs to be written so that it scales to a large number of torrents.

As for the tracker traffic, let's assume you have 1 million torrents. The typical re-announce interval is 30 minutes, but some trackers set it to 1 hour. Let's be conservative and assume your tracker uses a 1-hour announce interval. You will have to make 1 million GET requests per hour; if each request is 400 bytes up and 100 bytes down (assuming most responses will not contain any peers), that's about 111 kB/s up and 28 kB/s down, constantly. That's not so bad, but keep in mind that TCP requires an extra round trip to establish each connection, which adds roughly another 40 bytes down and 40 bytes up per announce.
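The arithmetic above can be checked with a few lines of Python. `announce_bandwidth` is a hypothetical helper, and the per-request byte counts are the rough estimates from the paragraph, not measurements.

```python
def announce_bandwidth(num_torrents, interval_s, bytes_up, bytes_down):
    """Sustained (up, down) bytes/second spent re-announcing every torrent once per interval."""
    announces_per_second = num_torrents / interval_s
    return announces_per_second * bytes_up, announces_per_second * bytes_down


# HTTP tracker: 1 million torrents, 1-hour interval, ~400 B up / ~100 B down per announce
up, down = announce_bandwidth(1_000_000, 3600, 400, 100)
print(f"HTTP: {up / 1000:.0f} kB/s up, {down / 1000:.0f} kB/s down")
# prints: HTTP: 111 kB/s up, 28 kB/s down

# Same, plus ~40 B each way for the extra TCP connection-setup round trip
up, down = announce_bandwidth(1_000_000, 3600, 400 + 40, 100 + 40)
print(f"HTTP + TCP setup: {up / 1000:.0f} kB/s up, {down / 1000:.0f} kB/s down")
# prints: HTTP + TCP setup: 122 kB/s up, 39 kB/s down
```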

This can be mitigated by only using UDP trackers. Then you only need a connect message to obtain a connection ID, and you can reuse that ID for every announce sent while it remains valid (BEP 15 lets a client use it for one minute after receiving it), so the connect overhead is spread over many announces. Each announce message would then be about 100 bytes, and the response would be more compact as well; let's assume 60 bytes. That would get you about 28 kB/s up and 17 kB/s down, just to keep the torrents announced. For this you need a client with decent UDP tracker support (one that caches the connection ID, for instance).
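To make "caching the connection ID" concrete, here is a sketch of the BEP 15 connect and announce packets plus a per-tracker connection-ID cache. The packet layouts follow BEP 15 (the announce request is 98 bytes before UDP/IP headers); the function and class names are hypothetical, no socket I/O or retransmission is shown, and `num_want=0` is an assumption for a pure seed that only waits for incoming peers.

```python
import struct
import time

PROTOCOL_ID = 0x41727101980      # magic constant from BEP 15
ACTION_CONNECT, ACTION_ANNOUNCE = 0, 1
CONNECTION_ID_LIFETIME = 60.0    # BEP 15: a client may use a connection ID for one minute


def connect_request(transaction_id):
    # 16 bytes: protocol_id (u64), action (u32), transaction_id (u32)
    return struct.pack(">QII", PROTOCOL_ID, ACTION_CONNECT, transaction_id)


def parse_connect_response(data, transaction_id):
    # 16 bytes: action (u32), transaction_id (u32), connection_id (u64)
    action, tid, connection_id = struct.unpack(">IIQ", data[:16])
    if action != ACTION_CONNECT or tid != transaction_id:
        raise ValueError("unexpected connect response")
    return connection_id


def announce_request(connection_id, transaction_id, info_hash, peer_id, port,
                     uploaded=0, event=0, key=0, num_want=0):
    # 98 bytes. downloaded and left are 0 for these origin seeds; IP address 0 means
    # "use the packet's source address"; num_want=0 asks the tracker for no peers.
    return struct.pack(">QII20s20sQQQIIIiH", connection_id, ACTION_ANNOUNCE,
                       transaction_id, info_hash, peer_id,
                       0, 0, uploaded, event, 0, key, num_want, port)


class ConnectionIdCache:
    """Keeps one connection ID per tracker so back-to-back announces skip the connect round trip."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.entries = {}  # tracker (host, port) -> (connection_id, time received)

    def get(self, tracker):
        entry = self.entries.get(tracker)
        if entry is not None and self.clock() - entry[1] < CONNECTION_ID_LIFETIME:
            return entry[0]
        return None  # missing or expired: send connect_request() first

    def put(self, tracker, connection_id):
        self.entries[tracker] = (connection_id, self.clock())
```

A seeder announcing thousands of torrents to the same tracker would call `get()` before each announce and only fall back to a connect exchange when it returns `None`.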

Not too bad, assuming that's insignificant compared to the actual data your seeds would send.

However, you don't necessarily need to stripe your torrents across separate data centers; you could also use an HTTP server to seed the torrents. All major BitTorrent clients support HTTP seeding (web seeds), and you wouldn't have to worry about announcing to the tracker, because the URL is embedded in the .torrent itself.
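As a sketch of what "the URL is embedded in the .torrent" means, the code below builds a single-file .torrent with a `url-list` key, the GetRight-style web seed extension (BEP 19). The bencoder is a minimal hand-written one, the URLs are placeholders, and hashing the whole payload in memory is only reasonable for small files; a real tool would stream pieces from disk.

```python
import hashlib


def bencode(value):
    """Minimal bencoder covering the int/bytes/str/list/dict values a .torrent needs."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Dictionary keys are byte strings sorted in raw byte order.
        items = sorted(((k.encode("utf-8") if isinstance(k, str) else k, v)
                        for k, v in value.items()), key=lambda kv: kv[0])
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


def make_webseeded_torrent(name, data, piece_length, tracker_url, web_seed_urls):
    """Return (.torrent bytes, hex info-hash) for a single file that HTTP mirrors also serve."""
    pieces = b"".join(hashlib.sha1(data[i:i + piece_length]).digest()
                      for i in range(0, len(data), piece_length))
    info = {"name": name, "length": len(data), "piece length": piece_length, "pieces": pieces}
    meta = {"announce": tracker_url, "info": info, "url-list": web_seed_urls}
    return bencode(meta), hashlib.sha1(bencode(info)).hexdigest()


torrent, info_hash = make_webseeded_torrent(
    "sample.bin", b"example payload " * 4096, 16384,
    "http://tracker.example.org/announce",
    ["http://mirror-a.example.org/data/sample.bin",
     "http://mirror-b.example.org/data/sample.bin"])
print(info_hash, len(torrent), "bytes")
```

Listing one URL per cluster gives every downloader several HTTP sources without any of those servers announcing anything.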

As for a client that scales well with many torrents, I don't know for sure; I haven't done any measurements. It should be fairly straightforward to generate a million random torrents and try loading them all.
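A load test along those lines could start from a generator like this one. It writes `count` tiny random payloads, each with a matching single-piece .torrent, so a client can actually verify and seed them. The tracker URL, the file naming, and the 16 KiB payload size are arbitrary assumptions; at a million torrents you would also want to shard the output across subdirectories.

```python
import hashlib
import os
import tempfile


def bencode(value):
    """Minimal bencoder covering the int/bytes/str/list/dict values a .torrent needs."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        items = sorted(((k.encode("utf-8") if isinstance(k, str) else k, v)
                        for k, v in value.items()), key=lambda kv: kv[0])
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


def generate_test_torrents(out_dir, count, payload_size=16 * 1024,
                           tracker_url="http://tracker.example.org/announce"):
    """Write `count` random payload files plus a single-piece .torrent for each; return info-hashes."""
    os.makedirs(out_dir, exist_ok=True)
    hashes = []
    for i in range(count):
        payload = os.urandom(payload_size)
        name = f"item{i:07d}.bin"
        with open(os.path.join(out_dir, name), "wb") as f:
            f.write(payload)
        info = {"name": name, "length": payload_size, "piece length": payload_size,
                "pieces": hashlib.sha1(payload).digest()}
        with open(os.path.join(out_dir, name + ".torrent"), "wb") as f:
            f.write(bencode({"announce": tracker_url, "info": info}))
        hashes.append(hashlib.sha1(bencode(info)).hexdigest())
    return hashes


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as out:
        hashes = generate_test_torrents(out, 1000)  # scale count up toward 1,000,000
        print(len(hashes), "torrents written to", out)
```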

I have done some optimization work in libtorrent-rasterbar to make it scale well with many torrents, but I haven't tried millions.

I've written a blog post on this topic, here.

—━☆沉默づ 2024-12-02 15:49:01

You may be looking for Hekate
It is, at best, pre-alpha right now, but it is very close to what you're describing.

故人爱我别走 2024-12-02 15:49:01

To avoid collapsing under the overhead of millions of useless tracker announces and scrapes, repeated every announce interval, you have to restrict your seeding clusters to loading only the current working set of items, i.e. those being requested right now. Downloaders need to get (download) the .torrent file from a central place anyway, and that download could trigger loading it into the seeding clusters. Alternatively, determine activity for a particular info-hash by recognizing announces that do NOT originate from one of your seed clusters.
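The working-set idea can be sketched as a small bookkeeping class. Everything here is hypothetical: the event hooks would have to be wired into whatever serves the .torrent files and into your tracker's announce handler, and the network ranges passed in stand for your seed clusters' addresses.

```python
import ipaddress


class WorkingSet:
    """Tracks which info-hashes should currently be loaded in the seed clusters."""

    def __init__(self, seed_networks, ttl_seconds=24 * 3600):
        self.seed_networks = [ipaddress.ip_network(n) for n in seed_networks]
        self.ttl = ttl_seconds
        self.last_active = {}  # info-hash -> Unix time of last outside activity

    def _is_seed_cluster(self, ip):
        addr = ipaddress.ip_address(ip)
        return any(addr in net for net in self.seed_networks)  # v4/v6 mismatch is just False

    def touch(self, info_hash, now):
        """Mark an info-hash active; True means it just entered the working set (load it)."""
        newly_active = info_hash not in self.last_active
        self.last_active[info_hash] = now
        return newly_active

    def on_torrent_download(self, info_hash, now):
        # A user fetched the .torrent from the central server.
        return self.touch(info_hash, now)

    def on_announce(self, info_hash, peer_ip, now):
        # Announces from our own seeders say nothing about demand, so ignore them.
        if self._is_seed_cluster(peer_ip):
            return False
        return self.touch(info_hash, now)

    def expire(self, now):
        """Remove and return info-hashes idle for longer than the TTL (unload these)."""
        idle = [h for h, t in self.last_active.items() if now - t > self.ttl]
        for h in idle:
            del self.last_active[h]
        return idle
```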

rTorrent has fast-resume (meaning no hashing happens when an appropriately prepared .torrent is loaded) and can be controlled via XML-RPC, so you can decommission idle items. That way, a .torrent download can trigger the actual data to be available for the next 24 hours, or for as long as there's activity in the swarm.
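A decommissioning pass over XML-RPC might look like the following. It assumes rTorrent's `download_list` command (called with an empty target and the `main` view) and its `d.erase` command, as exposed by 0.9.x; check both against your rTorrent version before relying on this. `decommission_idle` and the `last_activity` map are hypothetical, and the map would be maintained by your own code from .torrent downloads and outside announces.

```python
import time

IDLE_SECONDS = 24 * 3600  # assumption: keep an item loaded for 24 h after its last activity


def decommission_idle(proxy, last_activity, now=None, idle_seconds=IDLE_SECONDS):
    """Erase loaded items whose last recorded outside activity is older than idle_seconds.

    proxy: an xmlrpc.client.ServerProxy pointed at rTorrent's XML-RPC endpoint
    last_activity: info-hash (upper-case hex, as rTorrent reports it) -> Unix time
    """
    now = time.time() if now is None else now
    erased = []
    for info_hash in proxy.download_list("", "main"):
        if now - last_activity.get(info_hash, 0.0) > idle_seconds:
            # Removes the item from the session; it can be loaded again later,
            # with fast-resume data, when someone requests it.
            proxy.d.erase(info_hash)
            erased.append(info_hash)
    return erased
```

Called periodically as `decommission_idle(xmlrpc.client.ServerProxy(url), last_activity)`, where `url` depends on how your rTorrent's XML-RPC interface is exposed. `xmlrpc.client` turns the dotted attribute access `proxy.d.erase` into a call to the method named `d.erase`.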

沙沙粒小 2024-12-02 15:49:01

The protocol allows for this, but I do not know which clients would scale to millions of torrents. In the worst case, you would have to write your own seed-only client.

The protocol feature most relevant to your use case is that when a peer connects to another, the connecting peer is supposed to send the torrent's info-hash first. This means a single listening TCP port could be used to seed any number of torrents, with almost zero resources used when idle.

This can be found in The BitTorrent Protocol Specification (BEP 3):

If both sides don't send the same value, they sever the connection. The one possible exception is if a downloader wants to do multiple downloads over a single port, they may wait for incoming connections to give a download hash first, and respond with the same one if it's in their list.

I also found the same in this BitTorrent Protocol Specification v1.0:

The initiator of a connection is expected to transmit their handshake immediately. The recipient may wait for the initiator's handshake, if it is capable of serving multiple torrents simultaneously (torrents are uniquely identified by their info_hash).
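A minimal sketch of that single-port dispatch with Python's asyncio: read the incoming 68-byte handshake (1-byte length, the string `BitTorrent protocol`, 8 reserved bytes, 20-byte info-hash, 20-byte peer ID), look the info-hash up, and either answer with our own handshake or drop the connection. `SinglePortSeeder` and its `lookup` callable are hypothetical, and the peer-wire message loop that would follow is not shown.

```python
import asyncio

PSTR = b"BitTorrent protocol"
HANDSHAKE_LEN = 1 + len(PSTR) + 8 + 20 + 20  # 68 bytes


def build_handshake(info_hash, peer_id, reserved=bytes(8)):
    """<pstrlen><pstr><8 reserved bytes><20-byte info_hash><20-byte peer_id>"""
    return bytes([len(PSTR)]) + PSTR + reserved + info_hash + peer_id


def parse_handshake(data):
    """Return (reserved, info_hash, peer_id) from a 68-byte handshake, or raise ValueError."""
    if len(data) != HANDSHAKE_LEN or data[0] != len(PSTR) or data[1:20] != PSTR:
        raise ValueError("not a BitTorrent handshake")
    return data[20:28], data[28:48], data[48:68]


class SinglePortSeeder:
    """One listening port for every torrent: the initiator's handshake says which one it wants."""

    def __init__(self, peer_id, lookup):
        self.peer_id = peer_id
        self.lookup = lookup  # hypothetical: info_hash (bytes) -> torrent object, or None

    async def handle(self, reader, writer):
        try:
            # Wait for the initiator's handshake before sending ours, as the spec allows.
            _, info_hash, _ = parse_handshake(await reader.readexactly(HANDSHAKE_LEN))
            torrent = self.lookup(info_hash)
            if torrent is None:
                return  # not a torrent we serve: sever the connection
            writer.write(build_handshake(info_hash, self.peer_id))
            await writer.drain()
            # The regular peer-wire message loop for `torrent` would run here (not shown).
        except (asyncio.IncompleteReadError, ValueError):
            pass  # short read or garbage: just drop the peer
        finally:
            writer.close()
```

It would be started with something like `await asyncio.start_server(seeder.handle, port=6881)`. Since the lookup happens per connection, an idle torrent costs nothing beyond its entry in whatever `lookup` consults.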

However, there is one thing that would increase your load, and it is the tracker. With the normal tracker protocol, each client has to periodically announce to the tracker each torrent it has, together with information like how much it has uploaded. With millions of torrents, this would present a somewhat high load. If you were writing your own mass-seed-only client, a separate protocol to announce your seeders to the tracker would be a good idea.
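To give a feel for how cheap such a separate protocol could be, here is one hypothetical wire format: a seeder periodically sends datagrams that simply list the info-hashes it seeds, 70 per packet. Nothing like this exists in the BitTorrent specifications; the format, the names, and the packet size are all assumptions for illustration.

```python
import struct

# Hypothetical format: 8-byte header + 70 * 20-byte hashes = 1408 bytes,
# which fits in a 1500-byte MTU together with IP and UDP headers.
MAX_HASHES_PER_PACKET = 70


def bulk_announce_packets(seeder_id, port, info_hashes):
    """Split a seeder's full torrent list into compact 'I seed all of these' datagrams.

    Header: seeder id (u32), listening port (u16), hash count (u16), then count * 20 bytes.
    """
    packets = []
    for start in range(0, len(info_hashes), MAX_HASHES_PER_PACKET):
        chunk = info_hashes[start:start + MAX_HASHES_PER_PACKET]
        header = struct.pack(">IHH", seeder_id, port, len(chunk))
        packets.append(header + b"".join(chunk))
    return packets


def parse_bulk_announce(packet):
    """Tracker side: return (seeder_id, port, list of 20-byte info-hashes)."""
    seeder_id, port, count = struct.unpack(">IHH", packet[:8])
    body = packet[8:]
    if len(body) != 20 * count:
        raise ValueError("truncated packet")
    return seeder_id, port, [body[i:i + 20] for i in range(0, len(body), 20)]
```

At 20 bytes per info-hash, a node seeding 1 million torrents would send about 14,300 such packets (roughly 20 MB of payload) per announce cycle, which at a 1-hour interval is around 5.6 kB/s plus UDP/IP headers. The tracker would then treat the seeder as present in every listed swarm until the next cycle.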
