How do download accelerators work?
We require all requests for downloads to have a valid login (non-http) and we generate transaction tickets for each download. If you were to go to one of the download links and attempt to "replay" the transaction, we use HTTP codes to forward you to get a new transaction ticket. This works fine for a majority of users. There's a small subset, however, that are using Download Accelerators that simply try to replay the transaction ticket several times.
So, in order to determine whether we want to, or even can, support download accelerators, we are trying to understand how they work.
How does having a second, third or even fourth concurrent connection to the web server delivering a static file speed the download process?
What does the accelerator program do?
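For illustration, here is a minimal sketch of the redirect-on-replay flow described above, in Python with Flask; the route names, the in-memory ticket store, and the is_valid() stub are hypothetical stand-ins, not the asker's actual system:

    # Hypothetical sketch of the ticket-replay redirect described above.
    # Route names, the ticket store, and is_valid() are assumptions.
    from flask import Flask, redirect, request, send_file

    app = Flask(__name__)
    used_tickets = set()  # in production this would live in a shared store

    def is_valid(ticket):
        return True  # stub: a real check would verify signature and expiry

    @app.route("/download/<ticket>")
    def download(ticket):
        if ticket in used_tickets or not is_valid(ticket):
            # Replayed or stale ticket: forward the client to get a new one.
            return redirect("/ticket/new?next=" + request.path, code=302)
        used_tickets.add(ticket)  # first use consumes the ticket
        return send_file("payload.bin")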
8 Answers
They don't, generally.
To answer the substance of your question, the assumption is that the server is rate-limiting downloads on a per-connection basis, so simultaneously downloading multiple chunks will enable the user to make the most of the bandwidth available at their end.
Typically, download accelerators depend on partial content download (status code 206). Just like streaming media players: the player asks the server for a small chunk of the full file, downloads it, and plays it. The catch is that if a server restricts partial content downloads, the download accelerator won't work. It's easy to configure a server like Nginx to restrict partial content downloads.

How do you know whether a file can be downloaded via ranges/partially?
Answer: check for an Accept-Ranges: header in the response. If it's present, you are good to go.

How do you implement a feature like this in any programming language?
Answer: it's pretty easy. Just spin up some threads/coroutines (choose threads/coroutines over processes in an I/O- or network-bound system) to download N chunks in parallel, and save each partial chunk at the right position in the file; technically you're then done. To calculate the download speed, keep a global variable downloaded_till_now = 0 and increment it as each thread finishes downloading a chunk. Don't forget a mutex, since multiple threads are writing to a shared resource, so wrap the update in lock.acquire() and lock.release(). Also keep a Unix-time counter and do the math: speed_in_bytes_per_sec = downloaded_till_now / (current_unix_time - start_unix_time), as sketched below.
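To make that recipe concrete, here is a minimal sketch in Python using the requests library; the URL, chunk count, and file name are placeholders, and a cooperating server that honors Range requests is assumed:

    # Minimal parallel range-download sketch; error handling omitted.
    import threading
    import time
    import requests

    URL = "https://example.com/big-file.bin"   # hypothetical file
    N_CHUNKS = 4

    downloaded_till_now = 0        # shared progress counter
    lock = threading.Lock()        # mutex guarding the counter

    def fetch_chunk(start, end, index, results):
        # Ask for bytes [start, end]; a cooperating server replies 206.
        global downloaded_till_now
        resp = requests.get(URL, headers={"Range": f"bytes={start}-{end}"})
        assert resp.status_code == 206, "server ignored the Range header"
        results[index] = resp.content
        with lock:                 # acquire/release around the shared write
            downloaded_till_now += len(resp.content)

    head = requests.head(URL)
    assert head.headers.get("Accept-Ranges") == "bytes"
    size = int(head.headers["Content-Length"])

    chunk = size // N_CHUNKS       # last chunk absorbs the remainder
    ranges = [(i * chunk, size - 1 if i == N_CHUNKS - 1 else (i + 1) * chunk - 1)
              for i in range(N_CHUNKS)]

    results = [None] * N_CHUNKS
    start_unix_time = time.time()
    threads = [threading.Thread(target=fetch_chunk, args=(s, e, i, results))
               for i, (s, e) in enumerate(ranges)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # Chunks are in order, so concatenating them reassembles the file.
    with open("big-file.bin", "wb") as f:
        for part in results:
            f.write(part)

    speed_in_bytes_per_sec = downloaded_till_now / (time.time() - start_unix_time)
    print(f"average speed: {speed_in_bytes_per_sec:.0f} bytes/sec")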
My understanding is that one method download accelerators use is to open many parallel TCP connections - each TCP connection can only go so fast, and is often limited on the server side.
TCP is implemented such that if a timeout occurs, the timeout period is increased. This is very effective at preventing network overloads, at the cost of speed on individual TCP connections.
Download accelerators can get around this by opening dozens of TCP connections and dropping the ones that slow to below a certain threshold, then opening new ones to replace the slow connections.
While effective for a single user, I believe it is bad etiquette in general.
You're seeing the download accelerator trying to re-authenticate using the same transaction ticket - I'd recommend ignoring these requests.
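A rough sketch of that drop-and-replace idea in Python; the throughput threshold and retry count are assumptions, and a real accelerator would monitor throughput continuously rather than per completed chunk:

    # Hypothetical "drop slow connections" strategy: time each ranged
    # request and retry any chunk whose throughput falls below a threshold.
    import time
    import requests

    MIN_BYTES_PER_SEC = 50_000     # assumed threshold; tune per network

    def fetch_chunk_or_retry(url, start, end, max_attempts=3):
        # Fetch bytes [start, end], re-opening the connection if too slow.
        for _ in range(max_attempts):
            t0 = time.time()
            resp = requests.get(url,
                                headers={"Range": f"bytes={start}-{end}"},
                                timeout=10)
            if len(resp.content) / (time.time() - t0) >= MIN_BYTES_PER_SEC:
                return resp.content
            # Too slow: a fresh request gets a new TCP session without the
            # backed-off timeout state of the old one.
        return resp.content        # give up and keep the last attempt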
From: http://askville.amazon.com/download-accelerator-protocol-work-advantages-benefits-application-area-scope-plz-suggest-URLs/AnswerViewer.do?requestId=9337813
Quote:
The most common way of accelerating downloads is to open up parallel downloads. Many servers limit the bandwidth of one connection, so opening more in parallel increases the rate. This works by specifying an offset at which a download should start, which is supported by HTTP and FTP alike.
Of course this way of acceleration is quite "unsocial". The bandwidth limitation is implemented to be able to serve a higher number of clients, so using this technique lowers the maximum number of peers that are able to download. That's the reason why many servers limit the number of parallel connections (recognized by IP); many FTP servers do this, for example, so you run into problems if you download a file and try to continue browsing using your browser: technically these are two parallel connections.
Another technique to increase the download rate is a peer-to-peer network, where different sources, e.g. limited by asynchronous DSL on the upload side, are used for downloading.
Most download 'accelerators' really don't speed up anything at all. What they are good at doing is congesting network traffic, hammering your server, and breaking custom scripts like you've seen. Basically, how it works is that instead of making one request and downloading the file from beginning to end, it makes, say, four requests: the first downloads 0-25%, the second 25-50%, and so on, all at the same time. The only case where this helps at all is if their ISP or firewall does some kind of traffic shaping such that an individual download's speed is limited to less than their total available bandwidth.
Personally, if it's causing you any trouble, I'd say just put up a notice that download accelerators are not supported, and have users download files normally, with only a single connection.
You'll get a more comprehensive overview of download accelerators at Wikipedia.
Acceleration is multi-faceted
First
A substantial benefit of managed/accelerated downloads is that the tool in question remembers the start/stop offsets it has transferred and uses Range headers to request parts of the file instead of all of it.
This means if something dies mid-transaction (i.e., a TCP timeout), it just reconnects where it left off, and you don't have to start from scratch.
Thus, if you have an intermittent connection, the aggregate transfer time is greatly lessened.
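For illustration, resuming from the saved offset might look like this in Python (the URL and file name are placeholders; a server that honors Range requests is assumed):

    # Hypothetical resume-after-interruption logic: continue a partial
    # download from wherever the local file currently ends.
    import os
    import requests

    URL = "https://example.com/big-file.bin"   # placeholder
    DEST = "big-file.bin"

    offset = os.path.getsize(DEST) if os.path.exists(DEST) else 0
    resp = requests.get(URL, headers={"Range": f"bytes={offset}-"}, stream=True)
    with open(DEST, "ab") as f:                # append from the saved offset
        for block in resp.iter_content(chunk_size=65536):
            f.write(block)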
Second
Download accelerators like to break a single transfer into several smaller segments of equal size, using the same start-range-stop mechanics, and perform them in parallel, which greatly improves transfer time over slow networks.
There's this annoying thing called the bandwidth-delay product, where the size of the TCP buffers at either end combines with the ping time to determine the actual experienced speed. In practice this means large ping times will limit your speed, regardless of how many megabits/sec all the interim connections have.
However, this limitation appears to be "per connection", so multiple TCP connections to a single server can help mitigate the performance hit of the high latency ping time.
Hence, people who live nearby are not so likely to need segmented transfers, but people who live in far-away locations are more likely to benefit from going crazy with their segmentation.
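To put rough numbers on that, a quick back-of-the-envelope calculation in Python (the window size and ping time are illustrative, not measured):

    # Bandwidth-delay product: per-connection throughput is capped at
    # roughly window_size / round_trip_time.
    window_bytes = 64 * 1024            # a common default TCP window
    rtt_seconds = 0.200                 # e.g. an intercontinental ping

    per_connection = window_bytes / rtt_seconds        # 327,680 B/s
    print(f"one connection:   {per_connection / 1024:.0f} KiB/s")   # 320
    print(f"four connections: {4 * per_connection / 1024:.0f} KiB/s")  # 1280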
Third
In some cases it is possible to find multiple servers that provide the same resource: sometimes a single DNS address round-robins to several IP addresses, or a server is part of a mirror network of some kind. Download managers/accelerators can detect this and apply the segmented-transfer technique across multiple servers, allowing the downloader to get more collective bandwidth delivered to them.
Support
Supporting the first kind of acceleration is what I personally suggest as a "minimum" for support. Mostly because it makes a user's life easy, and it reduces the aggregate data transfer you have to provide, since users don't have to fetch the same content repeatedly.
To facilitate this, it's recommended that you compute how much they have transferred and not expire the ticket until the transfer looks "finished" (while binding traffic to the first IP that used the ticket), or until a given "reasonable" time to download it has passed. I.e., give them a grace window before requiring they get a new ticket.
Supporting the second and third earns you bonus points, and users generally desire at least the second, mostly because international customers don't like being treated as second-class customers simply because of their greater ping time, and it doesn't objectively consume more bandwidth in any sense that matters. The worst that happens is they might cause your total throughput to be undesirable for how your service operates.
It's reasonably straightforward to deliver the first kind of benefit without allowing the second, simply by restricting the number of concurrent transfers from a single ticket.
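As a hedged sketch, that per-ticket restriction could look something like this in Python; the Ticket shape, the one-stream limit, and the expiry policy are all assumptions, not a prescription:

    # Hypothetical per-ticket bookkeeping: allow resume, bind to one IP,
    # and cap concurrent transfers so segmented downloads can't fan out.
    import time
    from dataclasses import dataclass, field

    MAX_CONCURRENT = 1          # 1 forbids segmented/parallel transfers
    GRACE_SECONDS = 3600        # 'reasonable' window to finish a download

    @dataclass
    class Ticket:
        file_size: int
        issued_at: float = field(default_factory=time.time)
        client_ip: str = ""     # bound to the first IP that uses it
        bytes_sent: int = 0
        active_streams: int = 0

        def may_start_stream(self, ip: str) -> bool:
            if not self.client_ip:
                self.client_ip = ip           # first use binds the IP
            expired = (self.bytes_sent >= self.file_size or
                       time.time() - self.issued_at > GRACE_SECONDS)
            return (ip == self.client_ip and
                    self.active_streams < MAX_CONCURRENT and
                    not expired)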
I believe the idea is that many servers limit or evenly distribute bandwidth across connections. By having multiple connections, you're cheating that system and getting more than your "fair" share of bandwidth.
It's all about Little's Law. Specifically, each stream to the web server sees a certain amount of TCP latency and so will only carry so much data. Tricks like increasing the TCP window size and implementing selective acks help, but are often poorly implemented and generally cause more problems than they solve.
Having multiple streams means the latency seen by each individual stream matters less, since global throughput increases overall.
Another key advantage of a download accelerator, even when using a single thread, is that it's generally better than a web browser's built-in download tool. For example, if the web browser crashes, the download tool will carry on, and the download tool may support functionality like pausing/resuming that the built-in browser tool doesn't.