How to estimate download time remaining (accurately)?
Sure, you could divide the remaining file size by the current download speed, but if your download speed fluctuates (and it will), this doesn't produce a very nice result. What's a better algorithm for producing smoother countdowns?
I think the best you can do is divide the remaining file size by the average download speed (bytes downloaded so far divided by how long you've been downloading). This will fluctuate a little at the start, but will become more and more stable the longer you download.
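For instance, a minimal sketch of this approach (assuming Python; the names and the timing bookkeeping are illustrative):

    import time

    start = time.monotonic()  # when the download began

    def eta_seconds(bytes_downloaded, bytes_total):
        """Remaining time based on the overall average speed so far."""
        elapsed = time.monotonic() - start
        if bytes_downloaded <= 0 or elapsed <= 0:
            return None  # nothing measured yet, no estimate possible
        average_speed = bytes_downloaded / elapsed  # bytes per second since start
        return (bytes_total - bytes_downloaded) / average_speed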
I found Ben Dolman's answer very helpful, but for someone like myself who is not so math inclined, it still took me about an hour to fully implement it in my code. Here's a simpler way of saying the same thing in Python; if there are any inaccuracies, let me know, but in my testing it works very well:
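(The snippet itself isn't reproduced here; the following is a minimal Python sketch of the same exponential-smoothing idea, with illustrative names.)

    SMOOTHING_FACTOR = 0.005   # low values give stronger smoothing

    average_speed = None       # smoothed bytes-per-second estimate

    def update_eta(bytes_this_tick, seconds_this_tick, bytes_remaining):
        """Feed one measurement and return a smoothed ETA in seconds (or None)."""
        global average_speed
        last_speed = bytes_this_tick / seconds_this_tick
        if average_speed is None:
            average_speed = last_speed  # initialize with the first measurement
        else:
            average_speed = (SMOOTHING_FACTOR * last_speed
                             + (1 - SMOOTHING_FACTOR) * average_speed)
        return bytes_remaining / average_speed if average_speed > 0 else None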
As an extension to Ben Dolman's answer, you could also calculate the fluctuation within the algorithm. It will be smoother, but it will also predict the average speed.
Something like this:
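(The answer's original snippet isn't shown here; the Python sketch below is a reconstruction of the idea, and the exact roles of prediction and depencySpeed are assumptions.)

    class FluctuationAwareSpeed:
        """EMA of the speed whose smoothing adapts to the measured fluctuation."""

        def __init__(self, prediction=0.005, depencySpeed=0.05):
            self.prediction = prediction      # base smoothing factor for the speed
            self.depencySpeed = depencySpeed  # smoothing factor for the fluctuation
            self.average_speed = None         # smoothed speed, e.g. in kB/s
            self.fluctuation = 0.0            # smoothed |deviation| from the average

        def update(self, last_speed):
            if self.average_speed is None:
                self.average_speed = last_speed  # seed with the first sample
                return self.average_speed
            deviation = abs(last_speed - self.average_speed)
            self.fluctuation = (self.depencySpeed * deviation
                                + (1 - self.depencySpeed) * self.fluctuation)
            # The noisier the signal, the less weight a single new sample gets.
            factor = self.prediction / (1.0 + self.fluctuation
                                        / max(self.average_speed, 1e-9))
            self.average_speed = (factor * last_speed
                                  + (1 - factor) * self.average_speed)
            return self.average_speed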
Fluctuation or not, it will be just as stable as the other approach, given the right values for prediction and depencySpeed; you have to play with it a little depending on your internet speed.
These settings are perfect for an average speed of 600 kB/s while it fluctuates from 0 to 1 MB/s.
An exponential moving average is great for this. It provides a way to smooth your average so that each time you add a new sample the older samples become decreasingly important to the overall average. They are still considered, but their importance drops off exponentially--hence the name. And since it's a "moving" average, you only have to keep a single number around.
In the context of measuring download speed, the formula would look like this:

    averageSpeed = SMOOTHING_FACTOR * lastSpeed + (1 - SMOOTHING_FACTOR) * averageSpeed

SMOOTHING_FACTOR is a number between 0 and 1. The higher this number, the faster older samples are discarded. As you can see in the formula, when SMOOTHING_FACTOR is 1 you are simply using the value of your last observation. When SMOOTHING_FACTOR is 0, averageSpeed never changes. So you want something in between, and usually a low value to get decent smoothing. I've found that 0.005 provides a pretty good smoothing value for an average download speed.

lastSpeed is the last measured download speed. You can get this value by running a timer every second or so to calculate how many bytes have downloaded since the last time you ran it.

averageSpeed is, obviously, the number that you want to use to calculate your estimated time remaining. Initialize this to the first lastSpeed measurement you get.
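For example, the per-second sampling could look like this sketch (the get_total_bytes and on_sample callables are assumptions, standing in for whatever your download code provides):

    import threading

    def start_speed_sampler(get_total_bytes, on_sample, interval=1.0):
        """Every `interval` seconds, turn the running byte total into a lastSpeed sample."""
        state = {"previous": get_total_bytes()}

        def tick():
            total = get_total_bytes()
            on_sample((total - state["previous"]) / interval)  # bytes per second
            state["previous"] = total
            threading.Timer(interval, tick).start()

        threading.Timer(interval, tick).start()

Each on_sample value is one lastSpeed measurement to feed into the formula above.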
Years ago I wrote an algorithm, for a disk imaging and multicasting program, to predict the time remaining: it used a moving average, with a reset whenever the current throughput went outside a predefined range. It kept things smooth unless something drastic happened; then it would adjust quickly and return to a moving average again. See the example chart here:
The thick blue line in that example chart is the actual throughput over time. Notice the low throughput during the first half of the transfer, which then jumps up dramatically in the second half. The orange line is the overall average; notice that it never adjusts up far enough to give an accurate prediction of how long the transfer will take to finish. The gray line is a moving average (i.e. the average of the last N data points; N is 5 in this graph, but in reality N might need to be larger to smooth things out enough). It recovers more quickly than the overall average, but still takes a while to adjust, and the larger N is, the longer that takes. So if your data is pretty noisy, N will have to be larger and the recovery time will be longer.
The green line is the algorithm I used. It tracks the data just like a moving average, but when the data moves outside a predefined range (designated by the thin light blue and yellow lines), it resets the moving average and jumps up immediately. The predefined range can also be based on standard deviation, so it can adjust automatically to how noisy the data is. I just threw these values into Excel to diagram them for this answer, so it's not perfect, but you get the idea.
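A rough Python sketch of this reset-on-outlier moving average (the window size and the standard-deviation band here are illustrative, not the values from the original program):

    from collections import deque
    import statistics

    class ResettingMovingAverage:
        """Moving average of the last N samples that resets on outliers."""

        def __init__(self, window=5, band_sigmas=3.0):
            self.band_sigmas = band_sigmas
            self.samples = deque(maxlen=window)

        def update(self, value):
            if len(self.samples) >= 2:
                mean = statistics.fmean(self.samples)
                stdev = statistics.stdev(self.samples)
                # A sample far outside the band means conditions changed:
                # drop the history and start averaging from the new level.
                if stdev > 0 and abs(value - mean) > self.band_sigmas * stdev:
                    self.samples.clear()
            self.samples.append(value)
            return statistics.fmean(self.samples)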
Data could be contrived to make this algorithm a poor predictor of time remaining, though. The bottom line is that you need a general idea of how you expect the data to behave and pick an algorithm accordingly. My algorithm worked well for the data sets I was seeing, so we kept using it.
One other important tip: developers usually ignore setup and teardown times in their progress bars and time-estimate calculations. This results in the eternal 99% or 100% progress bar that just sits there for a long time (while caches are flushed or other cleanup work happens), or in wild early estimates while directory scanning or other setup work accrues time without accruing any percentage progress, which throws everything off. You can run several tests that include the setup and teardown times, estimate how long those phases take on average (or based on the size of the job), and add that time to the progress bar. For example, the first 5% of the work is setup, the last 10% is teardown, and the 85% in the middle is the download or whatever repeating process you're tracking. This can help a lot too.
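For instance, that phase weighting could be sketched like this (the 5/85/10 split comes from the example above; the names are illustrative):

    # Phase weights: 5% setup, 85% download, 10% teardown.
    PHASES = (("setup", 0.05), ("download", 0.85), ("teardown", 0.10))

    def overall_progress(phase, fraction_done):
        """Map progress within one phase (0..1) onto the whole job (0..1)."""
        completed = 0.0
        for name, weight in PHASES:
            if name == phase:
                return completed + weight * min(max(fraction_done, 0.0), 1.0)
            completed += weight
        raise ValueError(f"unknown phase: {phase!r}")

    # Halfway through the download: 0.05 + 0.85 * 0.5 = 0.475 overall.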