Estimating/forecasting download completion time
We've all poked fun at the 'X minutes remaining' dialog which seems to be too simplistic, but how can we improve it?
Effectively, the input is the set of download speeds up to the current time, and we need to use this to estimate the completion time, perhaps with an indication of certainty, like '20-25 mins remaining' using some Y% confidence interval.
Code that did this could be put in a little library and used in projects all over, so is it really that difficult? How would you do it? What weighting would you give to previous download speeds?
Or is there some open source code already out there?
Edit: Summarising:
- Improve estimated completion time via better algo/filter etc.
- Provide interval instead of single time ('1h45-2h30 mins'), or just limit the precision ('about 2 hours').
- Indicate when progress has stalled - although if progress consistently stalls and then continues, we should be able to deal with that. Perhaps 'about 2 hours, currently stalled'
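As a rough illustration of the last two points in the summary (limited precision and a stall note), here is a minimal Python sketch. The function name `describe_eta`, the half-hour rounding and the exact wording are assumptions made for the example, not an established convention.

```python
def describe_eta(seconds_left, stalled=False):
    """Return a deliberately coarse label such as 'about 2 hours'."""
    if seconds_left < 60:
        text = "less than a minute"
    elif seconds_left < 3600:
        minutes = round(seconds_left / 60)
        text = f"about {minutes} minute{'s' if minutes != 1 else ''}"
    else:
        # Round to the nearest half hour so the label changes rarely.
        hours = round(seconds_left / 1800) / 2
        text = f"about {hours:g} hour{'s' if hours != 1 else ''}"
    if stalled:
        text += ", currently stalled"
    return text
```

For example, `describe_eta(7000, stalled=True)` gives 'about 2 hours, currently stalled'. How to decide that a transfer counts as stalled is left open here.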
More generally, I think you are looking for a way to give an instantaneous measure of the transfer speed, which is generally obtained by averaging over a short period.
The problem is that, in order to be reactive, the period is usually extremely small, which leads to the yoyo effect.
I would propose a very simple scheme. Let's model it.
Think of speed (y) as a curve over time (x).
The instant speed is simply y read at the current x (x0).
The average speed is simply
Integral(f(x), x in [x0-T,x0]) / T
The scheme I propose is to apply a filter that gives more weight to the most recent moments while still taking the past into account.
It can easily be implemented as
g(x,x0,T) = 2 * (x - x0) / T + 2
which is a simple triangle of surface T (it rises from 0 at x0-T to 2 at x0). Now you can compute
Integral(f(x)*g(x,x0,T), x in [x0-T,x0]) / T
which should work because neither function is ever negative. Of course you could use a different g, as long as it is never negative on the given interval and its integral over the interval is T (so that its own average is exactly 1).
The advantage of this method is that, because you give more weight to immediate events, you can remain pretty reactive even if you consider larger time intervals (so that the average is more precise and less susceptible to hiccups).
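A discrete version of this triangular weighting might look like the sketch below. It assumes the speed is available as (timestamp, speed) samples, and it uses the weight normalised to rise from 0 at x0-T to 2 at x0, so that it averages 1 over the window. The function name and sample format are made up for the example. Dividing by the sum of the weights instead of by T is a small departure from the integral form that keeps the result sensible when samples are unevenly spaced.

```python
def triangular_average(samples, now, window):
    """Weighted mean speed over [now - window, now].

    samples: iterable of (timestamp, speed) pairs (hypothetical format).
    The weight rises linearly from 0 at now - window to 2 at now.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for t, speed in samples:
        if now - window <= t <= now:
            g = 2.0 * (t - now) / window + 2.0
            weighted_sum += speed * g
            weight_total += g
    if weight_total == 0.0:
        return None  # no usable samples in the window
    # Discrete stand-in for dividing the weighted integral by T.
    return weighted_sum / weight_total
```

With samples `[(0, 0.0), (5, 0.0), (10, 300.0)]`, `now=10` and `window=10`, the plain mean is 100 while the triangular average is 200, because the most recent sample carries the most weight.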
Also, what I have rarely seen but think would provide more precise estimates would be to correlate the time used for computing the average to the estimated remaining time:
So, the longer the download is going to take, the less reactive I need to be, and the more I can average out. In general, I would say that the window could cover 2% of the total time (except perhaps for the first few estimates, because people appreciate immediate feedback). Also, showing progress one whole percent at a time is sufficient. If the task is long, I am prepared to wait anyway.
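One way to read the 2% rule in code, as a sketch only: project the total time linearly from the fraction done so far and take a share of it, with a floor so that early estimates still appear quickly. The linear projection, the function name and the 2-second floor are assumptions for the example.

```python
def averaging_window(elapsed, fraction_done, min_window=2.0, share=0.02):
    """Pick the averaging window (seconds) from the projected total time."""
    if fraction_done <= 0.0:
        return min_window  # nothing to project from yet
    estimated_total = elapsed / fraction_done  # naive linear projection
    return max(min_window, share * estimated_total)
```

After 60 seconds at 25% done, the projected total is 240 seconds, so the window would be about 4.8 seconds. For a transfer projected to take a few hours it grows to several minutes.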
I wonder, would a state estimation technique produce good results here? Something like a Kalman filter?
Basically, you predict the future by looking at your current model, and you change the model at each time step to reflect changes in the real world. I think this kind of technique is used for estimating the time left on your laptop battery, which can also vary according to use, age of the battery, etc.
See http://en.wikipedia.org/wiki/Kalman_filter for a more in-depth description of the algorithm.
The filter also gives a variance measure, which could be used to indicate your confidence in the estimate (although, as other answers mention, it might not be the best idea to show this to the end user).
Does anyone know if this is actually used somewhere for download (or file copy) estimation?
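To make the idea concrete, here is a minimal one-dimensional Kalman filter sketch in Python. It assumes the true speed behaves like a random walk observed through noisy measurements, which is only one possible model. The class name and the default values of `process_var` and `measurement_var` are placeholders that would need tuning against real transfers.

```python
class ScalarKalman:
    """One-dimensional Kalman filter tracking a slowly drifting speed."""

    def __init__(self, initial_speed, initial_var=1e6,
                 process_var=1.0, measurement_var=100.0):
        self.speed = initial_speed          # current estimate
        self.var = initial_var              # variance of that estimate
        self.process_var = process_var      # assumed drift per step
        self.measurement_var = measurement_var  # assumed measurement noise

    def update(self, measured_speed):
        # Predict: random-walk model, so the estimate stays put
        # and its uncertainty grows.
        self.var += self.process_var
        # Correct: blend in the measurement according to the gain.
        gain = self.var / (self.var + self.measurement_var)
        self.speed += gain * (measured_speed - self.speed)
        self.var *= (1.0 - gain)
        return self.speed, self.var


def remaining_seconds(bytes_left, speed):
    """Remaining time at the estimated speed (infinite if not moving)."""
    return bytes_left / speed if speed > 0 else float("inf")
```

The variance returned by `update` is the confidence measure mentioned above. Whether to show it is a separate UI question.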
Don't confuse your users by providing more information than they need. I'm thinking of the confidence interval. Skip it.
Internet download times are highly variable. The microwave interferes with WiFi. Usage varies by time of day, day of week, holidays, and releases of exciting new games. The server may be heavily loaded right now. If you carry your laptop to a cafe, the results will be different from those at home. So, you probably can't rely on historical data to predict future download speeds.
If you can't accurately estimate the time remaining, then don't lie to your user by offering such an estimate.
If you know how much data must be downloaded, you can provide % completed progress.
If you don't know at all, provide a "heartbeat": a piece of moving UI that shows the user that things are working, even though you don't know how much time remains.
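A tiny sketch of that fallback logic, assuming a text UI: show a percentage when the total size is known, otherwise advance a spinner on every call. The function name and the label wording are invented for the example.

```python
import itertools

# Characters cycled through to animate the "heartbeat".
SPINNER = itertools.cycle("|/-\\")


def progress_label(bytes_done, bytes_total=None):
    """Return '% completed' when the size is known, else a heartbeat."""
    if bytes_total:
        percent = min(100, int(100 * bytes_done / bytes_total))
        return f"{percent}% completed"
    # Size unknown: each call moves the spinner one step.
    return f"working {next(SPINNER)}"
```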
Improving the estimated time itself: Intuitively, I would guess that the speed of the net connection is a series of random values around some temporary mean speed - things tick along at one speed, then suddenly slow or speed up.
One option, then, could be to weight the previous set of speeds by some exponential, so that the most recent values get the strongest weighting. That way, as the previous mean speed moves further into the past, its effect on the current mean reduces.
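The exponential weighting can be kept incrementally, without storing the history, as an exponentially weighted moving average. A minimal sketch, where the class name and the default `alpha` are assumptions (a larger `alpha` forgets the past faster):

```python
class EwmaSpeed:
    """Exponentially weighted moving average of speed samples."""

    def __init__(self, alpha=0.2):
        self.alpha = alpha   # weight given to the newest sample
        self.value = None    # no estimate until the first sample

    def update(self, speed):
        if self.value is None:
            self.value = speed
        else:
            # Older samples decay by a factor (1 - alpha) per update.
            self.value = self.alpha * speed + (1 - self.alpha) * self.value
        return self.value
```

With `alpha=0.5`, feeding 100, 200, 200 yields 100, 150, 175: the estimate moves toward the new level without jumping straight to it.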
However, if the speed randomly fluctuates, it might be worth flattening the top of the exponential (e.g. by using a Gaussian filter), to avoid too much fluctuation.
So in sum, I'm thinking of measuring the standard deviation (perhaps limited to the last N minutes) and using that to generate a Gaussian filter which is applied to the inputs, and then limiting the quoted precision using the standard deviation.
How, though, would you limit the standard deviation calculation to the last N minutes? How do you know how long to use?
Alternatively, there are pattern recognition possibilities to detect if we've hit a stable speed.
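On limiting the standard deviation to the last N minutes: one simple approach is to keep timestamped samples in a queue and drop those older than the horizon. The sketch below uses a plain windowed mean and standard deviation rather than the Gaussian filter suggested above, and turns one standard deviation either side into an optimistic/pessimistic range. The names and the `horizon` parameter are hypothetical, and how long the horizon should be remains an open question.

```python
from collections import deque
import statistics


class RecentSpeeds:
    """Keeps only the speed samples from the last `horizon` seconds."""

    def __init__(self, horizon):
        self.horizon = horizon
        self.samples = deque()  # (timestamp, speed) pairs, oldest first

    def add(self, timestamp, speed):
        self.samples.append((timestamp, speed))
        # Drop everything older than the horizon.
        while self.samples and self.samples[0][0] < timestamp - self.horizon:
            self.samples.popleft()

    def mean_and_stdev(self):
        speeds = [s for _, s in self.samples]
        mean = statistics.mean(speeds)
        stdev = statistics.stdev(speeds) if len(speeds) > 1 else 0.0
        return mean, stdev


def eta_range(bytes_left, mean, stdev):
    """(optimistic, pessimistic) seconds from mean +/- one stdev of speed."""
    fast = mean + stdev
    slow = max(mean - stdev, 1e-9)  # guard against zero or negative speed
    return bytes_left / fast, bytes_left / slow
```

For instance, `eta_range(6000, 200, 100)` gives (20.0, 60.0). A wide range like that suggests quoting only a coarse figure.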
I've considered this off and on, myself. I think the answer starts with being conservative when computing the current (and thus, future) transfer rate, and includes averaging over longer periods to get more stable estimates. Perhaps also low-pass filter the time that is displayed, so that it doesn't jump between 2 minutes and 2 days.
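Low-pass filtering the displayed time could be as simple as the sketch below: between updates the shown value counts down with the clock, and it only moves part of the way toward each new raw estimate. The class name and the `smoothing` factor are assumptions, and a smaller factor gives a calmer display.

```python
class DisplayedEta:
    """Smooths the remaining time shown to the user."""

    def __init__(self, smoothing=0.1):
        self.smoothing = smoothing
        self.shown = None

    def update(self, raw_eta_seconds, dt=1.0):
        if self.shown is None:
            self.shown = raw_eta_seconds
        else:
            # dt seconds have passed, so count down first, then take
            # a partial step toward the new raw estimate.
            expected = max(self.shown - dt, 0.0)
            self.shown = expected + self.smoothing * (raw_eta_seconds - expected)
        return self.shown
```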
I don't think a confidence interval is going to be helpful. Most people wouldn't be able to interpret it, and it would just be displaying more stuff that is a guess.