Best way to calculate the ETA of an operation?
I am looking for the best way to calculate the ETA of an operation (e.g. a file download) from linear progress information.
Let's say I have the following method, which gets called as the operation progresses:
void ReportProgress(double position, double total)
{
...
}
I have a couple of ideas:
- calculate the progress made in a set amount of time (such as the last 10 s) and use that speed as the average speed for the operation
- keep the last x progress reports, calculate the speed of each increment, and use the average
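The first idea can be sketched as follows. This is a minimal illustration in Python rather than the language of the method above; `WindowedEta`, its `report_progress` method, and the explicit `now` argument are hypothetical names chosen here, and a real caller would likely pass `time.monotonic()` for `now`.

```python
from collections import deque


class WindowedEta:
    """ETA from the average speed over the last `window` seconds (hypothetical helper)."""

    def __init__(self, window=10.0):
        self.window = window
        self.samples = deque()  # (timestamp, position) pairs, oldest first

    def report_progress(self, position, total, now):
        """Record a sample and return the estimated seconds remaining, or None."""
        self.samples.append((now, position))
        # Drop samples older than the window, but always keep at least two.
        while len(self.samples) > 2 and now - self.samples[0][0] > self.window:
            self.samples.popleft()
        t0, p0 = self.samples[0]
        if now <= t0 or position <= p0:
            return None  # not enough data, or no progress inside the window
        speed = (position - p0) / (now - t0)  # units of progress per second
        return (total - position) / speed


eta = WindowedEta(window=10.0)
for second in range(0, 6):
    # Simulated progress: 10 units per second towards a total of 100.
    remaining = eta.report_progress(position=second * 10.0, total=100.0, now=float(second))
print(remaining)  # 5.0
```

The second idea differs mainly in bounding the history by sample count instead of by age, for example with `deque(maxlen=x)`.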
Comments (9)
I actually despise both of those ideas because they've both bitten me before as a developer.
The first doesn't take into account the situation where the operation actually gets faster: it says there are 10 minutes to go, I come back after 3, and it's finished.
The second doesn't take into account the operation getting slower. I think Windows Explorer must use this method, since it always seems to take 90% of the time copying 90% of the files, then another 90% of the time copying the last 10% of the files :-).
I've long since taken to calculating both of those figures and averaging them. The clients don't care (they didn't really care about the other two options either; they just want to see some progress), but it makes me feel better, and that's really all I care about at the end of the day ;-)
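A rough sketch of that averaging, under assumptions of my own: one figure is taken from the overall rate since the start, the other from the rate over the last few samples, and the two ETAs are averaged with equal weight. The function name `averaged_eta` and the `recent` parameter are hypothetical.

```python
def averaged_eta(samples, total, recent=5):
    """Average two ETAs: one from the overall rate, one from the last `recent` samples.

    `samples` is a list of (timestamp, position) pairs, oldest first.
    Returns seconds remaining, or None when a rate cannot be computed.
    """
    if len(samples) < 2:
        return None
    t_last, p_last = samples[-1]

    def eta_since(start):
        # ETA using the average rate between `start` and the latest sample.
        t0, p0 = start
        if t_last <= t0 or p_last <= p0:
            return None
        rate = (p_last - p0) / (t_last - t0)
        return (total - p_last) / rate

    overall = eta_since(samples[0])
    window_start = samples[-recent] if len(samples) >= recent else samples[0]
    latest = eta_since(window_start)
    if overall is None or latest is None:
        return None
    return (overall + latest) / 2.0


# A download that ran at 1 unit/s for 10 s, then sped up to 5 units/s.
history = [(0, 0), (10, 10), (11, 15), (12, 20), (13, 25), (14, 30)]
print(averaged_eta(history, total=100, recent=5))
```

Equal weighting is only one choice; weighting the recent figure more heavily would react faster to speed changes at the cost of a jumpier display.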
Something like this should do the trick:
The estimate gets closer to the truth as the progress approaches 100%.
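The snippet that belonged to this answer is not reproduced here. One reading consistent with the remark above, offered purely as an assumption, is plain linear extrapolation from the elapsed time: the remaining work is assumed to proceed at the overall average rate so far. The name `linear_eta` is hypothetical.

```python
def linear_eta(elapsed, position, total):
    """Seconds remaining, assuming the rest proceeds at the overall average rate."""
    if position <= 0:
        return None  # no progress yet, so no rate to extrapolate from
    return elapsed * (total - position) / position


print(linear_eta(elapsed=30.0, position=25.0, total=100.0))  # 90.0
```

Because the average covers the whole run, the share of the total that is still a guess shrinks as `position` approaches `total`.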
I think this problem is pretty much unsolvable, but it is possible to create reasonably accurate estimates with a bit more knowledge of the process being executed. Where there are large unknowns, it is better to inform the user of them so that they can take them into account.
To take the simple example of downloading a batch of files, you have two known variables:
For each file there is a constant overhead (the time it takes to establish a connection, and the time it takes to open a file on the file system). There is also the obvious download time associated with the size of the files. Creating a function that expresses this as time remaining in terms of the current download speed is easy, and it is accurate provided the download speed doesn't fluctuate too much. But therein lies the problem.
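Such a function might look like the sketch below. The name `batch_time_remaining`, its parameters, and the 0.5 s overhead in the example are illustrative assumptions, and the model is deliberately simplified: it charges the per-file overhead to every unfinished file, including one that is already in progress.

```python
def batch_time_remaining(remaining_sizes, speed, per_file_overhead):
    """Estimate seconds left for a batch of files.

    remaining_sizes: bytes still to download for each unfinished file
    speed: current download speed in bytes per second
    per_file_overhead: assumed constant cost per file in seconds
    """
    if speed <= 0:
        return None  # stalled: no meaningful estimate
    transfer = sum(remaining_sizes) / speed
    overhead = per_file_overhead * len(remaining_sizes)
    return transfer + overhead


# Three files left, 1 MB/s, and an assumed 0.5 s of setup per file.
print(batch_time_remaining([2_000_000, 500_000, 1_500_000], 1_000_000, 0.5))  # 5.5
```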
With an accurate model of the operation you are performing, it is easy to predict how long it will take, provided there are no outside influences. And that is rarely possible.
However, you could go for a solution that attempts to understand and explain these outside influences. The user may find it helpful to be alerted when the speed changes dramatically, since they can adjust their plans to fit the new ETA. It may also be helpful to explain what factors are affecting the current operation, e.g.
This allows the user to make some educated guesses if they know that speeds are likely to change, and ultimately leads to less frustration.
Bram Cohen has talked about this a bit. He has put a lot of effort into the ETA calculations in BitTorrent (yet in a talk he mentioned that no one has ever come up to him and said "hey! great ETA calculations in BitTorrent, man!"). It's not a simple problem.
Some relevant links:
If you want an ETA rather than a 'progress bar', can you supply more than one figure?
Calculate the average download speed over a set period of time (how often depends on how long the overall download is likely to last; if you're looking at 10+ minutes, then every 5 s or so would be OK) and keep a record of the averages.
Then you can provide two figures: an upper and a lower estimate.
If you're confident that the averages are a good indication of the total time to download, you could display the 40th and 60th percentiles. If the average download times vary widely, the 10th and 90th might be better.
I'd rather see a ballpark '21-30 minutes' that is accurate than be told 29 min 35.2 seconds when it is miles out and varies wildly from one update to the next.
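A minimal sketch of the two-figure estimate, assuming the recorded averages are speeds and using a nearest-rank percentile. The function names `percentile` and `eta_range` and the sample numbers are made up for illustration.

```python
import math


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an ascending list (pct between 0 and 100)."""
    if not sorted_values:
        raise ValueError("no values")
    rank = max(1, math.ceil(len(sorted_values) * pct / 100))
    return sorted_values[rank - 1]


def eta_range(speed_averages, remaining, low_pct=10, high_pct=90):
    """Return (lower, upper) seconds remaining from recorded average speeds."""
    speeds = sorted(s for s in speed_averages if s > 0)
    if not speeds:
        return None
    slow = percentile(speeds, low_pct)   # a pessimistic speed gives the upper estimate
    fast = percentile(speeds, high_pct)  # an optimistic speed gives the lower estimate
    return remaining / fast, remaining / slow


speeds = [80, 95, 100, 105, 110, 120, 90, 100, 85, 115]  # per-interval averages
low, high = eta_range(speeds, remaining=60_000, low_pct=10, high_pct=90)
print(f"{low / 60:.0f}-{high / 60:.0f} minutes")
```

Narrowing `low_pct` and `high_pct` towards 40 and 60 tightens the displayed range when the speeds are steady.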
It will depend on how consistent the operation timing is. If it's consistent, it would be perfectly reasonable to use the average time of previous operations. If it's not, you're better off timing the current operation and extrapolating.
Edit: If the operation is inconsistent with previous runs, and also inconsistent from start to finish, then you have an unsolvable problem. Predicting the unpredictable is always fun :)
You might decide ahead of time whether you want to underestimate or overestimate, and add a fudge factor to the estimate. For example, if you want to overestimate, and the first 10% takes 6 seconds, you might extrapolate to 60 seconds and then multiply by 1.5 to get a total estimate of 90 seconds. As the percentage complete grows, decrease the fudge factor until it becomes 1.0 at 100%.
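The shrinking fudge factor can be sketched like this. The linear decay schedule is an assumption (the answer does not say how the factor should decrease), and `fudged_total_estimate` is a hypothetical name. With the factor defined as `initial_fudge` at 0% done, it is already down to 1.45 at 10%, so this sketch gives about 87 seconds for the example above instead of 90.

```python
def fudged_total_estimate(elapsed, fraction_done, initial_fudge=1.5):
    """Extrapolate total time, padded by a factor that shrinks to 1.0 at 100%."""
    if not 0 < fraction_done <= 1:
        return None
    raw_total = elapsed / fraction_done
    # Linear decay (an assumption): initial_fudge at 0% done, 1.0 at 100% done.
    fudge = 1.0 + (initial_fudge - 1.0) * (1.0 - fraction_done)
    return raw_total * fudge


# The operation runs at a steady 6 s per 10%; the padding fades as it completes.
for elapsed, done in [(6.0, 0.10), (30.0, 0.50), (60.0, 1.00)]:
    print(done, fudged_total_estimate(elapsed, done))
```

Passing an `initial_fudge` below 1.0 gives a deliberate underestimate with the same decay.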
I worked on a project that needed the ETA of a long, time-consuming computation, and what I ended up doing was splitting the process into batches of the same size. I then time how long each batch takes to compute and add that time to a FIFO list of past computation times.
The times in the list are then averaged, and the resulting time is multiplied by the number of batches remaining.
Note that the list has a fixed length, which should be long enough to let the estimation process "remember" and adjust to possible peaks in computation, but short enough to adapt rapidly to changes in computation speed (e.g. more computation time following the end of a competing task, or more bandwidth).
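The FIFO approach described above could be sketched as follows. `BatchEta`, `batch_done`, and the `history` length are hypothetical names and values, and `collections.deque` with `maxlen` stands in for the fixed-length list.

```python
from collections import deque


class BatchEta:
    """ETA from a fixed-length FIFO of recent batch durations (hypothetical helper)."""

    def __init__(self, total_batches, history=20):
        self.remaining = total_batches
        self.durations = deque(maxlen=history)  # oldest entry drops automatically

    def batch_done(self, seconds):
        """Record one finished batch and return estimated seconds remaining."""
        self.durations.append(seconds)
        self.remaining -= 1
        average = sum(self.durations) / len(self.durations)
        return average * self.remaining


eta = BatchEta(total_batches=10, history=3)
for duration in [2.0, 2.0, 2.0, 4.0, 4.0, 4.0]:
    remaining = eta.batch_done(duration)
print(remaining)  # 16.0
```

With `history=3`, the three early 2-second batches have already left the list by the sixth batch, so the estimate reflects only the slower recent speed.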
I created a project for calculating ETA. You can check the code, or if you have a .NET project you can use the NuGet package: CalculateETA
For linear iterations, you can measure the total time elapsed since the iteration started, and basically you can do:
If you use a multi-threaded process:
You may also apply additional controls to prevent surges in the result or graph.
GitHub:
https://github.com/meokullu/CalculateETA/
NuGet (DotNet, C#):
https://www.nuget.org/packages/CalculateETA/
In Python (save as timeleft.py):
Output (run with python timeleft.py):