An algorithm to estimate program execution time
I need to find the total timings for the execution of a program over different inputs. The program reads some data and writes it into another file. The values in the data and the size of the data are different every time.
I want to find out how long it will take, in general, for any size of data.
Is the algorithm for finding this based on the total timing of a single execution of the program?
For example, if I know that for a single execution:
a.program - execution time 1.2 sec
          - it creates a 100 KB file
Can I find out how long it will take for n executions, on different data sizes?
11 Answers
There is an entire branch of Computer Science devoted to this: Algorithmic Complexity. In short, you analyze the code to determine what performance characteristics it has; for instance, most good sorting algorithms have an average runtime of O(n log n), where n is the size of the array to be sorted. This means the length of time to sort 100 items is 100 * ln 100 / ln 2 * x, or about 664x, where x is the amount of time needed to run one instruction.
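To make the arithmetic above concrete, here is a small Python sketch (my illustration, not part of the original answer) that times Python's built-in sort at two sizes and compares the measured ratio with the ratio predicted by the n * log2(n) model; the sizes and repeat count are arbitrary choices.

    import math
    import random
    import timeit

    def time_sort(n, repeats=20):
        data = [random.random() for _ in range(n)]
        # sort a fresh copy on every repetition so each run does the same work
        return min(timeit.repeat(lambda: sorted(data), number=1, repeat=repeats))

    small, large = 10_000, 100_000
    t_small, t_large = time_sort(small), time_sort(large)

    predicted = (large * math.log2(large)) / (small * math.log2(small))
    print(f"measured ratio:  {t_large / t_small:.1f}x")
    print(f"predicted ratio: {predicted:.1f}x  (n * log2(n) model)")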
From the question, I am not sure whether you are trying to calculate the execution time before running this program, or to record the time it takes your script to run. If you are trying to pre-calculate, I agree with the other answers that say it is not possible.
If you want to record your execution time, simply add 2 global date variables to your program, store the current date and time in one immediately at the start of execution, and then store the time in the second variable upon termination. Use a date difference function to give you the seconds (or your desired time unit) elapsed.
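As a minimal sketch of the two-variable approach described above (my example, using Python's time module rather than date variables; the variable names and the placeholder workload are made up):

    import time

    START_TIME = None   # "global date variable" no. 1
    END_TIME = None     # "global date variable" no. 2

    def main():
        # ... read the input data and write the output file here ...
        time.sleep(0.5)  # placeholder for the real work

    if __name__ == "__main__":
        START_TIME = time.perf_counter()   # store the time at the start of execution
        main()
        END_TIME = time.perf_counter()     # store the time at termination
        print(f"elapsed: {END_TIME - START_TIME:.3f} seconds")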
It's hard to know for sure, but I don't think this is a version of the halting problem (or, at least, it can be simplified so that it's not).
I think you're just asking for estimates of how long a series of reads and writes will take... but with varied amounts of reading and writing.
The easiest way to do it is to make some empirical measurements (the more the better) and use those measurements to make estimates about how long future runs will take. If you discover that reading 10 MB of data takes 10 seconds, then you could estimate that 100 MB might take about 100 seconds. (This, of course, makes the assumption that you're looking at an O(n) algorithm...if not, you'll have to adjust accordingly.)
Of course, this will be prone to error, because of whatever else is going on on the system...but, depending on your needs, it may give you estimates that are good enough.
Of course, if you are able to update your estimates as you are reading/writing, you can improve them.
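A tiny sketch of that linear extrapolation, using the hypothetical 10 MB / 10 s measurement from the answer (it assumes the run time really is O(n) in the input size):

    measured_size_mb = 10.0    # size of the test input
    measured_time_s = 10.0     # observed wall-clock time for that input

    def estimate_seconds(target_size_mb):
        # Assumes run time scales linearly (O(n)) with input size.
        return measured_time_s * (target_size_mb / measured_size_mb)

    print(estimate_seconds(100))   # roughly 100 seconds for 100 MB
    print(estimate_seconds(250))   # roughly 250 seconds for 250 MB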
If you are asking about the practical solution, then use the Benchmark module, or some other process where you record the time. Then plot the execution time against the input size for a number of inputs and interpolate (but beware extrapolation, as this xkcd cartoon demonstrates).
If you want to know about the theory, you need to understand "computational complexity", which is a good search term to get you started.
For example, if you run over the data once, then usually twice as much data will take roughly twice as long. The best sorting algorithms typically take O(N ln N), so twice as much data will take slightly more than twice as long. But even these only give you limits on the length of time, and the constants will depend on things like disk access, memory, other programs running, etc.
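The Benchmark module referred to here is presumably Perl's. As a rough Python illustration of the same idea (NumPy assumed available; "./a.program" and the input file names are placeholders for your own setup), you could time the program over several input sizes and interpolate between the measured points:

    import subprocess
    import time
    import numpy as np

    sizes_kb = [100, 500, 1000, 5000]          # input sizes you have test files for
    timings = []
    for size in sizes_kb:
        start = time.perf_counter()
        subprocess.run(["./a.program", f"input_{size}kb.dat"], check=True)
        timings.append(time.perf_counter() - start)

    # Interpolate between measured points; extrapolating beyond the largest
    # measured size is unreliable, as the answer warns.
    estimate = np.interp(750, sizes_kb, timings)
    print(f"estimated time for a 750 KB input: {estimate:.2f} s")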
You have to know whether your program halts. That can't be decided automatically, but you can determine it if you know the program's design. Then you have to know at least your program's asymptotic complexity; better still if you know the real complexity formula. Then you can benchmark it on an adequate set of inputs, interpolate the data to obtain the constants, and finally just plug the constants into the equation and compute. Easy, isn't it? ;-)
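For instance, if you believe the program is O(n log n), a least-squares fit for the single constant looks roughly like this (the measurements below are made-up numbers, and the model choice is an assumption):

    import math

    # (input size, measured seconds) pairs from benchmark runs -- hypothetical values
    measurements = [(10_000, 0.12), (50_000, 0.70), (100_000, 1.50)]

    # Least-squares estimate of c in T(n) = c * n * log(n)
    num = sum(t * n * math.log(n) for n, t in measurements)
    den = sum((n * math.log(n)) ** 2 for n, _ in measurements)
    c = num / den

    def predict(n):
        return c * n * math.log(n)

    print(f"c = {c:.3e}")
    print(f"predicted time for n = 1,000,000: {predict(1_000_000):.2f} s")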
If you can re-execute your program, you can time it using the Unix "time" command. If not, you need to save the system time at the start, then again at the end of the program, and print out the difference.
I don't quite understand your question, but I believe that what you're asking is how to figure out the execution time of a program in advance of running it.
This is related to the halting problem. The halting problem is undecidable.
Apologies if I am misunderstanding your question.
Edit: To respond to your clarification, there is no general algorithm for extrapolating the runtime on larger inputs from the runtime on smaller inputs. Analysis of algorithms is a very tricky business. There are heuristics you can use. For example, you could compute the runtime on inputs of varying "size" (say, 10, 100, 1000, 10000) and try to fit a curve to the function "size" -> runtime.
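One way to do that curve fit (my sketch, assuming a power-law model T(n) ≈ a * n^b and NumPy available; the runtimes below are invented):

    import numpy as np

    sizes = np.array([10, 100, 1000, 10000], dtype=float)
    runtimes = np.array([0.002, 0.021, 0.24, 2.6])   # measured seconds (hypothetical)

    # Fit log(T) = log(a) + b * log(n) by ordinary least squares.
    b, log_a = np.polyfit(np.log(sizes), np.log(runtimes), 1)
    a = np.exp(log_a)

    print(f"T(n) ~= {a:.2e} * n^{b:.2f}")
    print(f"extrapolated T(100000) ~= {a * 100000 ** b:.1f} s")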
There is no perfect algorithm for this, and it will be highly variable - a first run may be very slow, but a second run will be much quicker since the data from disk is cached in memory.
You could measure your program with a variety of inputs and try to build a model for input size/complexity compared to execution time and then use the results of that to estimate execution time.
If I understand you correctly, you want to find out how long it takes a process to execute.
If this is the case then use the Benchmark module.
Using this you can take timings at different places and judge the timings of different parts of your program.
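Again, the Benchmark module is presumably Perl's; a crude Python equivalent for judging the timings of different parts is to drop time.perf_counter() checkpoints between them (the workload below is a stand-in for your real read/write steps):

    import time

    t0 = time.perf_counter()
    data = [i * i for i in range(1_000_000)]            # stand-in for reading/processing
    t1 = time.perf_counter()
    with open("output.txt", "w") as f:                  # stand-in for writing the file
        f.write("\n".join(str(x) for x in data[:1000]))
    t2 = time.perf_counter()

    print(f"read/process: {t1 - t0:.3f} s")
    print(f"write:        {t2 - t1:.3f} s")
    print(f"total:        {t2 - t0:.3f} s")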
1. The halting problem is undecidable, i.e. even if you have the data and the program, there is no way to tell (in the general case!) how long it will take to run it.
2. The 'program' in the halting problem means the complete state, e.g. your program + the data it processes.
So it's undecidable twice over :)
If you know the asymptotic running time of your algorithm, for example knowing a sorting algorithm is n*log(n), you can run it on small inputs and calculate (though perhaps only as a range) what it would be for larger inputs. The problem is that analyzing most algorithms is very hard. Otherwise, you could run it a few times on smaller inputs and do some type of regression (probably nonlinear) to discover/approximate an equation for the algorithm's performance characteristics, and use that to calculate for larger inputs.
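A sketch of that regression idea with SciPy (assumed available); the model T(n) = a*n*log(n) + b*n + c and the data points are assumptions for illustration only:

    import numpy as np
    from scipy.optimize import curve_fit

    def model(n, a, b, c):
        return a * n * np.log(n) + b * n + c

    sizes = np.array([1e3, 1e4, 1e5, 1e6])
    times = np.array([0.004, 0.05, 0.7, 9.2])     # measured seconds (hypothetical)

    params, _ = curve_fit(model, sizes, times)
    a, b, c = params
    print(f"fitted: a={a:.2e}, b={b:.2e}, c={c:.2e}")
    print(f"predicted T(1e7): {model(1e7, a, b, c):.1f} s")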