Determining the complexity of given code
Given a piece of code, how would you determine its complexity in general? I find myself getting very confused by Big-O questions. For example, a really simple question:
for (int i = 0; i < n; i++) {
    for (int j = 0; j < n; j++) {
        System.out.println("*");
    }
}
The TA explained this with something like combinations. Like this: n choose 2 = (n(n-1))/2 = n^2 + 0.5, then remove the constant so it becomes n^2. I can put in int test values and try it out, but how does this combinations thing come in?
What if there's an if statement? How is the complexity determined?
for (int i = 0; i < n; i++) {
    if (i % 2 == 0) {
        for (int j = i; j < n; j++) { ... }
    } else {
        for (int j = 0; j < i; j++) { ... }
    }
}
And what about recursion...
int fib(int a, int b, int n) {
    if (n == 3) {
        return a + b;
    } else {
        return fib(b, a + b, n - 1);
    }
}
6 Answers
In general, there is no way to determine the complexity of a given function
Warning! Wall of text incoming!
1. There are very simple algorithms that no one knows whether they even halt or not.
There is no algorithm that can decide whether a given program halts or not, if given a certain input. Calculating the computational complexity is an even harder problem since not only do we need to prove that the algorithm halts but we need to prove how fast it does so.
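The classic example here, referenced again at the end of this answer, is the Collatz iteration; the code block was lost from this copy, but presumably it was something like:

// Nobody knows whether this loop halts for every n > 0 (the Collatz conjecture),
// so nobody can bound its complexity either.
while (n != 1) {
    if (n % 2 == 0) n = n / 2;
    else            n = 3 * n + 1;
}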
2. Some algorithms have weird and off-beat complexities
A general "complexity determining scheme" would easily get too complicated because of these guys
3. Some functions are very simple but will confuse lots of kinds of static analysis attempts
That said, we still need a way to find the complexity of stuff, right? For loops are a simple and common pattern. Take your initial example:
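for (int i = 0; i < n; i++) {
    for (int j = 0; j < n; j++) {
        System.out.println("*");
    }
}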
Since each "print something" is O(1), the time complexity of the algorithm will be determined by how many times we run that line. Well, as your TA mentioned, we do this by looking at the combinations in this case. The inner loop will run (N + (N-1) + ... + 1) times, for a total of (N+1)*N/2. Since we disregard constants we get O(N^2). (Strictly speaking, that triangular sum applies when the inner loop starts at j = i; with j starting at 0 the line simply runs N*N times, which is O(N^2) either way.)
Now for the more tricky cases we can get more mathematical. Try to create a function whose value represents how long the algorithm takes to run, given the size N of the input. Often we can construct a recursive version of this function directly from the algorithm itself, and so calculating the complexity becomes the problem of putting bounds on that function. We call this function a recurrence.
For example:
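The code block was lost here, but the discussion that follows fits the naive doubly recursive Fibonacci, presumably:

int fib(int n) {
    if (n < 2) return 1;              // base cases
    return fib(n - 1) + fib(n - 2);   // two recursive calls
}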
it is easy to see that the running time, in terms of N, will be given by
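T(0) = T(1) = O(1)
T(N) = T(N-1) + T(N-2) + O(1)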
Well, T(N) is just the good-old Fibonacci function. We can use induction to put some bounds on that.
For example, let's prove by induction that T(N) <= 2^N for all N (i.e., that T(N) is O(2^N)):
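The base cases hold (T(0) = T(1) = 1), and for the inductive step, counting the constant work as one unit:

T(N) = T(N-1) + T(N-2) + 1
     <= 2^(N-1) + 2^(N-2) + 1     (induction hypothesis)
     <= 2^(N-1) + 2^(N-1)         (since 2^(N-2) + 1 <= 2^(N-1) for N >= 2)
      = 2^N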
(we can try doing something similar to prove the lower bound too)
In most cases, having a good guess on the final runtime of the function will allow you to easily solve recurrence problems with an induction proof. Of course, this requires you to be able to guess first - only lots of practice can help you here.
And as a final note, I would like to point out the Master theorem, the only commonly used rule for more difficult recurrence problems that I can think of right now. Use it when you have to deal with a tricky divide and conquer algorithm.
Also, in your "if case" example, I would solve that by cheating and splitting it into two separate loops that don't have an if inside. This:
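for (int i = 0; i < n; i++) {
    if (i % 2 == 0) {
        for (int j = i; j < n; j++) { ... }
    } else {
        for (int j = 0; j < i; j++) { ... }
    }
}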
has the same runtime as
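presumably something like this (splitting the even and odd values of i; the loop bodies are whatever your "..." stands for):

for (int i = 0; i < n; i += 2) {
    for (int j = i; j < n; j++) { ... }
}

for (int i = 1; i < n; i += 2) {
    for (int j = 0; j < i; j++) { ... }
}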
And each of the two parts can be easily seen to be O(N^2) for a total that is also O(N^2).
Note that I used a nice trick to get rid of the "if" here. There is no general rule for doing so, as shown by the Collatz algorithm example.
In general, deciding algorithm complexity is theoretically impossible.
However, one cool and code-centric method for doing it is to actually just think in terms of programs directly. Take your example:
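for (int i = 0; i < n; i++) {
    for (int j = 0; j < n; j++) {
        System.out.println("*");
    }
}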
Now we want to analyze its complexity, so let's add a simple counter that counts the number of executions of the inner line:
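Presumably (the code blocks in this answer were lost; the same reconstruction applies to each step below):

int counter = 0;
for (int i = 0; i < n; i++) {
    for (int j = 0; j < n; j++) {
        System.out.println("*");
        counter++;   // count each execution of the inner line
    }
}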
Because the System.out.println line doesn't really matter, let's remove it:
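int counter = 0;
for (int i = 0; i < n; i++) {
    for (int j = 0; j < n; j++) {
        counter++;
    }
}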
Now that we have only the counter left, we can obviously simplify the inner loop out:
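int counter = 0;
for (int i = 0; i < n; i++) {
    counter += n;   // the inner loop incremented counter exactly n times
}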
... because we know that the increment is run exactly n times. And now we see that counter is incremented by n exactly n times, so we simplify this to:
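int counter = n * n;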
And we emerged with the (correct) O(n^2) complexity :) It's there in the code :)
Let's look how this works for a recursive Fibonacci calculator:
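Presumably the naive doubly recursive version:

int fib(int n) {
    if (n < 2) return 1;
    return fib(n - 1) + fib(n - 2);
}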
Change the routine so that it returns the number of iterations spent inside it instead of the actual Fibonacci numbers:
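Counting one unit of work for each base-case call and summing the work done by the two subcalls:

int fib(int n) {
    if (n < 2) return 1;              // one unit of work per leaf call
    return fib(n - 1) + fib(n - 2);   // total work of the two subcalls
}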
It's still Fibonacci! :) So we know now that the recursive Fibonacci calculator is of complexity O(F(n)) where F is the Fibonacci number itself.
Ok, let's look at something more interesting, say simple (and inefficient) mergesort:
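A minimal sketch of such a mergesort (the original code block was lost; the details of the merge are my reconstruction):

void sort(int[] a, int from, int to) {
    if (from >= to) return;              // a range of size <= 1 is sorted
    int m = (from + to) / 2;
    sort(a, from, m);                    // sort the left half
    sort(a, m + 1, to);                  // sort the right half
    int[] tmp = new int[to - from + 1];  // merge: one unit of work per element
    int i = from, j = m + 1, k = 0;
    while (i <= m && j <= to) tmp[k++] = (a[i] <= a[j]) ? a[i++] : a[j++];
    while (i <= m) tmp[k++] = a[i++];
    while (j <= to) tmp[k++] = a[j++];
    for (k = 0; k < tmp.length; k++) a[from + k] = tmp[k];
}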
Because we are not interested in the actual result but the complexity, we change the routine so that it actually returns the number of units of work carried out:
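int sort(int[] a, int from, int to) {
    if (from >= to) return 1;           // constant work for a trivial range
    int m = (from + to) / 2;
    int count = 0;
    count += sort(a, from, m);
    count += sort(a, m + 1, to);
    count += to - from + 1;             // the merge touches each element once
    // ... the actual merging code, which no longer affects the count ...
    return count;
}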
Then we remove those lines that do not actually impact the counts and simplify:
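int sort(int[] a, int from, int to) {
    if (from >= to) return 1;
    int m = (from + to) / 2;
    int count = 0;
    count += sort(a, from, m);
    count += sort(a, m + 1, to);
    count += to - from + 1;
    return count;
}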
Still simplifying a bit:
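int sort(int[] a, int from, int to) {
    if (from >= to) return 1;
    int m = (from + to) / 2;
    return sort(a, from, m) + sort(a, m + 1, to) + (to - from + 1);
}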
We can now actually dispense with the array:
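int sort(int from, int to) {
    if (from >= to) return 1;
    int m = (from + to) / 2;
    return sort(from, m) + sort(m + 1, to) + (to - from + 1);
}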
We can now see that actually the absolute values of from and to do not matter any more, but only their distance, so we modify this to:
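int sort(int d) {                  // d is the length of the range
    if (d <= 1) return 1;
    return sort(d / 2) + sort(d - d / 2) + d;
}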
And then we get to:
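int sort(int d) {
    return (d <= 1) ? 1 : 2 * sort(d / 2) + d;
}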
Here obviously d on the first call is the size of the array to be sorted, so you have the recurrence for the complexity M(x) (this is in plain sight on the second line :)
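M(1) = 1
M(x) = 2 M(x/2) + x        (for x > 1)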
and this you need to solve in order to get to a closed-form solution. The easiest way is to guess the solution M(x) = x log x, and verify for the right side:
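2 M(x/2) + x = 2 (x/2) log(x/2) + x = x log x - x log 2 + x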
and verify it is asymptotically equivalent to the left side:
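x log x - x log 2 + x  ~  x log x

since the extra x terms are of lower order than x log x; so M(x) = O(x log x), as expected for mergesort.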
Even though this is an over-generalization, I like to think of Big-O in terms of lists, where the length of the list is N items.
Thus, if you have a for-loop that iterates over everything in the list, it is O(N). In your code, you have one line that (in isolation, all by itself) is O(N).
If you have a for loop nested inside another for loop, and you perform an operation on each item in the list that requires you to look at every item in the list, then you are doing an operation N times for each of N items, thus O(N^2). In your example above you do, in fact, have another for loop nested inside your for loop. So you can think about it as if each for loop is O(N), and then, because they are nested, multiply them together for a total value of O(N^2).
Conversely, if you are just doing a quick operation on a single item then that would be O(1). There is no 'list of length n' to go over, just a single one-time operation. To put this in context, in your example above, the operation:
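i % 2 == 0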
is O(1). What is important isn't the 'if', but the fact that checking to see if a single item is equal to another item is a quick operation on a single item. Like before, the if statement is nested inside your external for loop. However, because it is O(1), you are multiplying everything by '1', and so there is no 'noticeable' effect in your final calculation for the run time of the entire function.
For logs, and dealing with more complex situations (like this business of counting up to j or i, and not just n again), I would point you towards a more elegant explanation here.
I like to use two things for Big-O notation: standard Big-O, which is worst case scenario, and average Big-O, which is what normally ends up happening. It also helps me to remember that Big-O notation is trying to approximate run-time as a function of N, the number of inputs.
As I said, normal Big-O is the worst-case scenario. You can try to count the number of times that each line gets executed, but it is simpler to just look at the first example and say that there are two loops over the length of n, one embedded in the other, so it is n * n. If they were one after another, it'd be n + n, equaling 2n; since it's an approximation, you just say n, or linear.
This is where, for me, having average case and best case helps a lot for organizing my thoughts. In the worst case, you ignore the if and say n^2. In the average case, for your example, you have a loop over n, with another loop over part of n that happens half of the time. This gives you n * n/x/2 (the x is whatever fraction of n gets looped over in your embedded loops). This gives you n^2/(2x), so you'd get n^2 just the same. This is because it's an approximation.
I know this isn't a complete answer to your question, but hopefully it sheds some kind of light on approximating complexities in code.
As has been said in the answers above mine, it is clearly not possible to determine this for all snippets of code; I just wanted to add the idea of using average case Big-O to the discussion.
For the first snippet, it's just n^2 because you perform n operations n times. If j was initialized to i, or went up to i, the explanation you posted would be more appropriate, but as it stands it is not.
For the second snippet, you can easily see that half of the time the first loop will be executed, and the second will be executed the other half of the time. Depending on what's in there (hopefully it's dependent on n), you can rewrite the equation as a recursive one.
The recursive equations (including the third snippet) can be written as such: the third one would appear as
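Each call does a constant amount of work and makes one recursive call with n-1, so, presumably:

T(n) = T(n-1) + O(1),   T(3) = O(1)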
Which we can easily see is O(n).
Big-O is just an approximation, it doesn't say how long an algorithm takes to execute, it just says something about how much longer it takes when the size of its input grows.
So if the input is size N and the algorithm evaluates an expression of constant complexity: O(1) N times, the complexity of the algorithm is linear: O(N). If the expression has linear complexity, the algorithm has quadratic complexity: O(N*N).
Some expressions have exponential complexity: O(N^N), or logarithmic complexity: O(log N). For an algorithm with loops and recursion, multiply the complexities of each level of loop and/or recursion; in terms of complexity, looping and recursion are equivalent. For an algorithm that has different complexities at different stages, choose the highest complexity and ignore the rest. And finally, all constant complexities are considered equivalent: O(5) is the same as O(1), and O(5*N) is the same as O(N).