字符串匹配算法的大 O 表示法
函数 foo 的大 O 表示法是什么?
int foo(char *s1, char *s2)
{
int c=0, s, p, found;
for (s=0; s1[s] != '\0'; s++)
{
for (p=0, found=0; s2[p] != '\0'; p++)
{
if (s2[p] == s1[s])
{
found = 1;
break;
}
}
if (!found) c++;
}
return c;
}
函数 foo 的效率是多少?
a) O(n!)
b) O(n^2)
c) O(n lg(base2) n )
d) O(n)
我会说 O(MN)...?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(4)
它是
O(n²)
,其中n = max(length(s1),length(s2))(可以在小于二次方的时间内确定 - 见下文)。我们看一下课本上的定义:通过这个定义,我们看到 n代表一个数字 - 在这种情况下,数字是传入字符串的长度。但是,存在明显的差异,因为此定义仅提供单个变量函数
f(n)
,而这里我们明确传入2个长度独立的字符串。因此,我们寻找 Big O 的多变量定义。然而,正如 Howell 在 “关于多变量的渐近表示法”:实际上,具有多个变量的 Big O 有一个正式的定义,但这需要额外的约束单变量 Big O 得到满足,并且超出了大多数(如果不是全部)算法课程的范围。对于典型的算法分析,我们可以通过将所有变量绑定到限制变量
n
来有效地将函数简化为单个变量。在这种情况下,变量(具体来说,length(s1) 和 length(s2))显然是独立的,但可以绑定它们:方法 1
当存在以下情况时,会发生此函数的最坏情况 :没有匹配项,因此我们执行 x1 * x2 迭代。
因为乘法是可交换的,所以最坏情况 foo(s1,s2) == foo(s2,s1) 最坏情况。因此,不失一般性,我们可以假设 x1 >= x2。 (这是因为,如果 x1 < x2,我们可以通过以相反的顺序传递参数来获得相同的结果)。
方法2(如果您不喜欢第一种方法)
对于最坏的情况(s1和s2不包含公共字符),我们可以确定length(s1)和length(s2)在迭代循环之前(在 .NET 和 Java 中,确定字符串的长度是 O(1) - 但在本例中是 O(n)),将较大的值分配给 x1,将较小的值分配给 x2。这里很明显x1>=x2。
对于这种情况,我们将看到确定 x1 和 x2 的额外计算使得 O(n² + 2n) 我们使用以下简化规则 可以在此处找到以简化为 O(n²):
结论
对于
n = x1
(我们的限制变量),使得x1 >= x2
,最坏的情况是x1 = x2 。
因此:
f(x1) ∈ O(n²)
额外提示
对于所有发布到与大 O 表示法相关的 SO 的作业问题,如果答案不是以下之一:
那么问题可能最好发布到 https://math.stackexchange.com/
It is
O(n²)
where n = max(length(s1),length(s2)) (which can be determined in less than quadratic time - see below). Let's take a look at a textbook definition:By this definition we see that n represents a number - in this case that number is the length of the string passed in. However, there is an apparent discrepancy, since this definition provides only for a single variable function
f(n)
and here we clearly pass in 2 strings with independent lengths. So we search for a multivariable definition for Big O. However, as demonstrated by Howell in "On Asymptotic Notation with Multiple Variables":There is actually a formal definition for Big O with multiple variables however this requires extra constraints beyond single variable Big O be met, and is beyond the scope of most (if not all) algorithms courses. For typical algorithm analysis we can effectively reduce our function to a single variable by bounding all variables to a limiting variable
n
. In this case the variables (specifically, length(s1) and length(s2)) are clearly independent, but it is possible to bound them:Method 1
The worst case scenario for this function occurs when there are no matches, therefore we perform x1 * x2 iterations.
Because multiplication is commutative, the worst case scenario foo(s1,s2) == the worst case scenario of foo(s2,s1). We can therefore assume, without loss of generality, that x1 >= x2. (This is because, if x1 < x2 we could get the same result by passing the arguments in the reverse order).
Method 2 (in case you don't like the first method)
For the worst case scenario (in which s1 and s2 contain no common characters), we can determine length(s1) and length(s2) prior to iterating through the loops (in .NET and Java, determining the length of a string is O(1) - but in this case it is O(n)), assigning the greater to x1 and the lesser to x2. Here it is clear that x1 >= x2.
For this scenario, we will see that the extra calculations to determine x1 and x2 make this O(n² + 2n) We use the following simplification rule which can be found here to simplify to O(n²):
Conclusion
for
n = x1
(our limiting variable), such thatx1 >= x2
, the worst case scenario isx1 = x2
.Therefore:
f(x1) ∈ O(n²)
Extra Hint
For all homework problems posted to SO related to Big O notation, if the answer is not one of:
Then the question is probably better off being posted to https://math.stackexchange.com/
在大 O 表示法中,我们总是必须定义出现的变量的含义。除非我们定义
n
是什么,否则O(n)
没有任何意义。通常,我们可以省略这些信息,因为它从上下文中很清楚。例如,如果我们说某个排序算法是O(n log(n))
,则n
始终表示要排序的项目数,因此我们不必始终声明这一点。大 O 表示法的另一个重要之处是它只给出了一个上限 -
O(n)
中的每个算法也都是O(n^2)
中。该符号经常被用来表示“算法具有由表达式给出的精确渐近复杂度(最多一个常数因子)”,但它的实际定义是“算法的复杂度受给定表达式(最多一个常数因子)的限制”因素)”。在您给出的示例中,您将
m
和n
作为两个字符串各自的长度。根据这个定义,算法确实是O(mn)
。如果我们将n
定义为两个字符串中较长的一个的长度,我们也可以将其写为O(n^2)
——这也是一个上限算法复杂度的限制。并且使用相同的n
定义,算法也是O(n!)
,但不是O(n)
或O (n log(n))
。In big-O notation, we always have to define what the occuring variables mean.
O(n)
doesn't mean anything unless we define whatn
is. Often, we can omit this information because it is clear from context. For example if we say that some sorting algorithm isO(n log(n))
,n
always denotes the number of items to sort, so we don't have to always state this.Another important thing about big-O notation is that it only gives an upper limit -- every algorithm in
O(n)
is also inO(n^2)
. The notation is often used as meaning "the algorithm has the exact asymptotic complexity given by the expression (up to a constant factor)", but it's actual definition is "the complexity of the alogrithm is bounded by the given expression (up to a constant factor)".In the example you gave, you took
m
andn
to be the respective lengths of the two strings. With this definition, the algorithm is indeedO(m n)
. If we definen
to be the length of the longer of the two strings though, we can also write this asO(n^2)
-- this is also an upper limit for the complexity of the algorithm. And with the same definition ofn
, the algorithm is alsoO(n!)
, but notO(n)
orO(n log(n))
.O(n^2)
就复杂性而言,函数的相关部分是嵌套循环。最大迭代次数是s1的长度乘以s2的长度,两者都是线性因子,因此最坏情况的计算时间是O(n^2),即线性因子的平方。正如 Ethan 所说,O(mn) 和 O(n^2) 实际上是同一件事。
O(n^2)
The relevant part of the function, in terms of complexity, is the nested loops. The maximum number of iterations is the length of s1 times the length of s2, both of which are linear factors, so the worst-case computing time is O(n^2), i.e. the square of a linear factor. As Ethan said, O(mn) and O(n^2) are effectively the same thing.
这样想:
有两个输入。如果函数只是返回,那么它的性能与参数无关。这将是 O(1)。
如果该函数循环遍历一个字符串,则性能与该字符串的长度呈线性相关。因此O(N)。
但该函数有一个循环内的循环。性能与s1的长度和S2的长度有关。将这些长度相乘即可得到循环迭代的次数。它不再是线性的,而是遵循曲线。这是 O(N^2)。
Think of it this way:
There are two inputs. If the function simply returned, then it's performance is unrelated to the arguments. This would be O(1).
If the function looped over one string, then the performance is linearly related to the length of that string. Therefore O(N).
But the function has a loop within a loop. The performance is related to the length of s1 and the length of S2. Multiply those lengths together and you get the number of loop iterations. It's not linear any more, it follows a curve. This is O(N^2).