
Big O notation for a string matching algorithm

Posted 2024-11-16 03:32:12 · 471 words · 4 views · 0 comments


What would the big O notation of the function foo be?

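/* Counts the characters of s1 that do not occur anywhere in s2. */
int foo(char *s1, char *s2)
{
   int c=0, s, p, found;
   for (s=0; s1[s] != '\0'; s++)               /* each character of s1 */
   {
      for (p=0, found=0; s2[p] != '\0'; p++)   /* scan s2 for it */
      {
         if (s2[p] == s1[s])
         {
            found = 1;
            break;                             /* stop at the first match */
         }
      }
      if (!found) c++;                         /* s1[s] is absent from s2 */
   }
   return c;
}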

What is the efficiency of the function foo?

a) O(n!)

b) O(n^2)

c) O(n log2 n)

d) O(n)

I would have said O(MN)...?


心病无药医 2024-11-23 03:32:12


It is O(n²), where n = max(length(s1), length(s2)) (n itself can be determined in less than quadratic time; see below). Let's take a look at a textbook definition:

f(n) ∈ O(g(n)) if there exist a positive real number c and a positive integer N such that f(n) <= c * g(n) for all n >= N

By this definition, n represents a number; in this case that number is the length of the string passed in. However, there is an apparent discrepancy: the definition covers only a single-variable function f(n), and here we clearly pass in 2 strings with independent lengths. So we look for a multivariable definition of Big O. However, as demonstrated by Howell in "On Asymptotic Notation with Multiple Variables":

"it is impossible to define big-O notation for multi-variable functions in a way that implies all of these [commonly-assumed] properties."

There is in fact a formal definition of Big O with multiple variables, but it requires extra constraints beyond those of single-variable Big O, and it is beyond the scope of most (if not all) algorithms courses. For typical algorithm analysis we can effectively reduce our function to a single variable by bounding all variables by a limiting variable n. In this case the variables (specifically, length(s1) and length(s2)) are clearly independent, but it is possible to bound them:

Method 1

Let x1 = length(s1)
Let x2 = length(s2)

The worst case for this function occurs when there are no matches: the inner loop never breaks early, so we perform x1 * x2 iterations.
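The x1 * x2 count can be checked empirically. The following is a minimal sketch, not part of the original answer: `foo_counted` is a hypothetical instrumented copy of `foo` that also reports how many inner-loop comparisons it made, through an assumed extra parameter `comparisons`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical instrumented copy of foo: same logic and return value,
   but it also stores the number of inner-loop comparisons performed. */
int foo_counted(const char *s1, const char *s2, size_t *comparisons)
{
    int c = 0;
    size_t s, p;

    assert(s1 != NULL && s2 != NULL && comparisons != NULL);
    *comparisons = 0;

    for (s = 0; s1[s] != '\0'; s++) {
        int found = 0;
        for (p = 0; s2[p] != '\0'; p++) {
            (*comparisons)++;          /* one comparison of s2[p] with s1[s] */
            if (s2[p] == s1[s]) {
                found = 1;
                break;                 /* early exit: fewer than x2 steps */
            }
        }
        if (!found)
            c++;
    }
    return c;
}
```

With s1 = "abc" and s2 = "xyzw" (no common characters) the counter reaches 3 * 4 = 12, whereas with s1 = "aaa" and s2 = "abcd" every inner scan stops at the first character and the counter is only 3.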

Because multiplication is commutative, the worst-case cost of foo(s1, s2) equals the worst-case cost of foo(s2, s1). We can therefore assume, without loss of generality, that x1 >= x2. (If x1 < x2, swapping the arguments gives the same worst-case iteration count.)

Method 2 (in case you don't like the first method)

For the worst case (in which s1 and s2 contain no common characters), we can determine length(s1) and length(s2) before iterating through the loops, assigning the greater to x1 and the lesser to x2. In .NET and Java, determining the length of a string is O(1), but for the NUL-terminated C strings used here it is O(n), because each string must be scanned to its terminator. Here it is clear that x1 >= x2.
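As a rough illustration of that preliminary step, here is a minimal sketch assuming the standard `strlen` from `<string.h>`. The helper name `limiting_length` is hypothetical and does not appear in the original answer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns n = max(length(s1), length(s2)).
   Each strlen call walks its string up to the '\0' terminator,
   so the two calls together cost x1 + x2 <= 2n steps. */
size_t limiting_length(const char *s1, const char *s2)
{
    size_t a, b;

    assert(s1 != NULL && s2 != NULL);
    a = strlen(s1);
    b = strlen(s2);
    return a >= b ? a : b;
}
```

The two linear scans are the source of the extra 2n term in the bound discussed next.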

For this scenario, the extra work of determining x1 and x2 makes the total O(n² + 2n). We use the following simplification rule to reduce this to O(n²):

If f(x) is a sum of several terms, the one with the largest growth rate is kept, and all others omitted.
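As a worked check of this rule against the textbook definition quoted earlier, the constants c = 3 and N = 1 are one possible choice (they are picked here for illustration, not taken from the original answer):

```latex
\begin{align*}
n^2 + 2n &\le n^2 + 2n^2 = 3n^2 && \text{for all } n \ge 1, \\
\text{so}\quad n^2 + 2n &\in O(n^2) && \text{with } c = 3,\ N = 1.
\end{align*}
```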

Conclusion

With n = x1 as our limiting variable and x1 >= x2, the iteration count x1 * x2 is largest when x2 = x1 = n, which gives n² iterations.
Therefore: f(n) ∈ O(n²)

Extra Hint

For all homework problems posted to SO related to Big O notation, if the answer is not one of:

O(1)
O(log log n)
O(log n)
O(n^c), 0<c<1
O(n)
O(n log n) = O(log n!)
O(n^2)
O(n^c)
O(c^n)
O(n!)

Then the question is probably better off being posted to https://math.stackexchange.com/


第几種人 2024-11-23 03:32:12

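In big O notation, we always have to define what the variables that appear in it mean. O(n) means nothing unless we define what n is. Often we can omit this information because it is clear from context. For example, if we say that some sorting algorithm is O(n log(n)), n always denotes the number of items to sort, so we don't have to state it every time.

Another important point about big O notation is that it only gives an upper bound: every algorithm that is in O(n) is also in O(n^2). The notation is often used to mean "the algorithm has exactly the asymptotic complexity given by the expression (up to a constant factor)", but its actual definition is "the complexity of the algorithm is bounded by the given expression (up to a constant factor)".

In the example you gave, you took m and n to be the respective lengths of the two strings. With this definition, the algorithm is indeed O(mn). If we define n as the length of the longer of the two strings, we can also write this as O(n^2), which is likewise an upper bound on the complexity of the algorithm. And with the same definition of n, the algorithm is also O(n!), but not O(n) or O(n log(n)).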
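The chain of bounds in that last paragraph can be written out explicitly. In the sketch below, m and n are the two string lengths and k = max(m, n) is an illustrative symbol (not used in the answer itself) for the quantity the answer calls n after redefining it:

```latex
\begin{align*}
mn &\le k \cdot k = k^2 && \text{since } m \le k \text{ and } n \le k, \\
k^2 &\le k! && \text{for all } k \ge 4 \quad (4! = 24 \ge 16), \\
\text{so}\quad O(mn) &\subseteq O(k^2) \subseteq O(k!).
\end{align*}
```

The bound cannot be tightened to O(k) or O(k log k): when m = n = k and the strings share no characters, the function performs exactly k² comparisons, which grows faster than any constant multiple of k log k.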