How to calculate the order (big O) of more complex algorithms (e.g. quicksort)

Posted on 2024-08-28 14:08:30

I know there are quite a bunch of questions about big O notation; I have already checked:

to name a few.

I know by "intuition" how to calculate it for n, n^2, n! and so on; however, I am completely lost on how to calculate it for algorithms that are log n, n log n, n log log n and so on.

What I mean is, I know that Quick Sort is n log n (on average)... but why? Same thing for merge sort, comb sort, etc.

Could anybody explain to me, in a not-too-mathy way, how you calculate this?

The main reason is that I'm about to have a big interview and I'm pretty sure they'll ask about this kind of thing. I have been researching for a few days now, and everyone seems to offer either an explanation of why bubble sort is n^2 or an explanation that is (for me) unreadable, like the one on Wikipedia.

Comments (6)

夏见 2024-09-04 14:08:31

The logarithm is the inverse operation of exponentiation. An example of exponentiation is when you double the number of items at each step. Thus, a logarithmic algorithm often halves the number of items at each step. For example, binary search falls into this category.

Many algorithms require a logarithmic number of big steps, but each big step requires O(n) units of work. Mergesort falls into this category.

Usually you can identify these kinds of problems by visualizing them as a balanced binary tree. For example, here's merge sort:

 6   2    0   4    1   3     7   5
  2 6      0 4      1 3       5 7
    0 2 4 6            1 3 5 7
         0 1 2 3 4 5 6 7

At the top is the input, as leaves of the tree. The algorithm creates a new node by sorting the two nodes above it. We know the height of a balanced binary tree is O(log n) so there are O(log n) big steps. However, creating each new row takes O(n) work. O(log n) big steps of O(n) work each means that mergesort is O(n log n) overall.
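
To see that "O(n) work per level, O(log n) levels" structure concretely, here is a small sketch of my own (not part of the original answer) that prints the sublist sizes at each level of the merge tree:

def levels(n):
    # Sizes of the sublists at each level of the merge tree for n items.
    sizes = [n]
    out = [sizes]
    while any(s > 1 for s in sizes):
        sizes = [half for s in sizes
                      for half in ((s // 2, s - s // 2) if s > 1 else (s,))]
        out.append(sizes)
    return out

for level in levels(8):
    print(sum(level), level)
# Every level sums to n = 8, and there are log2(8) + 1 = 4 levels in total.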

Generally, O(log n) algorithms look like the function below. They get to discard half of the data at each step.

def function(data, n):
    if n <= constant:
        return do_simple_case(data, n)
    if some_condition():
        function(data[:n//2], n // 2)          # Recurse on first half of data
    else:
        function(data[n//2:], n - n // 2)      # Recurse on second half of data

O(n log n) algorithms, meanwhile, look like the function below. They also split the data in half, but they need to consider both halves.

def function(data, n):
    if n <= constant:
        return do_simple_case(data, n)
    part1 = function(data[:n//2], n // 2)      # Recurse on first half of data
    part2 = function(data[n//2:], n - n // 2)  # Recurse on second half of data
    return combine(part1, part2)

Where do_simple_case() takes O(1) time and combine() takes no more than O(n) time.

The algorithms don't need to split the data exactly in half. They could split it into one-third and two-thirds, and that would be fine. For average-case performance, splitting it in half on average is sufficient (like QuickSort). As long as the recursion is done on pieces of (n/something) and (n - n/something), it's okay. If it breaks the data into (k) and (n-k) for a constant k, however, then the height of the tree will be O(n) and not O(log n).
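
As a quick illustration of that last point, here is a tiny sketch of my own (not from the answer) comparing the recursion depth of a half split with that of a constant-size split:

def depth_half(n):
    # Depth when each step halves the problem, as in merge sort.
    depth = 0
    while n > 1:
        n //= 2
        depth += 1
    return depth

def depth_constant(n, k=1):
    # Depth when each step only peels off k elements.
    depth = 0
    while n > k:
        n -= k
        depth += 1
    return depth

for n in (16, 1024, 1_000_000):
    print(n, depth_half(n), depth_constant(n))
# The half split gives about log2(n) levels; the constant split gives about n.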

心碎的声音 2024-09-04 14:08:31

You can usually claim O(log n) for algorithms that halve the space/time each time they run. A good example of this is any binary algorithm (e.g., binary search): you pick either left or right, which then axes the space you're searching in half. The pattern of repeatedly halving is log n.
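
For instance, here is a minimal binary search sketch (my own illustration, not part of the answer) where each loop iteration discards half of the remaining range:

def binary_search(items, target):
    # items must be sorted; each iteration halves the search space,
    # so the loop runs at most about log2(len(items)) times.
    lo, hi = 0, len(items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if items[mid] == target:
            return mid
        elif items[mid] < target:
            lo = mid + 1   # discard the left half
        else:
            hi = mid - 1   # discard the right half
    return -1

print(binary_search([0, 1, 3, 4, 7, 9, 12], 7))  # prints 4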

你在我安 2024-09-04 14:08:31

For some algorithms, getting a tight bound for the running time through intuition is close to impossible (I don't think I'll ever be able to intuit an O(n log log n) running time, for instance, and I doubt anyone will ever expect you to). If you can get your hands on the CLRS Introduction to Algorithms text, you'll find a pretty thorough treatment of asymptotic notation which is appropriately rigorous without being completely opaque.

If the algorithm is recursive, one simple way to derive a bound is to write out a recurrence and then set out to solve it, either iteratively or using the Master Theorem or some other way. For instance, if you're not looking to be super rigorous about it, the easiest way to get QuickSort's running time is through the Master Theorem -- QuickSort entails partitioning the array into two relatively equal subarrays (it should be fairly intuitive to see that this is O(n)), and then calling QuickSort recursively on those two subarrays. Then if we let T(n) denote the running time, we have T(n) = 2T(n/2) + O(n), which by the Master Method is O(n log n).
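
To make that recurrence concrete, here is a small sketch of my own (under the simplifying assumption that the O(n) partitioning cost is exactly n) that unrolls T(n) = 2T(n/2) + n and compares it to n*log2(n):

import math
from functools import lru_cache

@lru_cache(maxsize=None)
def T(n):
    # Unroll T(n) = 2*T(n//2) + n with base case T(1) = 0.
    if n <= 1:
        return 0
    return 2 * T(n // 2) + n

for n in (2**10, 2**16, 2**20):
    print(n, T(n), int(n * math.log2(n)))
# For powers of two, T(n) works out to exactly n*log2(n) -- the O(n log n) bound.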

っ〆星空下的拥抱 2024-09-04 14:08:31

Check out the "phone book" example given here: What is a plain English explanation of "Big O" notation?

Remember that Big-O is all about scale: how many more operations will this algorithm require as the data set grows?

O(log n) generally means you can cut the dataset in half with each iteration (e.g. binary search)

O(n log n) means you're performing an O(log n) operation for each item in your dataset

I'm pretty sure 'O(n log log n)' doesn't make any sense. Or if it does, it simplifies down to O(n log n).
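
As an illustration of that "O(log n) work per item" pattern (my own example, not from the answer), heapsort via Python's heapq performs an O(log n) heap operation for each of the n items:

import heapq

def heap_sort(items):
    # n pushes and n pops, each O(log n), give O(n log n) overall.
    heap = []
    for x in items:
        heapq.heappush(heap, x)          # O(log n) per push
    return [heapq.heappop(heap) for _ in range(len(heap))]  # O(log n) per pop

print(heap_sort([6, 2, 0, 4, 1, 3, 7, 5]))  # [0, 1, 2, 3, 4, 5, 6, 7]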

北陌 2024-09-04 14:08:31

I'll attempt to do an intuitive analysis of why Mergesort is n log n and if you can give me an example of an n log log n algorithm, I can work through it as well.

Mergesort is a sorting example that works by repeatedly splitting a list of elements until only single elements exist, and then merging these lists back together. The primary operation in each of these merges is comparison, and each merge requires at most n comparisons, where n is the length of the two lists combined. From this you can derive the recurrence and easily solve it, but we'll avoid that method.

Instead, consider how Mergesort is going to behave: we're going to take a list and split it, then take those halves and split them again, until we have n partitions of length 1. I hope it's easy to see that this recursion will only go log(n) deep before we have split the list up into our n partitions.

Now each of these n partitions needs to be merged; once those are merged, the next level needs to be merged, until we have a list of length n again. Refer to Wikipedia's graphic for a simple example of this process: http://en.wikipedia.org/wiki/File:Merge_sort_algorithm_diagram.svg

Now consider the amount of time this process will take: we're going to have log(n) levels, and at each level we will have to merge all of the lists. As it turns out, each level will take n time to merge, because we'll be merging a total of n elements each time. Then you can fairly easily see that it will take n log(n) time to sort an array with mergesort, if you take the comparison operation to be the most important operation.

If anything is unclear or I skipped somewhere please let me know and I can try to be more verbose.

Edit: a second explanation:

Let me think if I can explain this better.

The problem is broken into a bunch of smaller lists and then the smaller lists are sorted and merged until you return to the original list which is now sorted.

When you break up the problem, you have several different levels of size. First you'll have two lists of size n/2, n/2; at the next level you'll have four lists of size n/4, n/4, n/4, n/4; at the next level you'll have eight lists of size n/8, n/8, n/8, n/8, n/8, n/8, n/8, n/8. This continues until n/2^k is equal to 1 (each subdivision is the length divided by a power of 2; not all lengths will divide evenly, so it won't be quite this pretty). This is repeated division by two, and it can continue at most log_2(n) times, because 2^(log_2(n)) = n, so any further division by 2 would yield a list of size zero.

Now the important thing to note is that at every level we have n elements in total, so for each level the merge will take n time, because merge is a linear operation. If there are log(n) levels of recursion, then we will perform this linear operation log(n) times, so our running time will be n log(n).

Sorry if that isn't helpful either.
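
To check the level-by-level argument above empirically, here is an instrumented mergesort of my own (not part of the original answer) that counts comparisons and prints the count next to n*log2(n):

import math
import random

def merge_sort(items, counter):
    # Standard mergesort that tallies comparisons in counter[0].
    if len(items) <= 1:
        return items
    mid = len(items) // 2
    left = merge_sort(items[:mid], counter)
    right = merge_sort(items[mid:], counter)
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        counter[0] += 1                  # one comparison per merge step
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged

for n in (2**8, 2**12, 2**16):
    counter = [0]
    merge_sort([random.random() for _ in range(n)], counter)
    print(n, counter[0], round(n * math.log2(n)))
# The comparison count stays close to (and below) n*log2(n).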

岁月苍老的讽刺 2024-09-04 14:08:31

When applying a divide-and-conquer algorithm, you partition the problem into sub-problems until each is so simple that it is trivial. If the partitioning goes well, the size of each sub-problem is n/2 or thereabouts. This is often the origin of the log(n) that crops up in big-O complexity: O(log(n)) is the number of recursive calls needed when partitioning goes well.
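
As a quick empirical check of that claim (my own sketch, not part of the answer), the recursion depth of a simple quicksort on random data stays within a small factor of log2(n):

import math
import random

def quicksort_depth(items, depth=0):
    # Maximum recursion depth reached by a simple quicksort.
    if len(items) <= 1:
        return depth
    pivot = items[len(items) // 2]
    left = [x for x in items if x < pivot]
    right = [x for x in items if x > pivot]
    return max(quicksort_depth(left, depth + 1),
               quicksort_depth(right, depth + 1))

for n in (2**8, 2**12, 2**16):
    data = [random.random() for _ in range(n)]
    print(n, quicksort_depth(data), round(math.log2(n)))
# When the partitioning goes well, the depth is a small multiple of log2(n).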
