Expected running time vs. worst-case running time



I am studying the randomized quicksort algorithm. I noticed that the running time of this algorithm is always stated as an "expected running time".

What is the reason for specifying or using the "expected running time"? Why don't we calculate the worst or average case?


Comments (4)

吾家有女初长成 2024-12-19 02:18:13


Sometimes the expected running time means the average running time for a randomly chosen input. But if it's a randomized algorithm, then what is often meant is the expected running time with respect to the random choices made by the algorithm, for every input. That is what is meant here. For every input of n items, randomized quicksort runs in time O(n log n) on average, averaged only over its coin flips.
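As a concrete picture of where the coin flips enter, here is a minimal randomized quicksort sketch in Python (the function name and structure are illustrative, not taken from the question):

```python
import random

def randomized_quicksort(a):
    """Sort a list. The ONLY randomness is the pivot choice below."""
    if len(a) <= 1:
        return a
    pivot = random.choice(a)  # the "coin flip": a uniformly random pivot
    less    = [x for x in a if x < pivot]
    equal   = [x for x in a if x == pivot]
    greater = [x for x in a if x > pivot]
    return randomized_quicksort(less) + equal + randomized_quicksort(greater)

# For ANY fixed input -- even an adversarially ordered one -- the running
# time is O(n log n) in expectation over the pivot choices alone.
print(randomized_quicksort([3, 1, 4, 1, 5, 9, 2, 6]))
```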

In this limited sense, expected running time is a very defensible measure of the running time of a randomized algorithm. If you're only worried about the worst that could happen when the algorithm flips coins internally, then why bother flipping coins at all? You might as well just make them all heads. (Which, in the case of randomized quicksort, would reduce it to ordinary quicksort.)

Average case vs worst case is a much more serious matter when it's an average over inputs rather than an average over coin flips. In this case, the average running time is at best a figure that is not adaptable to changes in the type of input --- different uses of the algorithm could have different distributions of inputs. I say at best because you may not know whether the hypothetical distribution of inputs ever matches the true usage.

Looking at the worst case with respect to coin flips only makes sense when you would both like to run quickly when your coin flips aren't unlucky, and not run too slowly even when your coin flips are unlucky. For instance, imagine that you need a sorting algorithm for the regulator for an oxygen supply (for a medical patient or a scuba diver). Then randomized quicksort would only make sense if you both want the result to be very fast usually, for the user's convenience, AND if the worst case wouldn't suffocate the user no matter what. This is a contrived scenario for sorting algorithms, because there are non-random sorting algorithms (like merge sort) that are not much slower than quicksort even on average. It is less contrived for a problem like primality testing, where randomized algorithms are much faster than non-randomized algorithms. Then you might want to make a run for it with a randomized algorithm --- while at the same time running a non-randomized algorithm as a backup.

(Okay, you might wonder why an oxygen regulator would want to know whether particular numbers are prime. Maybe it needs to communicate with a medical computer network, and the communication needs to be secure for medical privacy reasons...)
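One way to realize the "randomized run with a non-randomized backup" idea above is to race the two algorithms and take whichever answer arrives first. A minimal sketch, assuming both callables compute the same correct answer (the names are hypothetical; in CPython you would use processes rather than threads for true parallelism on CPU-bound work):

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait

def race(randomized_alg, deterministic_alg, x):
    """Run both algorithms on x concurrently; return the first answer.

    Usually the randomized algorithm wins quickly; on an unlucky run of
    coin flips, the deterministic backup bounds the total waiting time.
    """
    pool = ThreadPoolExecutor(max_workers=2)
    futures = [pool.submit(randomized_alg, x), pool.submit(deterministic_alg, x)]
    done, _ = wait(futures, return_when=FIRST_COMPLETED)
    pool.shutdown(wait=False)  # let the slower run finish in the background
    return done.pop().result()
```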

维持三分热 2024-12-19 02:18:13


When we say "expected running time", we are talking about the running time for the average case. We might be talking about an asymptotically upper or lower bound (or both). Similarly, we can talk about asymptotically upper and lower bounds on the running time of the best or worst cases. In other words, the bound is orthogonal to the case.

In the case of randomized quicksort, people talk about the expected running time (O(n log n)) since this makes the algorithm seem better than worst-case O(n^2) algorithms (which it is, though not asymptotically in the worst case). In other words, randomized quicksort is asymptotically much faster than e.g. Bubblesort for almost all inputs, and people want a way to make that fact clear; so people emphasize the average-case running time of randomized quicksort, rather than the fact that it is asymptotically as bad as Bubblesort in the worst case.

As pointed out in the comments and in Greg's excellent answer, it is also common to speak of the expected runtime with respect to the set of sequences of random choices made during the algorithm's execution on a fixed, arbitrary input. This may be more natural, since we think of the inputs as being passively acted upon by an active algorithm. In fact, it is equivalent to speak of an average over random inputs and an algorithm whose execution does not consider structural differences. Both of these formulations are easier to visualize than the true average over the set of pairs of inputs and random choices, but you do get the same answers no matter which way you approach it.
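That equivalence can be checked empirically for quicksort. In the sketch below (Python; the helper function and the parameters are illustrative), the first average is over pivot coin flips with the input held fixed, and the second is over shuffled inputs with a deterministic first-element pivot; on distinct keys the two come out essentially equal:

```python
import random

def comparisons(a, pick_pivot):
    """Number of comparisons quicksort makes under a given pivot rule,
    using the standard accounting of n - 1 comparisons per partition."""
    if len(a) <= 1:
        return 0
    pivot = pick_pivot(a)
    less = [x for x in a if x < pivot]
    greater = [x for x in a if x > pivot]
    return len(a) - 1 + comparisons(less, pick_pivot) + comparisons(greater, pick_pivot)

n, trials = 200, 2000
fixed_input = list(range(n))

# (a) random pivots, one fixed input
avg_over_flips = sum(comparisons(fixed_input, random.choice)
                     for _ in range(trials)) / trials

# (b) deterministic first-element pivot, uniformly random inputs
avg_over_inputs = sum(comparisons(random.sample(fixed_input, n), lambda a: a[0])
                      for _ in range(trials)) / trials

print(avg_over_flips, avg_over_inputs)  # both averages agree closely; Theta(n log n)
```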

¢好甜 2024-12-19 02:18:13


An algorithm is randomized if its behavior is determined not only by its input but also by values produced by a random-number generator.
That is why you analyze what is expected.

A worst-case analysis is on the input only.

千年*琉璃梦 2024-12-19 02:18:13


A bit late, and it's more of a long comment, but it's something I think is important to add: any algorithm with expected time T can become a worst-case O(T) algorithm. Markov's inequality tells us that if the expected time is T, then with probability at least 1/2 the algorithm will take fewer than 2T operations. So we can just run the algorithm, and if it takes more than 2T operations, stop and re-run it; doing this at most log(1/delta) times gives a failure probability of at most delta. So we get time complexity O(T log(1/delta)) with failure probability delta. But since log grows so slowly, for all practical purposes this is an O(T) algorithm with failure probability 0. For example, if we choose delta to be the probability that 2 randomly selected atoms from the observable universe are the same atom, we get log(1/delta) = 260, so we can just say we get O(T) with 0 probability of failure.
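A sketch of that restart trick (Python; `algorithm` is assumed to accept an operation budget and return None when the budget is exhausted --- that interface is hypothetical, and a real implementation would count the algorithm's actual operations):

```python
import math

def expected_to_worst_case(algorithm, x, T, delta):
    """Run an expected-time-T algorithm with restarts so that, with
    probability at least 1 - delta, an answer is returned within
    worst-case O(T * log(1/delta)) operations.

    Markov's inequality: one run exceeds 2T operations with probability
    at most E[time] / 2T = 1/2, so k independent runs all fail with
    probability at most (1/2)**k <= delta.
    """
    k = math.ceil(math.log2(1 / delta))
    for _ in range(k):
        result = algorithm(x, budget=2 * T)  # None signals "budget exceeded"
        if result is not None:
            return result
    raise RuntimeError("every restart exceeded its budget (probability <= delta)")
```

With delta = 2**(-260), roughly the same-atom probability mentioned above, this is at most k = 260 restarts.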
