Is binary search optimal in the worst case?

Posted on 2024-12-07 01:54:49

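Is binary search optimal in the worst case? My teacher says so, but I can't find a book that backs it up. We start with a sorted array, and in the worst case (the worst case for that algorithm), any algorithm will always need at least as many pairwise comparisons as binary search.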

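Many people said the question is unclear. Sorry! The input is any generic sorted array. I'm looking for a proof that any search algorithm makes at least log2(N) comparisons in the worst case (the worst case of the algorithm under consideration).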

浴红衣 2024-12-14 01:54:49

Yes, binary search is optimal.

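This is easy to see by appealing to information theory. It takes log₂ N bits just to identify one element out of N elements, but each comparison has only two outcomes, so it gives you at most one bit of information. Therefore, in the worst case you must perform at least log₂ N comparisons to identify a unique element.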
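To make the bound concrete, here is a minimal Python sketch (not part of the answer) of a binary search that asks only yes/no questions ("is target < a[mid]?"), assuming the keys are distinct and the target is known to be present. It counts comparisons and checks that its worst case over all targets equals ⌈log₂ N⌉, i.e., binary search meets the information-theoretic bound. The names `ceil_log2`, `two_way_search` and `worst_case` are illustrative.

```python
def ceil_log2(n: int) -> int:
    """Smallest d with 2**d >= n, i.e. ceil(log2(n)) for n >= 1."""
    return (n - 1).bit_length()


def two_way_search(a: list[int], target: int) -> tuple[int, int]:
    """Return (index, comparisons) for a target known to be in sorted list a.

    Every step asks one yes/no question, so each comparison yields one bit.
    """
    lo, hi = 0, len(a)  # invariant: target is somewhere in a[lo:hi]
    comparisons = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        comparisons += 1
        if target < a[mid]:
            hi = mid  # target is in a[lo:mid]
        else:
            lo = mid  # target is in a[mid:hi]
    return lo, comparisons


def worst_case(n: int) -> int:
    """Maximum number of comparisons over all targets in an array of size n."""
    a = list(range(n))
    return max(two_way_search(a, x)[1] for x in a)


for n in (1, 2, 3, 10, 100, 1000):
    print(n, worst_case(n), ceil_log2(n))
```

For every N tried, the last two columns agree (for example `10 4 4` and `1000 10 10`).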

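In more detail: consider a hypothetical algorithm X that outperforms binary search in the worst case. For a particular element of the array, run the algorithm and record the questions it asks, i.e., the sequence of comparisons it performs. Or rather, record the answers to those questions (e.g., "true, false, false, true").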

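Convert that sequence into a binary string (1, 0, 0, 1). Call this binary string the "signature of the element with respect to algorithm X". Do this for every element of the array, so that each element gets a signature.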
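Here is a minimal sketch of that construction, using binary search itself as "algorithm X" and assuming distinct sorted keys with the target known to be present. The function `signature` and the sample array are hypothetical, chosen only for illustration.

```python
def signature(a: list[int], target: int) -> str:
    """Record the yes/no answers a two-way binary search gets for `target`.

    '1' means "target < a[mid]" was true, '0' means it was false.
    """
    lo, hi = 0, len(a)
    bits = []
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if target < a[mid]:
            bits.append("1")
            hi = mid
        else:
            bits.append("0")
            lo = mid
    return "".join(bits)


a = [3, 8, 15, 23, 42, 57, 61, 70, 88, 95]
sigs = {x: signature(a, x) for x in a}
for x, s in sigs.items():
    print(x, s)

# Binary search is correct, so no two elements share a signature.
assert len(set(sigs.values())) == len(a)
```

Every signature here has at most 4 bits, and none is a prefix of another.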

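Now here is the key. If two elements have the same signature, then algorithm X cannot tell them apart! All the algorithm knows about the array is the answers it gets to the questions it asks, i.e., the outcomes of the comparisons it performs. And if the algorithm cannot tell two elements apart, it cannot be correct. (Put another way: if two elements have the same signature, meaning they lead the algorithm through the same sequence of comparisons, which one does the algorithm return? Contradiction.)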
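To see the contradiction concretely, here is a sketch of a hypothetical algorithm that is allowed only k = ⌈log₂ N⌉ − 1 comparisons: binary search cut short, which then guesses the left end of its remaining range. Grouping elements by signature finds elements it cannot tell apart; it returns the same index for all of them, so it is wrong for at least one. `truncated_search` is an illustrative name, not something from the answer.

```python
from collections import defaultdict


def truncated_search(a: list[int], target: int, k: int) -> tuple[int, str]:
    """Binary search limited to k yes/no comparisons; returns (guess, signature)."""
    lo, hi = 0, len(a)
    bits = []
    while hi - lo > 1 and len(bits) < k:
        mid = (lo + hi) // 2
        if target < a[mid]:
            bits.append("1")
            hi = mid
        else:
            bits.append("0")
            lo = mid
    return lo, "".join(bits)  # out of budget (or done): guess the left end


a = [3, 8, 15, 23, 42, 57, 61, 70, 88, 95]
k = (len(a) - 1).bit_length() - 1  # one fewer than ceil(log2(10)) = 4

by_signature = defaultdict(list)
for x in a:
    guess, sig = truncated_search(a, x, k)
    by_signature[sig].append((x, guess))

for sig, group in by_signature.items():
    if len(group) > 1:
        # Same answers lead to the same output index: at least one is wrong.
        print(f"signature {sig!r} shared by {[x for x, _ in group]}, "
              f"all get index {group[0][1]}")
```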

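Finally, show that if every signature has fewer than log₂ N bits, then two elements must share a signature. Because the algorithm is deterministic, no signature can be a proper prefix of another (the algorithm would have stopped at the same point for both elements), so the signatures are the leaves of a binary decision tree, and a binary tree of depth d has at most 2^d leaves. If every signature is shorter than log₂ N bits, there are fewer than N distinct signatures for N elements, so by the pigeonhole principle two elements share one. Done.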
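The counting step can be written out as a short derivation. Here d stands for the worst-case number of comparisons (the depth of the decision tree) and L for the number of leaves, i.e., distinct signatures; both symbols are introduced only for this sketch.

```latex
\begin{aligned}
N &\le L && \text{(each of the $N$ elements needs its own signature)}\\
L &\le 2^{d} && \text{(a binary tree of depth $d$ has at most $2^{d}$ leaves)}\\
\Rightarrow\; 2^{d} &\ge N \;\Rightarrow\; d \ge \lceil \log_2 N \rceil && \text{(since $d$ is an integer)}
\end{aligned}
```

For N = 10, for instance, this gives d ≥ 4.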

[Update]

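One quick additional comment. The above assumes that the algorithm knows nothing about the array except what it learns from performing comparisons. Of course, in real life you sometimes do know something about the array a priori. As a toy example, suppose I know that the array has (say) 10 elements, all between 1 and 100, that they are distinct, and that the numbers 92 through 100 are all present in the array. Then clearly I do not need four comparisons (⌈log₂ 10⌉ = 4) even in the worst case.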
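A minimal sketch of that toy example, assuming the prior knowledge holds exactly as stated: since 92..100 must occupy the last nine slots of the sorted array, their positions are known without looking, and any other value needs a single comparison against `a[0]`. Only comparisons against array elements are counted. The function name and the sample array are hypothetical.

```python
def search_with_prior(a: list[int], target: int) -> tuple[int, int]:
    """Search a sorted array of 10 distinct ints in 1..100 that contains 92..100.

    Precondition (assumed, not checked): a[1:] == [92, 93, ..., 100].
    Returns (index or -1, number of comparisons against array elements).
    """
    if 92 <= target <= 100:
        return target - 91, 0  # position is known: 92 at index 1, 100 at index 9
    # Only a[0] is unknown, so one comparison settles every other target.
    return (0, 1) if a[0] == target else (-1, 1)


a = [17] + list(range(92, 101))  # hypothetical array satisfying the prior
worst = max(search_with_prior(a, t)[1] for t in range(1, 101))
print(worst)  # prints 1, versus ceil(log2(10)) = 4 for plain binary search
```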

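More realistically, if I know that the elements are uniformly (or roughly uniformly) distributed between their minimum and maximum, I can again do better than binary search, at least on average; this is the idea behind interpolation search.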

But in the general case, binary search is still optimal.

那小子欠揍 2024-12-14 01:54:49

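Worst case for which algorithm? There is no single universal "worst case". If your question is...

"Is there a case where binary search takes more comparisons than another algorithm?"

Then yes, of course. If the element happens to be the first one in the list, a simple linear search takes fewer comparisons.

"Is there an algorithm with a better worst-case running time than binary search?"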

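Yes, if you know more about the data. For example, a radix tree or trie has a worst-case lookup time that is constant in the number of entries (but linear in the length of the key).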
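A minimal dictionary-of-dictionaries trie in Python illustrates the point; it is a simplification (a real radix tree also compresses single-child chains). A lookup walks at most one node per character, so its cost depends on the key length, not on how many keys are stored. The class and method names are illustrative.

```python
class Trie:
    """A minimal trie: each node is a dict from character to child node."""

    _END = object()  # sentinel key marking "a word ends here"

    def __init__(self) -> None:
        self.root: dict = {}

    def insert(self, word: str) -> None:
        node = self.root
        for ch in word:
            node = node.setdefault(ch, {})
        node[Trie._END] = True

    def contains(self, word: str) -> tuple[bool, int]:
        """Return (found, steps); steps <= len(word) regardless of the trie's size."""
        node = self.root
        steps = 0
        for ch in word:
            steps += 1
            if ch not in node:
                return False, steps
            node = node[ch]
        return Trie._END in node, steps


small, large = Trie(), Trie()
small.insert("search")
for i in range(10_000):
    large.insert(f"key{i}")
large.insert("search")

print(small.contains("search"), large.contains("search"))  # prints (True, 6) (True, 6)
```

Both lookups take six steps, one per character of "search", even though one trie holds a single key and the other more than ten thousand.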

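"Is there a general-purpose search algorithm with a better worst-case running time than binary search?"

If all you can assume is a comparison function on the keys, then no: the best possible worst case is O(log n). But some algorithms are faster, just not in the big-O sense.

...so I guess you really have to define the question first!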

焚却相思 2024-12-14 01:54:49

Binary search has a worst-case complexity of O(log(N)) comparisons, which is optimal for comparison-based search of a sorted array.

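In some cases it may make sense to do something other than a purely comparison-based search; then you may be able to beat the O(log(N)) barrier. See, for example, interpolation search.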
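For comparison, here is a minimal Python sketch of interpolation search on a sorted list of integers. Instead of probing the middle, it estimates where the target should be from the values at the ends of the current range. On roughly uniformly distributed keys this takes O(log log N) probes on average, but its worst case is O(N), so it beats binary search only under that distributional assumption. The function name and sample data are illustrative.

```python
def interpolation_search(a: list[int], target: int) -> int:
    """Return an index of target in sorted list a, or -1 if absent."""
    lo, hi = 0, len(a) - 1
    while lo <= hi and a[lo] <= target <= a[hi]:
        if a[hi] == a[lo]:  # flat range: avoid division by zero
            return lo if a[lo] == target else -1
        # Estimate the position by linear interpolation between a[lo] and a[hi];
        # since a[lo] <= target <= a[hi], pos always lies in [lo, hi].
        pos = lo + (target - a[lo]) * (hi - lo) // (a[hi] - a[lo])
        if a[pos] == target:
            return pos
        if a[pos] < target:
            lo = pos + 1
        else:
            hi = pos - 1
    return -1


a = list(range(0, 1000, 7))  # evenly spaced keys: the favorable case
print(interpolation_search(a, 497), interpolation_search(a, 500))  # prints 71 -1
```

On this evenly spaced input, the first probe already lands on index 71.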

断舍离 2024-12-14 01:54:49

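It depends on the nature of the data. Take English words and a dictionary, for example: you can write an algorithm that does better than binary search by exploiting the fact that some letters occur much more often than others in English.

But in general, binary search is a safe choice.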

小猫一只 2024-12-14 01:54:49

I find the question a bit unclear, but here are my thoughts anyway.

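The worst case for binary search is finding the element you are looking for only after all log n comparisons. But the same data might be the best case for linear search. It depends on how the data is arranged and on what you are searching for, but the worst case for binary search ends up being log n. This can't be compared directly with linear search on the same data, because the worst cases are different: the worst case for linear search is an element that happens to be at the end of the array.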

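For example, with the array A = 1, 2, 3, 4, 5, 6, binary-searching A for 1 would be a worst case (depending on how the midpoint is rounded), whereas for the same array, the worst case for linear search is searching for 6, not 1.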
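A small sketch that counts probes for each target in A = 1..6, for linear search and for the common three-way binary search (test equality, then go left or right), where each probe examines one array element. Which target is binary search's worst case depends on how the midpoint is rounded: rounding down makes 2, 4 and 6 the slowest, while rounding up makes 1, 3 and 5 the slowest. The helper names are illustrative.

```python
def linear_probes(a: list[int], target: int) -> int:
    """Number of elements a linear search examines before finding target."""
    for i, x in enumerate(a):
        if x == target:
            return i + 1
    return len(a)  # not found: every element was examined


def binary_probes(a: list[int], target: int, round_up: bool) -> int:
    """Number of elements a three-way binary search examines before finding target."""
    lo, hi, probes = 0, len(a) - 1, 0
    while lo <= hi:
        mid = (lo + hi + 1) // 2 if round_up else (lo + hi) // 2
        probes += 1
        if a[mid] == target:
            return probes
        if a[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return probes


A = [1, 2, 3, 4, 5, 6]
print("target linear bin(round down) bin(round up)")
for t in A:
    print(t, linear_probes(A, t), binary_probes(A, t, False), binary_probes(A, t, True))
```

With rounding down, searching for 1 takes 2 probes; with rounding up it takes 3, the maximum for this array.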
