Why does interpolation search reduce the length of the list to sqrt(n) after each comparison?
According to the book I'm reading, interpolation search takes O(log log n) time in the average case.
The book assumes that each comparison reduces the length of the list from n to sqrt(n). Given this assumption, it isn't difficult to work out the O(log log n) bound.
However, the book says nothing more about this assumption except that it is correct.
Question: can anyone explain why this is true?
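For reference, a sketch of the calculation the assumption gives: if each comparison shrinks the range from n to sqrt(n), the recurrence is T(n) = T(sqrt(n)) + O(1), so after k comparisons the remaining range has size n^(1/2^k). This drops to a constant as soon as 2^k >= log2(n), i.e. after about log2(log2(n)) comparisons, which is the O(log log n) bound.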
2 Answers
Imagine a sorted array where each entry is a number from one to a million. You want to check whether 10000 is in the array. Since 10000 is smaller than 99% of the numbers between one and a million, if the array has a nice distribution of numbers, chances are that an entry of 10000, if it is in the array, is very near the start. If we look at an entry 1% of the way through the array and find that it is greater than 10000, we have eliminated 99% of the array in a single step. This is much better than binary search, which only looks at the middle of an interval and therefore can eliminate at most half of the search space at a time. This is, intuitively, why interpolation search can in some cases be much faster than binary search.
For a rigorous analysis of why the expected running time is O(log log n), you would have to read through a textbook or paper on the algorithm.
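A minimal sketch of interpolation search in Python (illustrative code, not from the book or this answer), showing the probing rule described above: instead of the midpoint, it probes where the target value would sit if the keys were evenly spread over the current interval.

```python
def interpolation_search(a, target):
    # Assumes a is sorted and its values are roughly uniformly distributed.
    lo, hi = 0, len(a) - 1
    while lo <= hi and a[lo] <= target <= a[hi]:
        if a[hi] == a[lo]:
            return lo if a[lo] == target else -1
        # Probe where the target "should" sit if values are evenly spread,
        # instead of always probing the middle as binary search does.
        pos = lo + (target - a[lo]) * (hi - lo) // (a[hi] - a[lo])
        if a[pos] == target:
            return pos
        if a[pos] < target:
            lo = pos + 1
        else:
            hi = pos - 1
    return -1

# 9996 sits about 1% of the way into values spread over 0..999999,
# so the very first probe already lands on it (index 1428).
data = list(range(0, 1_000_000, 7))
print(interpolation_search(data, 9996))
```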
It depends on the input being uniformly distributed (without such an assumption, O(log n) is the best you can do theoretically, i.e. binary search is optimal). With a uniform distribution, the variance is around sqrt(n), and in the expected case each iteration hits within the variance of the target. Thus, as you say, the search space goes from n to sqrt(n) on each iteration.
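A quick numeric check of that reduction (an illustrative sketch, not part of the answer above): repeatedly applying n -> sqrt(n) reaches a constant-sized range in about log2(log2(n)) steps.

```python
import math

# Count how many times n -> sqrt(n) can be applied before the range
# becomes trivial, and compare with log2(log2(n)).
n = 2 ** 32
size, steps = n, 0
while size > 2:
    size = math.isqrt(size)   # one iteration: n -> sqrt(n)
    steps += 1
print(steps, math.log2(math.log2(n)))   # prints: 5 5.0
```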