Extending the binary search algorithm to find the first and last index of the searched key value in an array
The problem is to extend the binary search algorithm to find all occurrences of a target value in a sorted array in the most efficient way.
Concretely speaking, the input of the algorithm is (1) a sorted array of integers, where some numbers may appear more than once, and (2) a target integer to be searched. The output of the algorithm should be a pair of index values, indicating the first and last occurrence of the integer in the array, if it does occur.
The source code could be in C#, C, or C++.
Also, what are the maximum and minimum numbers of comparisons that we might need to find the indexes?
For C++, you could look up std::equal_range() and its complexity requirements. As long as you're interested in the basic algorithm, the same general rules should apply regardless of the language used for the implementation.
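As a minimal sketch of that suggestion, the helper below wraps std::equal_range() and converts the iterator pair it returns into a pair of indices. The name find_range and the {-1, -1} "not found" convention are assumptions made for illustration, not part of the standard library.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical helper: returns {first index, last index} of `target` in an
// ascending-sorted vector, or {-1, -1} if `target` does not occur.
std::pair<int, int> find_range(const std::vector<int>& sorted, int target) {
    // equal_range yields the half-open iterator range [first, one-past-last)
    // of elements equal to target.
    auto range = std::equal_range(sorted.begin(), sorted.end(), target);
    if (range.first == range.second) return {-1, -1};  // empty range: absent
    assert(*range.first == target);                    // sanity check
    int first = static_cast<int>(range.first - sorted.begin());
    int last = static_cast<int>(range.second - sorted.begin()) - 1;
    return {first, last};
}
```

For the array {1, 2, 2, 2, 3, 5} and target 2, this returns the pair (1, 3).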
If you are a little clever, you can define two different binary search functions. One will return the index of the first appearance of the searched-for value and the other will return the index of its last appearance. From your knowledge of binary search, you should be able to determine the maximum and minimum number of comparisons.
Using two binary searches should be the fastest method on average, in my opinion. For instance, if you use just one binary search to find the first item and search linearly afterwards, the worst case would be an array in which every element has the same value. For an array of length 10000, this would give 10013 comparisons in the worst case, while using two binary searches would give 28 comparisons in the worst case for the same array. Of course, using the same size of array, the best case for the binary/linear search method would be 14 comparisons, while the best case for the two-binary-searches method is 26 comparisons.
Update
Okay, here is a binary search to find the first appearance of an element in an array. I'll give you a recursive function (you can of course make it iterative and optimize this in other ways). This searches for the int val in the array a of ints. Also, I haven't been careful about finding the midpoint (if the array is really large there could be problems).
However, once an index is returned you should check that it actually refers to the correct value, because if val is not in the array, the returned index will correspond to the next element larger than val.
A few minor changes to this will make a function that finds the last element. The keys to doing this are using the comparators correctly and remembering that integer division always truncates.
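A minimal sketch of such a pair of recursive functions might look like the following. The names first_at_least, last_at_most and first_and_last, and the {-1, -1} result for a missing value, are assumptions chosen for illustration. Unlike the description above, the midpoint is computed as lo + (hi - lo) / 2, which avoids overflow on very large arrays.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Smallest index i in [lo, hi] with a[i] >= val; returns hi if there is none.
// Assumes a is sorted ascending and 0 <= lo <= hi < a.size().
int first_at_least(const std::vector<int>& a, int val, int lo, int hi) {
    assert(lo <= hi);
    if (lo == hi) return lo;
    int mid = lo + (hi - lo) / 2;  // truncation rounds toward lo
    if (a[mid] < val) return first_at_least(a, val, mid + 1, hi);
    return first_at_least(a, val, lo, mid);
}

// Largest index i in [lo, hi] with a[i] <= val; returns lo if there is none.
int last_at_most(const std::vector<int>& a, int val, int lo, int hi) {
    assert(lo <= hi);
    if (lo == hi) return lo;
    int mid = lo + (hi - lo + 1) / 2;  // round up so the range always shrinks
    if (a[mid] > val) return last_at_most(a, val, lo, mid - 1);
    return last_at_most(a, val, mid, hi);
}

// Combines both searches and checks that the value is really present.
std::pair<int, int> first_and_last(const std::vector<int>& a, int val) {
    if (a.empty()) return {-1, -1};
    int n = static_cast<int>(a.size());
    int first = first_at_least(a, val, 0, n - 1);
    if (a[first] != val) return {-1, -1};  // the check the answer asks for
    int last = last_at_most(a, val, first, n - 1);
    return {first, last};
}
```

The only differences between the two functions are the direction of the comparison and the rounding of the midpoint, which is where the remark about truncating integer division comes in.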
This is fairly easy to do without writing your own binary search algorithm, by repeatedly calling a standard algorithm.
This is pretty close to the same efficiency you'd get with a custom algorithm, except that you have more function call overhead.
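One way to read "repeatedly calling a standard algorithm" is sketched below, under the assumption that the library routine returns the index of some matching element (not necessarily the first). Here find_any is a hypothetical stand-in for that routine, and range_by_repeated_search is an illustrative name.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Stand-in for a library binary search over the half-open range [lo, hi):
// returns the index of *some* element equal to val, or -1 if there is none.
int find_any(const std::vector<int>& a, int val, int lo, int hi) {
    assert(0 <= lo && hi <= static_cast<int>(a.size()));
    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;
        if (a[mid] < val) lo = mid + 1;
        else if (a[mid] > val) hi = mid;
        else return mid;
    }
    return -1;
}

// Finds one hit, then keeps re-searching the part before the earliest hit
// and the part after the latest hit until no further match turns up.
std::pair<int, int> range_by_repeated_search(const std::vector<int>& a, int val) {
    int n = static_cast<int>(a.size());
    int hit = find_any(a, val, 0, n);
    if (hit < 0) return {-1, -1};
    int first = hit, last = hit, i;
    while ((i = find_any(a, val, 0, first)) >= 0) first = i;
    while ((i = find_any(a, val, last + 1, n)) >= 0) last = i;
    return {first, last};
}
```

Each repeated call starts again from the outer edge of the array, so the searches do not reuse what earlier calls already ruled out.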
As for the number of comparisons, I'd have to think a little harder to be sure, but I think it's just 2*log2(N), where N is the number of items in the list.
Edit
Bah! It's not 2*log2(N), because unlike what you could do with a custom algorithm, it doesn't incrementally exclude portions of the list. It appears [1] that the maximum comparison count is (log2(N) - 0.5) * log2(N). This is still only 885 comparisons for a list with 2^30 elements (390 comparisons for N = 2^20, and 95 for N = 2^10), but we can do better than that.
This will do at most 2*log2(N) comparisons. 2^30 items will require at most 60 comparisons, 2^20 items will require at most 40 comparisons, etc.
[1] I determined this experimentally. I'm not quite smart enough to figure it out mathematically.
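To check comparison counts of this order experimentally, one option is to run two logarithmic searches with a comparator that counts its own calls. The sketch below uses std::lower_bound and std::upper_bound for this; it is an illustration, not a reconstruction of the custom algorithm, and the RangeResult struct and counted_range name are assumptions. The exact count depends on the standard library implementation, which only promises about log2(N) plus a constant per call.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical result type: indices of first/last occurrence (or -1, -1)
// plus the number of element comparisons that were performed.
struct RangeResult {
    int first;
    int last;
    long comparisons;
};

RangeResult counted_range(const std::vector<int>& a, int val) {
    long count = 0;
    // Comparator that counts every call; captured by reference so the
    // copies made by the algorithms still update the same counter.
    auto less = [&count](int x, int y) { ++count; return x < y; };
    auto lo = std::lower_bound(a.begin(), a.end(), val, less);
    // Start the second search at lo: nothing before it can equal val.
    auto hi = std::upper_bound(lo, a.end(), val, less);
    if (lo == hi) return {-1, -1, count};
    assert(*lo == val);
    return {static_cast<int>(lo - a.begin()),
            static_cast<int>(hi - a.begin()) - 1, count};
}
```

Calling counted_range on arrays of different sizes and printing the comparisons field gives a direct way to compare the measured count against 2*log2(N).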
You can find a discussion of this in Bentley's Programming Pearls and Knuth's Vol. 3: Sorting and Searching.
Here is one implementation in C++ : http://the-algo-blog.blogspot.com/2011/06/binary-search-to-find-last-and-first.html
There's no clean answer to the "most efficient" part of the question. That would depend on how many entries with the same value are to be expected. If it's only a few, then a linear search in both directions of the array after finding one element will be your fastest option, but if you're expecting a lot of entries with the same value, you could do a kind of binary search to find the start and end indices.
Disclaimer: Not tested; it's meant to show the idea and not to be used directly as production code
and then the reverse for the upper bound. However, it will require quite a lot of elements before this is faster than a simple linear search.
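The linear option mentioned above (find any one occurrence by binary search, then walk outwards in both directions) can be sketched as follows. The function name and the {-1, -1} "not found" result are assumptions for illustration.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Binary search for any occurrence of val, then expand linearly to both
// sides. Cheap when duplicates are few, linear when they are many.
std::pair<int, int> range_by_linear_expansion(const std::vector<int>& a, int val) {
    int n = static_cast<int>(a.size());
    int lo = 0, hi = n, hit = -1;
    while (lo < hi) {  // ordinary binary search over [lo, hi)
        int mid = lo + (hi - lo) / 2;
        if (a[mid] < val) lo = mid + 1;
        else if (a[mid] > val) hi = mid;
        else { hit = mid; break; }
    }
    if (hit < 0) return {-1, -1};
    int first = hit, last = hit;
    while (first > 0 && a[first - 1] == val) --first;    // walk left
    while (last + 1 < n && a[last + 1] == val) ++last;   // walk right
    assert(a[first] == val && a[last] == val);
    return {first, last};
}
```

The two walking loops cost one comparison per duplicate, which is why this only pays off when runs of equal values are short.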
I imagine that the normal algorithm would have something like this in it:
Once you have used this to find one of the values, perform two more slightly modified binary searches, using the min and max you currently have, to find the two ends of the run.
To find the topmost index, replace the above with:
For the bottommost index, replace it with:
Note that there is no early return with this method; you just keep going until min and max are one apart or so. I suppose you could add an early exit with another check
etc.
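A minimal sketch of this idea follows: one ordinary binary search that stops at the first hit, then two modified searches confined to the min/max window that the first search left behind. The names min_i, max_i and range_by_narrowing are illustrative assumptions, as is the {-1, -1} "not found" result.

```cpp
#include <cassert>
#include <utility>
#include <vector>

std::pair<int, int> range_by_narrowing(const std::vector<int>& a, int val) {
    int min_i = 0, max_i = static_cast<int>(a.size()) - 1, hit = -1;
    // Step 1: ordinary binary search with an early return on any match.
    while (min_i <= max_i) {
        int mid = min_i + (max_i - min_i) / 2;
        if (a[mid] < val) min_i = mid + 1;
        else if (a[mid] > val) max_i = mid - 1;
        else { hit = mid; break; }
    }
    if (hit < 0) return {-1, -1};
    // All occurrences now lie inside [min_i, max_i], and a[hit] == val.

    // Step 2: bottommost occurrence, searched in [min_i, hit] without
    // early return; a[hi] == val holds throughout.
    int lo = min_i, hi = hit;
    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;
        if (a[mid] < val) lo = mid + 1;
        else hi = mid;
    }
    int first = lo;

    // Step 3: topmost occurrence, searched in [hit, max_i]; the midpoint
    // is rounded up so the window always shrinks.
    lo = hit;
    hi = max_i;
    while (lo < hi) {
        int mid = lo + (hi - lo + 1) / 2;
        if (a[mid] > val) hi = mid - 1;
        else lo = mid;
    }
    int last = lo;
    assert(a[first] == val && a[last] == val);
    return {first, last};
}
```

Because steps 2 and 3 start from the window narrowed in step 1, they never revisit parts of the array that were already excluded.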
I have created two binary search methods for returning first and last occurrences respectively.