Extending the binary search algorithm to find the first and last index of a key in an array

Posted 2024-08-20 18:33:34

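The problem is to extend the binary search algorithm to find all occurrences of a target value in a sorted array in the most efficient way. Concretely, the input to the algorithm is (1) a sorted array of integers, in which some numbers may appear more than once, and (2) a target integer to search for. The output should be a pair of index values giving the first and last occurrence of that integer in the array, if it occurs at all. The source code can be in C#, C, or C++.

Also, what are the maximum and minimum number of comparisons that may be needed to find the indices?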

欢烬 2024-08-27 18:33:34

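For C++, look up std::equal_range() and its complexity requirements. As long as you are interested in the basic algorithm, the same general rules should apply regardless of the language used for the implementation.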
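
As a minimal sketch of that suggestion (assuming C++17 for std::optional and structured bindings; the function name firstAndLast is made up for this example), std::equal_range can be turned into a pair of inclusive indices like this. The standard requires it to use at most about 2*log₂N comparisons plus a constant.

```cpp
#include <algorithm>
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

// Returns {first, last} (inclusive, zero-based) for `target` in the sorted
// vector `a`, or std::nullopt when `target` does not occur.
// firstAndLast is a hypothetical name, not part of any library.
std::optional<std::pair<std::size_t, std::size_t>>
firstAndLast(const std::vector<int>& a, int target)
{
    // equal_range returns the half-open iterator range [lo, hi) of elements
    // equal to target; the range is empty when target is absent.
    auto [lo, hi] = std::equal_range(a.begin(), a.end(), target);
    if (lo == hi)
        return std::nullopt;
    return std::make_pair(static_cast<std::size_t>(lo - a.begin()),
                          static_cast<std::size_t>(hi - a.begin()) - 1);
}
```

The half-open result is converted to an inclusive pair only because the question asks for the first and last index; in idiomatic C++ the iterator pair is usually used directly.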

冷情 2024-08-27 18:33:34

If you are a little clever, you can define two different binary search functions. One returns the index of the first appearance of the searched-for value and the other returns the index of its last appearance. From your knowledge of binary search, you should be able to determine the maximum and minimum number of comparisons.

Using two binary searches should be the fastest method on average, in my opinion. For instance, if you use just one binary search to find the first item and then search linearly, the worst case is an array in which every element has the same value. For an array of length 10000, this would give 10013 comparisons in the worst case, while using two binary searches would give 28 comparisons in the worst case for the same array. Of course, for the same size of array, the best case for the binary/linear search method would be 14 comparisons, while the best case for the two-binary-search method is 26 comparisons.
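
To check figures like these yourself, here is a rough sketch that tallies element comparisons for both strategies. The names Counts, lowerBound, upperBound and countComparisons are invented for this example, and the exact totals depend on how you count (for instance, whether the final equality check is included), so expect numbers close to the ones above rather than identical.

```cpp
#include <cstddef>
#include <vector>

// Comparison tallies for the two strategies (hypothetical helper type).
struct Counts {
    std::size_t binaryThenLinear = 0;
    std::size_t twoBinary = 0;
};

// First index whose element is >= v; every element comparison bumps `cmp`.
static std::size_t lowerBound(const std::vector<int>& a, int v, std::size_t& cmp)
{
    std::size_t lo = 0, hi = a.size();
    while (lo < hi) {
        std::size_t mid = lo + (hi - lo) / 2;
        ++cmp;
        if (a[mid] < v) lo = mid + 1; else hi = mid;
    }
    return lo;
}

// First index whose element is > v; every element comparison bumps `cmp`.
static std::size_t upperBound(const std::vector<int>& a, int v, std::size_t& cmp)
{
    std::size_t lo = 0, hi = a.size();
    while (lo < hi) {
        std::size_t mid = lo + (hi - lo) / 2;
        ++cmp;
        if (v < a[mid]) hi = mid; else lo = mid + 1;
    }
    return lo;
}

Counts countComparisons(const std::vector<int>& a, int v)
{
    Counts c;
    // Strategy 1: binary search for the first occurrence, then walk right
    // one element at a time until the value changes or the array ends.
    std::size_t i = lowerBound(a, v, c.binaryThenLinear);
    while (i < a.size()) {
        ++c.binaryThenLinear;
        if (a[i] != v) break;
        ++i;
    }
    // Strategy 2: one binary search per end of the run.
    lowerBound(a, v, c.twoBinary);
    upperBound(a, v, c.twoBinary);
    return c;
}
```

Calling countComparisons on a 10000-element vector filled with a single value exercises the worst case for the binary/linear strategy described above.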

** Update

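Okay, here is a binary search that finds the first appearance of an element in an array. I'll give you a recursive function (you can of course make it iterative and optimize it in other ways). It searches for the int val in the int array a, between the indices left and right inclusive. The midpoint is computed as left + (right-left)/2 rather than (right+left)/2, because the sum can overflow if the array is really large.

// Returns the index of the first element >= val in a[left..right] (inclusive),
// or right if every element is smaller. Requires left <= right.
int bs1(int a[], int val, int left, int right)
{
    if(right == left) return left;
    int mid = left + (right-left)/2;   // rounds down, so mid < right

    if(val > a[mid]) return bs1(a, val, mid+1, right);  // answer lies right of mid
    else return bs1(a, val, left, mid);                  // a[mid] >= val: keep mid
}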

However, once an index is returned you should check that it actually refers to the correct value: if val is not in the array, the returned index corresponds to the next element larger than val (or to the last element, if val is larger than everything in the array).

A few minor changes to this will make a function that finds the last element. The keys to doing this are using the comparators correctly and remembering that integer division always truncates.
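
One way those changes might look, written iteratively in C++ (a sketch only; firstIndex and lastIndex are names chosen for this example, and both return -1 when val is absent). The only differences between the two are which comparison moves which bound and whether the midpoint rounds down or up.

```cpp
#include <vector>

// Index of the first element equal to val, or -1 if val is absent.
int firstIndex(const std::vector<int>& a, int val)
{
    if (a.empty()) return -1;
    int left = 0, right = static_cast<int>(a.size()) - 1;
    while (left < right) {
        int mid = left + (right - left) / 2;      // rounds down: mid < right
        if (val > a[mid]) left = mid + 1;         // first match is right of mid
        else              right = mid;            // a[mid] >= val: keep mid
    }
    return a[left] == val ? left : -1;
}

// Index of the last element equal to val, or -1 if val is absent.
int lastIndex(const std::vector<int>& a, int val)
{
    if (a.empty()) return -1;
    int left = 0, right = static_cast<int>(a.size()) - 1;
    while (left < right) {
        int mid = left + (right - left + 1) / 2;  // rounds up: mid > left
        if (val < a[mid]) right = mid - 1;        // last match is left of mid
        else              left = mid;             // a[mid] <= val: keep mid
    }
    return a[left] == val ? left : -1;
}
```

Rounding up in lastIndex matters: with the truncating midpoint, the branch left = mid would make no progress when right == left + 1, and the loop would never end.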

诗酒趁年少 2024-08-27 18:33:34

This is fairly easy to do without writing your own binary search algorithm, by repeatedly calling a standard algorithm.

// some curly-bracket language:

// int BinarySearch(sortedList, searchIndex, searchLength, valueToFind)
// returns the zero-based index of the item in the list, or a negative value
// if the item is not found

int inner = BinarySearch(list, 0, listSize, value);
if(inner < 0){
    // handle case where value is not found in list
}

int bottom = inner, top = inner;
while(true){
    int i = BinarySearch(list, 0, bottom, value);
    if(i < 0)
        break;
    bottom = i;
}
while(true){
    int i = BinarySearch(list, top + 1, listSize - top - 1, value);
    if(i < 0)
        break;
    top = i;
}

// bottom and top now hold the bounds of all instances of value in list

This is pretty close to the same efficiency you'd get with a custom algorithm, except that you have more function call overhead.

As for the number of comparisons, I'd have to think a little harder to be sure, but I think it's just 2*log₂N, where N is the number of items in the list.

Edit

Bah! It's not 2*log₂N, because unlike what you could do with a custom algorithm, it doesn't incrementally exclude portions of the list. It appears¹ that the maximum comparison count is (log₂N - 0.5) * log₂N. This is still only 885 comparisons for a list with 2³⁰ elements (390 comparisons for N = 2²⁰, and 95 for N = 2¹⁰), but we can do better than that.

// int Compare(a, b)
// returns 0 if a and b are equal,
//         a negative value if a < b, or
//         a positive value if a > b

int start = 0, end = listSize, inner;

while(true){
    if(end == start){
        // handle case where value is not found in list; this must leave the
        // loop (e.g. return), otherwise the search never terminates
    }
    inner = (start + end) / 2;
    int cmp = Compare(list[inner], value);
    if(cmp == 0)
        break;
    if(cmp < 0)
        start = inner + 1;
    else end = inner;
}

int top = inner, bottom = inner;

while(true){
    if(start >= bottom)
        break;
    inner = (start + bottom) / 2;
    int cmp = Compare(list[inner], value);
    if(cmp == 0)
        bottom = inner;
    else start = inner + 1;
}

while(true){
    if(end - 1 <= top)
        break;
    inner = (top + 1 + end) / 2;
    int cmp = Compare(list[inner], value);
    if(cmp == 0)
        top = inner;
    else end = inner;
}

This will do at most 2*log₂N comparisons. 2³⁰ items will require at most 60 comparisons, 2²⁰ items will require at most 40 comparisons, etc.
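
For reference, the same three-phase idea can be written as a self-contained C++ function. This is a sketch under a few assumptions: the list is a std::vector<int>, plain == and < stand in for Compare, C++17 is available for std::optional, and findBounds is a name chosen for this example.

```cpp
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

// Returns {bottom, top}, the inclusive bounds of all instances of value,
// or std::nullopt if value is not in the sorted list.
std::optional<std::pair<std::size_t, std::size_t>>
findBounds(const std::vector<int>& list, int value)
{
    std::size_t start = 0, end = list.size(), inner = 0;

    // Phase 1: ordinary binary search for any one occurrence.
    while (true) {
        if (start == end)
            return std::nullopt;               // value is not in the list
        inner = start + (end - start) / 2;
        if (list[inner] == value)
            break;
        if (list[inner] < value)
            start = inner + 1;                 // everything before start is < value
        else
            end = inner;                       // everything from end on is > value
    }

    std::size_t bottom = inner, top = inner;

    // Phase 2: narrow [start, bottom) down to the first occurrence.
    while (start < bottom) {
        std::size_t mid = start + (bottom - start) / 2;
        if (list[mid] == value)
            bottom = mid;
        else
            start = mid + 1;
    }

    // Phase 3: narrow (top, end) down to the last occurrence.
    while (top + 1 < end) {
        std::size_t mid = top + 1 + (end - top - 1) / 2;
        if (list[mid] == value)
            top = mid;
        else
            end = mid;
    }

    return std::make_pair(bottom, top);
}
```

Phases 2 and 3 reuse the start and end left over from phase 1, which is what lets the search keep excluding parts of the list instead of starting from scratch.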

¹ I determined this experimentally. I'm not quite smart enough to figure it out mathematically.

童话 2024-08-27 18:33:34

You can find a discussion of this in Bentley's Programming Pearls and in Knuth's Vol. 3: Sorting and Searching.

Here is an implementation in C++: http://the-algo-blog.blogspot.com/2011/06/binary-search-to-find-last-and-first.html

稳稳的幸福 2024-08-27 18:33:34

There's no clean answer to the "most efficient" part of the question. It depends on how many entries with the same value are to be expected. If it's only a few, a linear search in both directions of the array after finding one element will be your fastest option, but if you're expecting a lot of entries with the same value, you could do a kind of binary search to find the start and end indices.

Disclaimer: Not tested; it's meant to show the idea and not to be used directly as production code

int org = binarySearch(array, value); // do the binary search and find one element (assumed found)
int max = org;                        // invariant: array[max] == value
int min = org - delta;                // delta is some constant based on how many elements are to be expected
while(min >= 0 && array[min] == value)
{
  max = min;                          // window was too small: slide it further down
  min = min - delta;
}
min = min < 0 ? -1 : min;             // -1 stands for "just before the array"
while(min + 1 != max)                 // invariant: min is below the run, max is inside it
{
  int search = min + (max - min)/2;
  if(array[search] != value)
     min = search;
  else
     max = search;
}
return max;                           // index of the first occurrence

and then the reverse for the upper bound. However it will require quite a lot of elements before this is faster than a simple linear search.
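
For the "only a few duplicates" case, the linear search in both directions might look like the following C++ sketch. It assumes hit is a valid index that a previous binary search has already found; expandLinearly is a name invented for this example.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Given any index `hit` inside a run of equal values in the sorted vector `a`,
// walks outward one element at a time and returns the inclusive
// {first, last} indices of that run. Cost is proportional to the run length.
std::pair<std::size_t, std::size_t>
expandLinearly(const std::vector<int>& a, std::size_t hit)
{
    const int value = a[hit];
    std::size_t first = hit, last = hit;
    while (first > 0 && a[first - 1] == value)
        --first;                               // extend towards the front
    while (last + 1 < a.size() && a[last + 1] == value)
        ++last;                                // extend towards the back
    return {first, last};
}
```

This does one comparison per duplicate plus up to two more, so it beats a second and third binary search only while the run is shorter than roughly 2*log₂N elements.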

七颜 2024-08-27 18:33:34

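I imagine that the normal algorithm would have something like this in it, where value is arr[i], the element at the current probe index i, and test is the value being searched for: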

if(value == test) return;
if(value < test) min = i;
if(value > test) max = i;

Once you have used this to find one of the values, perform two more slightly modified binary searches, using the min and max you currently have, to find the two ends of the run.

To find the topmost, replace the above with:

if(value <= test) min = i;
if(value > test) max = i;

For the bottommost, replace with:

if(value >= test) max = i;
if(value < test) min = i;

Note that there is no early return with this method; you just keep going until min and max are one apart. I suppose you could add an early return with another check:

if(value == test && (i == 0 || arr[i-1] != test)) return;

etc.
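
Putting those two update rules into complete functions could look like this C++ sketch. The names topmost and bottommost are made up for this example, and it starts from the full array with min = -1 and max = size as out-of-range sentinels rather than from the bounds left by an earlier search; both are assumptions, not something the answer specifies.

```cpp
#include <vector>

// Index of the last element equal to test, or -1 if there is none.
int topmost(const std::vector<int>& arr, int test)
{
    int min = -1, max = static_cast<int>(arr.size());
    while (max - min > 1) {                    // stop when min and max are one apart
        int i = min + (max - min) / 2;
        int value = arr[i];
        if (value <= test) min = i;            // last match is at i or above
        else               max = i;            // value > test
    }
    return (min >= 0 && arr[min] == test) ? min : -1;
}

// Index of the first element equal to test, or -1 if there is none.
int bottommost(const std::vector<int>& arr, int test)
{
    int size = static_cast<int>(arr.size());
    int min = -1, max = size;
    while (max - min > 1) {
        int i = min + (max - min) / 2;
        int value = arr[i];
        if (value >= test) max = i;            // first match is at i or below
        else               min = i;            // value < test
    }
    return (max < size && arr[max] == test) ? max : -1;
}
```

Because min and max are never probed themselves, the sentinels -1 and size are safe, and the final equality check doubles as the "not found" test.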

梦回梦里 2024-08-27 18:33:34

I have created two binary search methods for returning first and last occurrences respectively.

public static void main(String[] args) {
    int a[] ={1,2,2,2,2,2,5,5,6,8,9,10};

    System.out.println(5+" first = "+first(a, 5, 0, a.length-1));
    System.out.println(5+" last = "+right(a, 5, 0, a.length-1));

    System.out.println(1+" first = "+first(a, 1, 0, a.length-1));
    System.out.println(1+" last = "+right(a, 1, 0, a.length-1));

    System.out.println(2+" first = "+first(a, 2, 0, a.length-1));
    System.out.println(2+" last = "+right(a, 2, 0, a.length-1));

    System.out.println(10+" first = "+first(a, 10, 0, a.length-1));
    System.out.println(10+" last = "+right(a, 10, 0, a.length-1));

    System.out.println(8+" first = "+first(a, 8, 0, a.length-1));
    System.out.println(8+" last = "+right(a, 8, 0, a.length-1));

    System.out.println(11+" first = "+first(a, 11, 0, a.length-1));
    System.out.println(11+" last = "+right(a, 11, 0, a.length-1));


}

private static int first(int [] a, int x, int l, int h){
    if(l>h){
        return -1;
    }
    int mid = (h-l)/2+l;
    if(a[mid] == x && (mid==0 || a[mid-1] != x) ){
        return mid;
    }else if(a[mid] == x){
        return first(a, x, l, mid-1);
    }else if(a[mid]>x){
        return first(a, x, l, mid-1);
    }else{
        return first(a, x, mid+1, h);
    }
}


private static int right(int [] a, int x, int l, int h){
    if(l>h){
        return -1;
    }
    int mid = (h-l)/2+l;
    if(a[mid] == x && (mid==a.length-1 || a[mid+1] != x) ){
        return mid;
    }else if(a[mid] == x){
        return right(a, x, mid+1, h);
    }else if(a[mid]>x){
        return right(a, x, l, mid-1);
    }else{
        return right(a, x, mid+1, h);
    }
}

Output:
    5 first = 6
    5 last = 7
    1 first = 0
    1 last = 0
    2 first = 1
    2 last = 5
    10 first = 11
    10 last = 11
    8 first = 9
    8 last = 9
    11 first = -1
    11 last = -1
