寻找用 C 实现的快速排序整数数组交集/并集算法

发布于 2024-12-27 15:08:47 字数 92 浏览 3 评论 0原文

我正在寻找实现快速排序整数数组交集/并集操作的 C 算法(或代码)。越快越好。

换句话说,在 C 中实现两个整数数组之间的并集和交集运算的有效方法是什么?

I am looking for C algorithms (or code) that implement fast sorted integer array intersection/union operations. The faster, the better.

In other words, what's an efficient way in C to implement union and intersection operations between two arrays of integers?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

々眼睛长脚气 2025-01-03 15:08:47

这并不是最快的,但它展示了正确的算法。

//List is an output variable, it could be a 
//linked list or an array list or really anything you can add at end to.  

void intersection(int A[], int B[], int aLen, int bLen, List output)
{
    int i = 0;
    int j = 0;
    while (i < aLen && j < bLen)
    {
        a = A[i];
        b = B[j];
        if (a == b)
        {
            add(output, a);
            i++;
            j++;
        }
        else if (a < b)
        {
            i++;
        }
        else
        {
            j++;
        }
    }
}

上面的算法是 O(aLen + bLen)

你可以做得更好,特别是当涉及到 2 个以上列表相交的问题时。

对于交集,基本算法是迭代所有要同时相交的排序列表。如果所有列表的头都匹配,则移动到所有列表中的下一个元素并将头添加到交集。如果没有,则查找最大可见元素,并尝试在所有其他列表中查找该元素。

在我的代码示例中,我只是继续迭代,但由于这些是排序列表,如果您希望 A 是数字 1 到 10000,B 是集合 {7777},您也可以通过二分搜索找到正确的索引。如果你想正确地找到具有多个列表的最大元素意味着使用堆。

如果您更改二分搜索,最坏的情况将上升到 O((aLen + bLen) * (lg(aLen + bLen)),但根据数据,您的平均情况可能会大大改善。

堆更改将是必要的当将多个集合相交时,由于上述算法变为 O(numLists * (所有列表中的元素总数)) 并且可以减少到 O(lg (numLists) * (所有列表中的元素总数)) 并

void union(int A[], int B[], int aLen, int bLen, List output)
    {
        int i = 0;
        int j = 0;
        while (i < aLen && j < bLen)
        {
            a = A[i];
            b = B[j];
            if (a == b)
            {
                add(output, a);
                i++;
                j++;
            }
            else if (a < b)
            {
                add(output, a);
                i++;
            }
            else
            {
                add(output, b);
                j++;
            }
        }
        //Add any leftovers.  
        for (;i < aLen; i++)
        {
            add(output, A[i]);
        }
        for (;j < bLen; j++)
        {
            add(output, B[j]);
        }
    }

集基本上是相同算法,除非你总是添加每个元素,因此,在二分搜索中没有任何意义可以使用实现 peek 的堆来将其扩展到多个列表,基本规则是始终添加最小的元素,然后向前推进。每个列表的前面都有该元素。

This is not quite the fastest, but it demonstrates the right algorithm.

//List is an output variable, it could be a 
//linked list or an array list or really anything you can add at end to.  

void intersection(int A[], int B[], int aLen, int bLen, List output)
{
    int i = 0;
    int j = 0;
    while (i < aLen && j < bLen)
    {
        a = A[i];
        b = B[j];
        if (a == b)
        {
            add(output, a);
            i++;
            j++;
        }
        else if (a < b)
        {
            i++;
        }
        else
        {
            j++;
        }
    }
}

Above algorithm is O(aLen + bLen)

You can do better, especially when it comes to the problem of intersecting more than 2 lists.

For intersection the basic algorithm is to iterate through all sorted lists to be intersected at the same time. If the heads of all the lists match, move to the next element in all lists and add the head to the intersection. If not, find the maximum element visible, and attempt to find that element in all other lists.

In my code example, I just keep iterating, but since these are sorted lists, if you expect A to be the numbers 1 through 10000 and B to be the set {7777}, you can also binary search to the correct index. Finding the maximum element with multiple lists means using a heap if you want to do it properly.

If you make the binary search change, your worst case will go up to O((aLen + bLen) * (lg(aLen + bLen)), but depending on data, your average case might drastically improve.

The heap change will be necessary when intersecting many sets together, as the above algorithm becomes O(numLists * (total number of elements in all lists)) and can be reduced to O(lg (numLists) * (total number of elements in all lists))

void union(int A[], int B[], int aLen, int bLen, List output)
    {
        int i = 0;
        int j = 0;
        while (i < aLen && j < bLen)
        {
            a = A[i];
            b = B[j];
            if (a == b)
            {
                add(output, a);
                i++;
                j++;
            }
            else if (a < b)
            {
                add(output, a);
                i++;
            }
            else
            {
                add(output, b);
                j++;
            }
        }
        //Add any leftovers.  
        for (;i < aLen; i++)
        {
            add(output, A[i]);
        }
        for (;j < bLen; j++)
        {
            add(output, B[j]);
        }
    }

Union is basically the same algorithm, except you always add every element, and as such, theres no point whatsoever in binary searching. Extending it to multiple lists can be done with a heap that implements peek, basic rule is to always add the smallest element, and step forward in every list that had that element at the front.

独木成林 2025-01-03 15:08:47

假设这些是实际的集合(因为数组与重复项的交集充其量是有问题的),以下代码可能会有所帮助。

首先,一些必需的标头:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

以及一些强制性文档:

// Description:
//     Will give back the union or intersection of two sorted integer
//         sets (only one copy of each element).
//     Caller responsible for freeing memory.
// Input:
//     Union ('u') or intersection (anything else).
//     arr1 and arr2 are pointers to the arrays.
//     sz1 and sz2 are the number of integers in each array.
//     psz3 as the pointer to the variable to receive the
//         size of the returned array.
// Returns:
//     A pointer to the array, or NULL if malloc failed.
//     Memory allocated even if result set is empty, so
//        NULL return indicates ONLY malloc failure.
//     psz3 receives the size of that array, which is
//        zero for an empty set.

然后是正确的函数:

int *arrUnionIntersect ( int type,
    int *arr1, size_t sz1,
    int *arr2, size_t sz2,
    size_t *psz3
) {
    int *arr3, *ptr1, *ptr2;

    *psz3 = 0;

    // If result will be empty, just return dummy block.

    if ((sz1 == 0) && (sz2 == 0))
        return malloc (1);

    // Otherwise allocate space for real result.

    if (type == 'u')
        arr3 = malloc ((sz1 + sz2) * sizeof (*arr1));
    else
        if (sz1 > sz2)
            arr3 = malloc (sz1 * sizeof (*arr1));
        else
            arr3 = malloc (sz2 * sizeof (*arr1));
    if (arr3 == NULL)
        return NULL;

到目前为止,主要是函数的初始化。接下来的位遍历两个输入集,选择添加到结果中的内容。 (在我看来)最好分三个阶段来完成,第一个阶段是当两个输入集仍有剩余时选择一个元素。请注意此处对于并集和交集的不同行为,特别是交集仅在元素位于两个输入集中时才添加元素:

    // Set up pointers for input processing.

    ptr1 = arr1;
    ptr2 = arr2;

    // Phase A: select appropriate number from either, when both
    //    have remaining elements.

    while ((sz1 > 0) && (sz2 > 0)) {
        if (*ptr1 == *ptr2) {
            arr3[(*psz3)++] = *ptr1++;
            sz1--;
            ptr2++;
            sz2--;
            continue;
        }

        // We don't copy for intersect where elements are different.

        if (*ptr1 < *ptr2) {
            if (type == 'u')
                arr3[(*psz3)++] = *ptr1;
            ptr1++;
            sz1--;
            continue;
        }

        if (type == 'u')
            arr3[(*psz3)++] = *ptr2;
        ptr2++;
        sz2--;
    }

其他两个阶段(其中只有一个阶段将针对并集运行,对于交集则不会运行) ,只需从非空输入集中获取剩余项目:

    // Phase B and C are only for unions.

    if (type == 'u') {
        // Phase B: process rest of arr1 if arr2 ran out.

        while (sz1 > 0) {
            arr3[*psz3++] = *ptr1++;
            sz1--;
        }

        // Phase C: process rest of arr2 if arr1 ran out.

        while (sz2 > 0) {
            arr3[*psz3++] = *ptr2++;
            sz2--;
        }
    }

    // Return the union.

    return arr3;
}

和一个测试程序:

int main (void) {
    int x1[] = {1, 3, 5, 7, 9, 11, 13, 15, 17, 19};
    int x2[] = {2, 3, 5, 7, 11, 13, 17, 19};
    size_t i, sz3;
    int *x3;

    x3 = arrUnionIntersect ('u', x1, sizeof(x1)/sizeof(*x1),
        x2, sizeof(x2)/sizeof(*x2), &sz3);
    printf ("union =");
    for (i = 0; i < sz3; i++)
        printf (" %d", x3[i]);
    free (x3);
    printf ("\n");

    x3 = arrUnionIntersect ('i', x1, sizeof(x1)/sizeof(*x1),
        x2, sizeof(x2)/sizeof(*x2), &sz3);
    printf ("intersection =");
    for (i = 0; i < sz3; i++)
        printf (" %d", x3[i]);
    free (x3);
    printf ("\n");

    return 0;
}

及其输出,如预期的那样:

union = 1 2 3 5 7 9 11 13 15 17 19
intersection = 3 5 7 11 13 17 19

Assuming that these are actual sets (since an intersection on arrays with duplicates is problematic at best), the following code may help.

First, some requisite headers:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

and some mandatory documentation:

// Description:
//     Will give back the union or intersection of two sorted integer
//         sets (only one copy of each element).
//     Caller responsible for freeing memory.
// Input:
//     Union ('u') or intersection (anything else).
//     arr1 and arr2 are pointers to the arrays.
//     sz1 and sz2 are the number of integers in each array.
//     psz3 as the pointer to the variable to receive the
//         size of the returned array.
// Returns:
//     A pointer to the array, or NULL if malloc failed.
//     Memory allocated even if result set is empty, so
//        NULL return indicates ONLY malloc failure.
//     psz3 receives the size of that array, which is
//        zero for an empty set.

Then the function proper:

int *arrUnionIntersect ( int type,
    int *arr1, size_t sz1,
    int *arr2, size_t sz2,
    size_t *psz3
) {
    int *arr3, *ptr1, *ptr2;

    *psz3 = 0;

    // If result will be empty, just return dummy block.

    if ((sz1 == 0) && (sz2 == 0))
        return malloc (1);

    // Otherwise allocate space for real result.

    if (type == 'u')
        arr3 = malloc ((sz1 + sz2) * sizeof (*arr1));
    else
        if (sz1 > sz2)
            arr3 = malloc (sz1 * sizeof (*arr1));
        else
            arr3 = malloc (sz2 * sizeof (*arr1));
    if (arr3 == NULL)
        return NULL;

Up until there, it's mostly initialisation for the function. This following bit traverses the two input sets, selecting what gets added to the result. This is best done (in my opinion) as three phases, the first being selection of an element when both input sets still have some remaining. Note the differing behaviour here for unions and intersections, specifically intersections only have the element added if it's in both input sets:

    // Set up pointers for input processing.

    ptr1 = arr1;
    ptr2 = arr2;

    // Phase A: select appropriate number from either, when both
    //    have remaining elements.

    while ((sz1 > 0) && (sz2 > 0)) {
        if (*ptr1 == *ptr2) {
            arr3[(*psz3)++] = *ptr1++;
            sz1--;
            ptr2++;
            sz2--;
            continue;
        }

        // We don't copy for intersect where elements are different.

        if (*ptr1 < *ptr2) {
            if (type == 'u')
                arr3[(*psz3)++] = *ptr1;
            ptr1++;
            sz1--;
            continue;
        }

        if (type == 'u')
            arr3[(*psz3)++] = *ptr2;
        ptr2++;
        sz2--;
    }

The other two phases (of which only one will run for unions, and none for intersections), simply gets the remaining items from the non-empty input set:

    // Phase B and C are only for unions.

    if (type == 'u') {
        // Phase B: process rest of arr1 if arr2 ran out.

        while (sz1 > 0) {
            arr3[*psz3++] = *ptr1++;
            sz1--;
        }

        // Phase C: process rest of arr2 if arr1 ran out.

        while (sz2 > 0) {
            arr3[*psz3++] = *ptr2++;
            sz2--;
        }
    }

    // Return the union.

    return arr3;
}

And a test program:

int main (void) {
    int x1[] = {1, 3, 5, 7, 9, 11, 13, 15, 17, 19};
    int x2[] = {2, 3, 5, 7, 11, 13, 17, 19};
    size_t i, sz3;
    int *x3;

    x3 = arrUnionIntersect ('u', x1, sizeof(x1)/sizeof(*x1),
        x2, sizeof(x2)/sizeof(*x2), &sz3);
    printf ("union =");
    for (i = 0; i < sz3; i++)
        printf (" %d", x3[i]);
    free (x3);
    printf ("\n");

    x3 = arrUnionIntersect ('i', x1, sizeof(x1)/sizeof(*x1),
        x2, sizeof(x2)/sizeof(*x2), &sz3);
    printf ("intersection =");
    for (i = 0; i < sz3; i++)
        printf (" %d", x3[i]);
    free (x3);
    printf ("\n");

    return 0;
}

along with its output, as expected:

union = 1 2 3 5 7 9 11 13 15 17 19
intersection = 3 5 7 11 13 17 19
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文