In-place merge without external storage

Posted on 2025-01-06 21:23:18


I want to merge two arrays of sorted values into one. Since both source arrays are stored as consecutive parts of one large array, I wonder if you know a way to merge them within that large storage, i.e. an in-place merge.

All the methods I have found need some external storage; they typically require a temporary array of size sqrt(n). Is there an efficient way without it?

I'm using C#. Other languages are welcome too. Thanks in advance!


Comments (3)

难以启齿的温柔 2025-01-13 21:23:18


AFAIK, merging two (even sorted) arrays cannot be done in place without considerably increasing the necessary number of comparisons and element moves. See: merge sort. However, blocked variants exist which can merge a list of length n using a temporary array of length sqrt(n), as you wrote, while still keeping the number of operations reasonably low. That's not bad - but it's also not "nothing", and apparently the best you can get.

For practical situations, and if you can afford it, you are better off using a temporary array to merge your lists.
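For reference, the "just use a temporary array" approach this answer recommends is the standard two-pointer merge. A minimal C# sketch (the method name and the `mid` split-point parameter are illustrative, not from the thread): it merges the sorted runs `a[0..mid)` and `a[mid..a.Length)` back into `a` using one O(n) buffer.

```csharp
using System;

static class MergeDemo
{
    // Merge the two sorted runs a[0..mid) and a[mid..a.Length)
    // back into a, using a single temporary buffer of size n.
    public static void MergeWithBuffer(int[] a, int mid)
    {
        int[] tmp = new int[a.Length];
        int i = 0, j = mid, k = 0;
        // Standard two-pointer merge into the buffer.
        while (i < mid && j < a.Length)
            tmp[k++] = (a[i] <= a[j]) ? a[i++] : a[j++];
        // Copy whichever run has elements left.
        while (i < mid) tmp[k++] = a[i++];
        while (j < a.Length) tmp[k++] = a[j++];
        // Write the merged result back over the original storage.
        Array.Copy(tmp, a, a.Length);
    }
}
```

This is O(n) time and O(n) extra space - exactly the cost the in-place variants try to avoid, at the price of much more complicated code.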

∞觅青森が 2025-01-13 21:23:18


If the values are stored as succeeding parts of a larger array, you just want to sort the array, then remove consecutive values which are equal.

int SortAndDedupe<T>(T[] a) where T : IComparable<T>
{
    // Do an efficient in-place sort
    Array.Sort(a);
    // Now deduplicate
    int lwm = 0; // low water mark
    int hwm = 1; // high water mark
    while (hwm < a.Length)
    {
        // If the lwm and hwm elements are equal, it is a duplicate entry.
        if (a[lwm].CompareTo(a[hwm]) == 0)
        {
            hwm++;
        }
        else
        {
            // Not a duplicate entry - move the lwm up
            // and copy the hwm element down over the gap.
            lwm++;
            if (lwm < hwm)
            {
                a[lwm] = a[hwm];
            }
            hwm++;
        }
    }
    // New length is lwm + 1 (0 for an empty array);
    // the number of elements removed is hwm - lwm - 1.
    return a.Length == 0 ? 0 : lwm + 1;
}

Before you conclude that this will be too slow, implement it and profile it. That should take about ten minutes.
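Taking that advice literally, a self-contained "implement it and profile it" harness might look like the following sketch: it times the sort plus a compact in-place dedupe pass with `Stopwatch` and returns the deduplicated length (the class and method names are illustrative).

```csharp
using System;
using System.Diagnostics;

static class ProfileDemo
{
    // Sort a, remove consecutive duplicates in place, and report
    // the elapsed wall-clock time. Returns the deduplicated length.
    public static int SortDedupeTimed(int[] a, out double millis)
    {
        var sw = Stopwatch.StartNew();
        Array.Sort(a);
        int lwm = 0; // index of the last kept element
        for (int hwm = 1; hwm < a.Length; hwm++)
        {
            // Keep only the first of each run of equal values.
            if (a[lwm] != a[hwm])
            {
                a[++lwm] = a[hwm];
            }
        }
        sw.Stop();
        millis = sw.Elapsed.TotalMilliseconds;
        return a.Length == 0 ? 0 : lwm + 1;
    }
}
```

Run it on data of realistic size and distribution before deciding the approach is too slow.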

Edit: This can of course be improved by using a different sort rather than the built-in one, e.g. quicksort, heapsort or smoothsort, depending on which gives better performance in practice. Note that hardware-architecture effects mean the practical performance comparison may differ considerably from what big-O analysis suggests.

Really you need to profile it with different sort algorithms on your actual hardware/OS platform.

Note: I am not attempting in this answer to give an academic answer, I am trying to give a practical one, on the assumption you are trying to solve a real problem.

最好是你 2025-01-13 21:23:18


Don't worry about the external storage. sqrt(n) or even larger should not hurt your performance. You just have to make sure the storage is pooled - especially for large data, and especially when merging in loops. Otherwise the GC will get stressed and eat up a considerable part of your CPU time / memory bandwidth.
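One way to follow the pooling advice in modern .NET is `System.Buffers.ArrayPool<T>`: rent the merge buffer instead of allocating a fresh array on every call, so repeated merges in a loop do not churn the GC. A sketch (the method name and `mid` split-point parameter are illustrative):

```csharp
using System;
using System.Buffers;

static class PooledMergeDemo
{
    // Merge the sorted runs a[0..mid) and a[mid..a.Length) back into a,
    // renting the temporary buffer from the shared pool instead of
    // allocating it, so looped merges avoid GC pressure.
    public static void MergeRuns(int[] a, int mid)
    {
        // Rent may return an array larger than requested; only a.Length is used.
        int[] tmp = ArrayPool<int>.Shared.Rent(a.Length);
        try
        {
            int i = 0, j = mid, k = 0;
            while (i < mid && j < a.Length)
                tmp[k++] = (a[i] <= a[j]) ? a[i++] : a[j++];
            while (i < mid) tmp[k++] = a[i++];
            while (j < a.Length) tmp[k++] = a[j++];
            Array.Copy(tmp, a, a.Length);
        }
        finally
        {
            ArrayPool<int>.Shared.Return(tmp);
        }
    }
}
```

The `try`/`finally` guarantees the buffer goes back to the pool even if the merge throws.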
