从数组创建最小堆-2种方法

发布于 2025-02-02 21:50:12 字数 224 浏览 3 评论 0原文

我正在研究一个问题,要从阵列中建造最小的堆。我有2种方法 - 第一个是递归,第二个是使用一个while循环。递归方法通过了在线分级器上的测试,但是While Loop版本似乎不起作用。我在下面的代码中产生了一些随机的压力测试,发现两种方法也给出了不同的答案。

我可以知道第二种方法中有什么错误?问题如下:

输入格式。输入的第一行包含单个整数。下一行包含

I am working on a problem about building a min heap from an array. I have 2 approaches - the first is recursion and the second is using a while loop. The recursion approach passed the tests on the online grader, but the while loop version doesn't seem to work. I generated some random stress tests in my code below and found that the 2 methods gave different answers as well.

May I know what's the mistake in my second method? The question is as follows:

Input Format. The first line of the input contains single integer ????. The next line contains ???? space-separated
integers ????????.

Constraints. 1 ≤ ???? ≤ 100 000; 0 ≤ ????, ???? ≤ ???? − 1; 0 ≤ ????0, ????1,..., ????????−1 ≤ 109. All ???????? are distinct.

Output Format. The first line of the output should contain single integer ???? — the total number of swaps.

???? must satisfy conditions 0 ≤ ???? ≤ 4????. The next ???? lines should contain the swap operations used to convert the array ???? into a heap. Each swap is described by a pair of integers ????, ???? — the 0-based
indices of the elements to be swapped. After applying all the swaps in the specified order the array must become a heap, that is, for each ???? where 0 ≤ ???? ≤ ???? − 1 the following conditions must be true:

  1. If 2???? + 1 ≤ ???? − 1, then ???????? < ????2????+1.
  2. If 2???? + 2 ≤ ???? − 1, then ???????? < ????2????+2.

Note that all the elements of the input array are distinct. Note that any sequence of swaps that has length at most 4???? and after which your initial array becomes a correct heap will be graded as correct.

My code:

# python3

from random import randint

swaps = []

def sift_down(i, n, data):
    min_index = i
    left_child = 2*i + 1
    right_child = 2*i + 2
    if left_child < n and data[left_child] < data[min_index]:
        min_index = left_child
    if right_child < n and data[right_child] < data[min_index]:
        min_index = right_child
    if i != min_index:
        swaps.append([i, min_index])
        data[i], data[min_index] = data[min_index], data[i]
        sift_down(min_index, n, data)

def build_heap(data):
    n = len(data)
    for i in range(n//2, -1, -1):
        sift_down(i, n, data)

    return swaps

# wrong answer using while loop instead of recursion
def build_heap2(data):
    swap = []
    for i in range(len(data)-1, 0, -1):
        current_node = i
        prev_node = i // 2 if i % 2 != 0 else i // 2 - 1

        while data[prev_node] > data[current_node] and current_node != 0:
            swap.append((prev_node, current_node))
            data[prev_node], data[current_node] = data[current_node], data[prev_node]
            current_node = prev_node
            prev_node = current_node // 2 if current_node % 2 != 0 else current_node // 2 - 1

    return swap


def main():
    # n = int(input())
    # data = list(map(int, input().split()))
    # assert len(data) == n
    
    while True:
        n = randint(1, 100000)
        data = []
        data2 = []
        for i in range(n):
            data.append(randint(0, 10^9))
        data2 = data.copy()
        
        swaps = build_heap(data)
        swaps2 = build_heap2(data2)
        
        
        if swaps != swaps2:
            print("recursion")
            print(data[0], len(data), len(swaps))
            print("loop:")
            print(data2[0], len(data2), len(swaps2))
            break
        
        else:
            print("success")
    
    swaps = build_heap(data)

    print(len(swaps))
    for i, j in swaps:
        print(i, j)

if __name__ == "__main__":
    main()

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

我为君王 2025-02-09 21:50:12

您的build_heap2实现了一个不正确的想法。它从树的底部开始(正确),但是在尚未堆积的树的上部,在树的上部发生气泡值 up up 。这不好。它不仅可以报告错误的掉期数量,而且不会总是提供有效的堆。例如,对于[3,1,2,4,0

]在树的底部和父节点的孩子变成堆之后,该父节点的值将其筛选为这些子蜂巢中的任何一个。这是正确的,因为现在的移动值在内移动,该子树已经被堆积了。结果是,这两个小堆的母体现在是有效堆本身的根源。因此,在算法的末尾,根将是有效堆的根。

因此,您需要向下交换(选择最小值的孩子),而不是在树上交换值。

这是更正的版本:

def build_heap(data):
    swap = []
    # We can start at the deepest parent:
    for i in range(len(data) // 2 - 1, -1, -1):
        current_node = i
        
        while True:
            child_node = current_node * 2 + 1
            if child_node >= len(data):
                break
            if child_node + 1 < len(data) and data[child_node + 1] < data[child_node]:
                child_node += 1
            if data[current_node] < data[child_node]:
                break
            # swap the current value DOWN, with the least of both child values
            swap.append((child_node, current_node))
            data[child_node], data[current_node] = data[current_node], data[child_node]
            current_node = child_node
    return swap

Your build_heap2 implements an idea that is not correct. It starts from the bottom of the tree (correct), but then bubbles values up the tree (wrong), in the upper part of the tree that has not been heapified yet. This is not good. Not only can it report the wrong number of swaps, it will not always deliver a valid heap. For instance, for [3, 1, 2, 4, 0] the result after the swaps is still not a heap, as the value 1 ends up as a child of 3.

The purpose is to build little heaps at the bottom of the tree and after the children of a parent node have been turned into heaps, the value in that parent node is sifted down into either of these child-heaps. This is right, as now the moving value is moving within a subtree that is already heapified. The result is that the parent of these two little heaps is now the root of a valid heap itself. And so at the end of the algorithm, the root will be the root of a valid heap.

So instead of swapping values upwards in the tree, you need to swap downwards (choosing the child with the least value).

Here is the corrected version:

def build_heap(data):
    swap = []
    # We can start at the deepest parent:
    for i in range(len(data) // 2 - 1, -1, -1):
        current_node = i
        
        while True:
            child_node = current_node * 2 + 1
            if child_node >= len(data):
                break
            if child_node + 1 < len(data) and data[child_node + 1] < data[child_node]:
                child_node += 1
            if data[current_node] < data[child_node]:
                break
            # swap the current value DOWN, with the least of both child values
            swap.append((child_node, current_node))
            data[child_node], data[current_node] = data[current_node], data[child_node]
            current_node = child_node
    return swap
北陌 2025-02-09 21:50:12

(至少)有两种建造堆的方法。

O(n)解决方案从数据集的中间向后往后开始,从而确保每个连续元素是该点子树的正确根:

def build_heap_down(data):
    n = len(data)
    for subtree in range(n // 2 - 1, -1, -1):
        sift_down(subtree, n, data)

另一个解决方案,即O(n log n),只需添加每个解决方案元素依次转到一个更大的堆:

def build_heap_up(data):
    for new_element in range(1, n):
        sift_up(new_element, data)

由于build_heap_up()在最坏的情况下是log-linear(我相信我认为是反向输入),因此它可能无法满足您的作业需求,这将线性绑定在掉期数量上。尽管如此,一些实验还是值得做的。也许这就是这项任务的重点。

def sift_up(elt, data):
    while elt > 0:
        parent = (elt - 1) // 2
        if data[parent] <= data[elt]: return
        swap(parent, elt, data)
        elt = parent

def sift_down(elt, limit, data):
    while True:
        kid = 2 * elt + 1
        if kid >= limit: return
        if kid + 1 < limit and data[kid + 1] < data[kid]: kid += 1
        if data[elt] <= data[kid]: return
        swap(elt, kid, data)
        elt = kid

这里的关键洞察力是sift_upsift_down都要求他们与之合作的数组是堆,除了被筛选的元素。 sift_down与从筛选元素到末端的数组一起使用,因此在整个数组上正确执行此操作需要向后工作。 sift_up从开始到筛选元素的数组使用,因此迭代必须向前锻炼。

据我所知,您的build_heap dis build_heap_down尽管它使用递归,但它与上面的循环一样(以及 @trincot 的版本相同(以及 tail call call消除 。 (某些语言会自动执行此程序转换,但是Python不是其中之一。)

您的build_heap2build> build_heap_up的不正确版本,因为它向后工作,而不是努力工作。这很容易解决。但是不要指望它会产生相同的堆,更不用说掉期列表了。可以从给定的数字列表中构建许多可能的堆,这就是为什么可以找到build_heap的O(n)算法的原因,而不是sort> sort

There are (at least) two ways to build a heap.

The O(N) solution works backwards from the middle of the dataset towards the beginning, ensuring that each successive element is the correct root of the subtree at that point:

def build_heap_down(data):
    n = len(data)
    for subtree in range(n // 2 - 1, -1, -1):
        sift_down(subtree, n, data)

The other solution, which is O(N log N), just adds each element in turn to a successively larger heap:

def build_heap_up(data):
    for new_element in range(1, n):
        sift_up(new_element, data)

Since build_heap_up() is log-linear in the worst case (which I believe is reverse-sorted input), it probably doesn't satisfy the needs of your assignment, which impose a linear bound on the number of swaps. Still, some experimentation is worth doing. Perhaps that's the point of this assignment.

def sift_up(elt, data):
    while elt > 0:
        parent = (elt - 1) // 2
        if data[parent] <= data[elt]: return
        swap(parent, elt, data)
        elt = parent

def sift_down(elt, limit, data):
    while True:
        kid = 2 * elt + 1
        if kid >= limit: return
        if kid + 1 < limit and data[kid + 1] < data[kid]: kid += 1
        if data[elt] <= data[kid]: return
        swap(elt, kid, data)
        elt = kid

The key insight here is that both sift_up and sift_down require that the array they are working with be a heap except for the element being sifted. sift_down works with the array from the sifted element to the end, so doing it correctly on the entire array requires working backwards. sift_up works with the array from the beginning to the sifted element, so the iteration has to work forwards.

Your build_heap does build_heap_down, as far as I can see correctly. Although it uses recursion, it does the same thing as my loop above (and the version from @trincot); recursion at the very end of a function can always be turned into a simple loop using tail call elimination. (Some languages automatically perform this program transformation, but Python isn't one of them.)

Your build_heap2 is an incorrect version of build_heap_up because it works backwards instead of working forwards. That's easy to fix. But don't expect it to produce the same heap, much less the same list of swaps. There are many possible heaps which can be built from a given list of numbers, which is why it's possible to find an O(N) algorithm for build_heap and not for sort.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文