Traversing a heapified list

Published 2024-12-12 23:43:25 · 3 views · 0 comments


I'm making a Monte-Carlo simulation. And as a part of this task I generate samples uniformly distributed over an interval (0,100).

generate = lambda: uniform(0,100)

The iteration stops when all pairs of adjacent generated points meet the criterion.

check = lambda a,b: True if (b-a)<5 else False

I need to have some structure to effectively keep all the generated points and be able to go through them in ascending order to perform check on all the subsequent pairs.

There is a heapq module in Python which supports a very effective heap structure. And I decided to use it.

I faced a problem. I have found no traversal procedure supported by this module. The only way I found to access the values of the heap in ascending order is to use heapq.heappop. But it deletes the values from the heap.

I found a workaround: copy the heap object into a new one and iterate with heappop over the copy. But I don't think it's very efficient to copy the whole structure in memory on every iteration.

Is there any other way I can go to do what I'm trying to do more effectively?


The simplified code for illustration.

import heapq
from random import uniform
from itertools import tee, izip, count
from copy import copy


def pairwise(iterable): #get values from iterator in pairs
    a, b = tee(iterable)
    next(b, None)
    return izip(a, b)


check = lambda a,b: True if (b-a)<5 else False
generate = lambda: uniform(0,100)


def iterate_heap(heap):
    heap = copy(heap) #Here I have to copy the heap to be able to traverse
    try:
        while True:
            yield heapq.heappop(heap)
    except IndexError:
        return


def trial():
    items = []

    for i in count():
        item = generate()
        heapq.heappush(items, item)

        it = iterate_heap(items)
        it = pairwise(it)

        if i>0 and all(check(a,b) for a,b in it): #if i==0 then 'it' returns no values and 'all' returns True
            return i

print "The solution is reached. It took %d iterations." % trial()

The pairwise function is from the recipe here.


Update:
In this implementation with heappop the complexity on each iteration is O(n*log(n)):

Copying heap: O(n)

Adding a new value to the heap: O(log(n))

Traversing: n elements * O(log(n)) on popping each value from heap -> O(n*log(n)).

Result: O(n + log(n) + n*log(n)) = O(n*log(n))

But I expect the traversal to be O(n), so the resultant complexity would be O(n).

By the way, if we used just a sorted list, we would need to sort the list on each addition, so O(n*log(n)); but the traversal would be n*O(1) -> O(n). So the resultant complexity is still O(n*log(n)).

I have found a solution: use the bisect module. Finding the insertion point is O(log(n)). Inserting into the list is O(n) (because of the underlying array implementation, all values after the insertion point have to be shifted). Traversing is O(n). So the resultant complexity is O(n).
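A minimal sketch of that bisect-based version (the loop mirrors trial() above; the function name is illustrative, not from the original post):

```python
import bisect
from random import uniform

def trial_bisect():
    points = []  # kept sorted at all times
    i = 0
    while True:
        i += 1
        # O(log n) to find the insertion point, O(n) to shift elements
        bisect.insort(points, uniform(0, 100))
        # adjacent pairs in ascending order come directly from the list: O(n)
        if i > 1 and all(b - a < 5 for a, b in zip(points, points[1:])):
            return i
```

Since the list is always sorted, no copying or re-sorting is needed before each traversal.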

Still, I wonder if there is a way to solve this task using heaps in Python.

Comments (5)

风情万种。 2024-12-19 23:44:36

I created an Iterator class that will perform a lazy in-order traversal of a min heap. It has the following advantages:

  1. Doesn't require a copy of the original heap
  2. Doesn't modify the original heap
  3. Lazy iteration is more efficient if stopping early

To keep track of the next items for iteration, I actually just used another heap, self.next_items.

import heapq

class HeapIter:
    """Lazy ascending traversal of a min-heap, without copying or modifying it."""

    def __init__(self, heap):
        self.original_heap = heap
        # Frontier of candidates for the next-smallest element,
        # stored as (value, index-into-original-heap) pairs.
        self.next_items = []
        if len(self.original_heap) > 0:
            self.next_items.append((self.original_heap[0], 0))

    def current_element(self):
        if len(self.next_items) == 0:
            return None
        return self.next_items[0][0]

    def next(self):
        if len(self.next_items) == 0:
            return None
        # Pop the smallest frontier element, then add its two children
        # in the original heap array as new candidates.
        next_elem, next_index = heapq.heappop(self.next_items)
        child_1 = 2 * next_index + 1
        child_2 = child_1 + 1
        if child_1 < len(self.original_heap):
            heapq.heappush(self.next_items, (self.original_heap[child_1], child_1))
        if child_2 < len(self.original_heap):
            heapq.heappush(self.next_items, (self.original_heap[child_2], child_2))
        return next_elem
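The same lazy traversal can also be written as a self-contained generator (my sketch of the idea, not part of the original answer):

```python
import heapq

def heap_iter(heap):
    """Yield the elements of a heapified list in ascending order,
    without copying or modifying the list itself."""
    # Frontier of (value, index) candidates for the next-smallest element.
    frontier = [(heap[0], 0)] if heap else []
    while frontier:
        value, i = heapq.heappop(frontier)
        # The popped node's children in the heap array become candidates.
        for child in (2 * i + 1, 2 * i + 2):
            if child < len(heap):
                heapq.heappush(frontier, (heap[child], child))
        yield value
```

Stopping early costs only the work done so far, which is what makes the lazy approach attractive for the all-pairs check.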
甜是你 2024-12-19 23:44:28

For the record, the right data structure in this case is a B-tree. There is an implementation:

 from blist import sortedlist

The runtime complexity is as low as it gets: O(n*log(n)) to construct the list, O(n) to iterate.

记忆之渊 2024-12-19 23:44:18

I have made some efficiency calculations.

The best performance is achieved using the bisect module:
10000 insertions in the middle of the list clocked 0.037 sec on my computer (Python 2.7).

Using sortedlist from the blist module clocked 0.287 sec for the same number of insertions.

And using a traditional list with sort applied after each append clocked 2.796 sec. (Python's list.sort uses the Timsort algorithm, which is argued to be very efficient on nearly sorted lists; still, it turns out not to be as efficient as using bisect.)


The code I used to make these calculations:

import bisect
import timeit
import __main__
import blist

N = 10000 #Number of executions
L = 1000 #Length of initial list

def test_f_bisect(a):
    bisect.insort_right(a,500)


def test_f_list_sort(a):
    a.append(500)
    a.sort()


test_f_blist_init = '''
from __main__ import test_f_blist
import blist
a = blist.sortedlist(range({L}))
'''.format(L=L)
def test_f_blist(a):
    a.add(500)


names = dir(__main__)
for name in names:
    attr = getattr(__main__,name)
    if hasattr(attr,'__call__'):
        if name.startswith('test_f_'):
            init_name = name + '_init'
            if hasattr(__main__, init_name):
                init = getattr(__main__,init_name)
            else:
                init = 'from __main__ import {name}; a = list(range({L}))'.format(name=name, L=L)
            t = timeit.Timer(stmt='{name}(a)'.format(name=name),
                             setup=init)

            time = t.timeit(N)
            print('{name}: {time}'.format(name=name,time=time))
游魂 2024-12-19 23:44:07

From the python docs:

These two make it possible to view the heap as a regular Python list without surprises: heap[0] is the smallest item, and heap.sort() maintains the heap invariant!

Is there a reason you can't just treat the heap as a list and iterate over it?

昔梦 2024-12-19 23:43:55

I would use list.sort() on the heap. That leaves the heap condition intact and makes it possible to iterate over the underlying list directly.

FWIW, the Timsort algorithm used by list.sort will take advantage of the partial ordering that already exists in the heap.
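A small sketch of that suggestion (the values here are chosen for illustration):

```python
import heapq

heap = []
for v in [42, 7, 93, 58, 21]:
    heapq.heappush(heap, v)

heap.sort()  # ascending order; a sorted list still satisfies the heap invariant

# iterate adjacent pairs directly over the underlying list
pairs_ok = all(b - a < 5 for a, b in zip(heap, heap[1:]))

heapq.heappush(heap, 30)  # later pushes remain valid on the sorted list
```

This avoids both copying the heap and popping from it; the sort is cheap when the list is already nearly ordered.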
