What type of collection of mutable objects will allow me to quickly remove items in Python?

Posted 2024-10-01 05:25:21

Suppose I have profiled my program, and the vast majority of the runtime is spent in the 'remove' method of 'list' objects. The program manipulates a collection of collections, and the collections do not need to be ordered. What would be the most straightforward way to implement these collections in Python (preferably using standard Python collections) so that collection.remove(item) is inexpensive both when collection is the outer collection and item is an inner collection, and when collection is an inner collection and item is just an immutable object?

The problem with using sets here is that sets cannot contain mutable collections, so the inner sets would have to be frozensets, but then removing items is no longer so cheap.

The best solution I've come upon so far was suggested by someone as an answer here that apparently was deleted shortly after. They suggested using a dict. This would work, but you would then have to generate arbitrary IDs for each item, so it's a bit awkward. Another alternative is to use a linked list, but that would be awkward too, since linked lists aren't part of the standard library.
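To make the constraint concrete, here is a minimal sketch of what goes wrong with plain sets and what the frozenset workaround costs. The variable names are illustrative only.

```python
inner = {1, 2, 3}

try:
    outer = {inner}              # mutable sets are unhashable, so this raises
except TypeError as exc:
    error = str(exc)

# Workaround: store frozensets. An inner "removal" then means building
# a new frozenset and swapping it into the outer set.
old = frozenset(inner)
outer = {old}
new = old - {2}                  # copies the surviving elements
outer.remove(old)
outer.add(new)
```

Each inner removal copies the whole inner collection, which is why it is no longer cheap.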

Comments (6)

贪了杯 2024-10-08 05:25:21

If you can live with equality defined as identity, you can create a hashable list subtype and use these as set members for fast access/removal:

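class hlist(list):
    "Hashable list: hashed and compared by identity, not by contents"
    def __hash__(self):
        return id(self)
    def __eq__(self, other):
        return self is other
    def __ne__(self, other):
        return self is not other

in1 = hlist([1,2,3])
in2 = hlist([4,5,6])
outer = set([in1, in2])

# Removing an inner list is now a hash lookup rather than a list scan
outer.remove(in1)

征棹 2024-10-08 05:25:21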

"They suggested using a dict. This would work, but you would have to generate arbitrary id's for each item then, so it's a bit awkward."

Do you delete them by instance? With the dict approach, you can always use id() as their "arbitrary" ID.

Use one dict for the groups, keyed by each group's id(), with an inner dict per group keyed by each individual's id(), plus another global dict of all individuals, also keyed by id().

It's not clear whether an individual can be in multiple groups. If so, you would need to check whether the individual is still in any group before deleting it from the global dict.
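A minimal sketch of the id()-keyed layout, assuming each individual belongs to the groups it was added to and leaving out the global dict of individuals for brevity. The names groups, add_group, remove_item and remove_group are made up for illustration.

```python
groups = {}   # id(group) -> group, where each group is {id(item): item}

def add_group(items):
    """Build an inner dict keyed by id() and register it in the outer dict."""
    group = {id(item): item for item in items}
    groups[id(group)] = group
    return group

def remove_item(group, item):
    """Remove one item from an inner dict by identity."""
    del group[id(item)]

def remove_group(group):
    """Remove an inner dict from the outer dict by identity."""
    del groups[id(group)]

a, b, c = ["a"], ["b"], ["c"]     # mutable items, so they cannot be set members
g1 = add_group([a, b])
g2 = add_group([b, c])
remove_item(g1, b)
remove_group(g2)
```

This relies on the dicts keeping every object alive, since id() values are only unique among live objects. Two equal but distinct objects get different keys, which matches deleting "by instance".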

峩卟喜欢 2024-10-08 05:25:21

A dictionary is the collection you want in this case because it has O(1) average-case find and delete. There is a cost: you must generate a key for each object when you add or remove it, but that will still be significantly faster than the O(n) approach of scanning a list. Generating a key for your objects is the right thing to do in this situation. If the objects already have a primary key (did they come from a DB?), hashing reduces to a property lookup, and you'll achieve near-perfect performance.

You seem to think that using a dictionary as a data structure in this case is a bad thing. It isn't at all. The purpose of a dictionary is to quickly find items in a collection. This is what you need, so use it.
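As a rough illustration of the primary-key case, assume each object carries a unique pk attribute. The Record class and its fields are hypothetical and not taken from the question.

```python
from dataclasses import dataclass, field

@dataclass
class Record:
    pk: int                                     # assumed unique primary key
    items: set = field(default_factory=set)     # mutable inner collection

records = [Record(1, {10, 11}), Record(2, {20}), Record(3)]
by_pk = {r.pk: r for r in records}              # outer collection keyed by pk

removed = by_pk.pop(2)                          # outer removal by key
by_pk[1].items.remove(10)                       # inner removal from a plain set
```

Because the inner collections are values rather than keys, they can stay ordinary mutable sets.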

仲春光 2024-10-08 05:25:21

If you are spending a lot of time removing elements from a list, perhaps you should consider filtering it instead? In other words, build one large initial list and then have subsequent generators consume its elements.
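One way to read this suggestion, sketched with made-up data: instead of calling list.remove once per unwanted element, collect the unwanted elements in a set and filter in a single pass.

```python
items = list(range(10))
to_drop = {2, 3, 5, 7}            # set gives fast membership tests

# One pass over the list instead of one list.remove call per dropped item
kept = [x for x in items if x not in to_drop]

# Later stages can be lazy generators that consume the earlier result
evens = (x for x in kept if x % 2 == 0)
result = list(evens)
```

Whether this fits depends on whether the removals can be batched. It does not help if items must disappear one at a time between other operations.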

却一份温柔 2024-10-08 05:25:21

It's perhaps not exactly what you're asking for, but collections.deque might meet some of your requirements:

Deques support thread-safe, memory efficient appends and pops from either side of the deque with approximately the same O(1) performance in either direction.
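A short sketch of which deque operations are cheap. Note that deque.remove(value) still searches for the value, so it only helps if removals happen at the ends.

```python
from collections import deque

d = deque([1, 2, 3])
d.appendleft(0)          # cheap at the left end
d.append(4)              # cheap at the right end
left = d.popleft()       # removes 0
right = d.pop()          # removes 4

d.remove(2)              # removal by value is a linear search, as with a list
```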

伊面 2024-10-08 05:25:21

Why not have something like a master list of sets, and then another set that contains the indices into that list for the sets you want to keep track of? Sure, it might be a little extra work, but you should be able to abstract it out into a class.
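A possible shape for such a class, as a sketch only. SetRegistry and its method names are invented here, and slots in the master list are never reused.

```python
class SetRegistry:
    """Master list of inner sets plus a set of indices that are still live."""

    def __init__(self):
        self._sets = []       # master list; positions never shift
        self._live = set()    # indices of inner sets still in the collection

    def add(self, items):
        self._sets.append(set(items))
        index = len(self._sets) - 1
        self._live.add(index)
        return index          # handle used for later removal

    def remove_set(self, index):
        self._live.remove(index)     # no scan of the master list
        self._sets[index] = None     # let the inner set be garbage collected

    def remove_item(self, index, item):
        self._sets[index].remove(item)

    def __iter__(self):
        return (self._sets[i] for i in self._live)

reg = SetRegistry()
i = reg.add([1, 2, 3])
j = reg.add([4, 5])
reg.remove_item(i, 2)
reg.remove_set(j)
remaining = list(reg)
```

The caller has to hold on to the index returned by add, which plays the same role as the arbitrary ID in the dict approach.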
