What type of collection of mutable objects will allow me to quickly remove items in Python?
Suppose I have profiled my program, and the vast majority of runtime is spent in method 'remove' of 'list' objects. The program manipulates a collection of collections, and the collections do not need to be ordered. What would be the most straightforward way to implement these collections in Python (preferably using standard Python collections) so that collection.remove(item) is inexpensive both when collection is the outer collection and item is an inner collection, and when collection is an inner collection and item is just an immutable object?
The problem with using sets here is that sets cannot contain mutable collections, so the inner sets would have to be frozensets, but then removing items is no longer so cheap.
The best solution I've come upon so far was suggested by someone as an answer here that apparently was deleted shortly after. They suggested using a dict. This would work, but you would then have to generate arbitrary IDs for each item, so it's a bit awkward. Another alternative is to use a linked list, but that would be awkward too, since linked lists aren't part of the standard library.
Comments (6)
If you can live with equality defined as identity, you can create a hashable list subtype and use these as set members for fast access/removal:
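The code that originally followed this answer is not preserved here. A minimal sketch of the idea, assuming a hypothetical class name `IdentityList`, could look like this:

```python
class IdentityList(list):
    """A list that hashes and compares by identity, so it can be a set member.

    IdentityList is a hypothetical name used for illustration only.
    """

    def __hash__(self):
        # The hash depends only on the object's identity, so mutating the
        # list never changes it.
        return id(self)

    def __eq__(self, other):
        return self is other

    def __ne__(self, other):
        # list defines its own __ne__, so it has to be overridden as well
        # to stay consistent with __eq__.
        return self is not other


inner_a = IdentityList([1, 2, 3])
inner_b = IdentityList([1, 2, 3])

# Both are kept: the contents are equal, but the objects are distinct.
outer = {inner_a, inner_b}

inner_a.append(4)      # mutation is allowed and does not affect the hash
outer.remove(inner_a)  # hash-based removal from the outer set
```

Note that `inner_a.remove(x)` is still a linear scan, because the inner collection is still a list. If the inner collections also need cheap removal, the same identity-hash trick can be applied to a subclass of `set` instead.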
You delete them by instance? With a dict approach, you can always use id() as their "arbitrary" ID: one dict for the groups with their id() as key, an inner dict keyed by each individual's id(), and another global dict holding the individuals with their id() as key. It's not clear whether an individual can be in multiple groups... If so, you would need to verify whether the individual is still in any group before deleting it.
A dictionary is the collection you want in this case because find and delete are O(1) on average. There is a cost you will incur, which is generating a key for each object when you want to add or remove it, but it will be significantly faster than the O(n) approach of scanning a list. Generating a key for your objects is correct in this situation. If you have a primary key (did they come from a DB?), that will reduce the hash function to a property lookup, and you'll achieve near perfect performance.
You seem to think that using a dictionary as a data structure in this case is a bad thing. It isn't at all. The purpose of a dictionary is to quickly find items in a collection. This is what you need, so use it.
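For illustration, here is a small sketch that assumes each object already carries a primary key. The field name `pk` and the sample records are made up for the example.

```python
# Hypothetical records that already carry a primary key in the "pk" field.
records = [
    {"pk": 101, "name": "a"},
    {"pk": 102, "name": "b"},
    {"pk": 103, "name": "c"},
]

# Key the collection by primary key instead of keeping a plain list.
by_pk = {rec["pk"]: rec for rec in records}

# Removal is a hash lookup rather than a scan of the whole collection.
removed = by_pk.pop(102)
```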
If you are spending a lot of time removing elements from a list, perhaps you should consider filtering it instead? In other words, make a large initial list and then have subsequent generators consume elements from the list.
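One way to read this suggestion, sketched with made-up sample data and a hypothetical helper named `keep`, is to replace many individual remove calls with a single filtering pass:

```python
items = list(range(10))
unwanted = {2, 3, 7}

# One linear pass with O(1) membership tests, instead of one linear
# list.remove() call per unwanted element.
items = [x for x in items if x not in unwanted]


def keep(source, rejected):
    """Lazily yield the elements of source that are not in rejected."""
    return (x for x in source if x not in rejected)


# Generators can be chained so that later stages consume earlier ones
# without building intermediate lists.
remaining = list(keep(items, {1, 5, 9}))
```

This only helps when removals can be batched. If items must disappear one at a time between other operations, a hash-based container is probably a better fit.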
It's perhaps not exactly what you're asking for, but collections.deque might meet some of your requirements.
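The example that accompanied this answer is not preserved. As a brief sketch of what a deque does and does not offer for this problem (the sample values are arbitrary):

```python
from collections import deque

queue = deque(["a", "b", "c", "d"])

first = queue.popleft()  # constant-time removal from the left end
last = queue.pop()       # constant-time removal from the right end

# Removing an arbitrary value still scans the deque, just like a list.
queue.remove("c")
```

So a deque is mainly a win when the removals happen at the ends of the collection. For removal of arbitrary items by value it does not improve on a list.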
Why not have something like a master list of sets, and then another set that contains the indices into the list for the sets you want to keep track of? Sure, it might be a little extra work, but you should be able to abstract it out into a class.
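A possible shape for such a class is sketched below. The class name `SetCollection` and its methods are hypothetical, and the sketch assumes callers are happy to refer to inner sets by the index returned from `add`.

```python
class SetCollection:
    """Master list of sets plus a set of the indices that are still active."""

    def __init__(self):
        self._slots = []      # master list; slots are never reused or shifted
        self._active = set()  # indices of the inner sets still in the collection

    def add(self, items=()):
        """Store a new inner set and return its index as a handle."""
        self._slots.append(set(items))
        index = len(self._slots) - 1
        self._active.add(index)
        return index

    def remove(self, index):
        """Drop an inner set; raises KeyError if it was already removed."""
        self._active.remove(index)
        self._slots[index] = None  # release the inner set's memory

    def get(self, index):
        if index not in self._active:
            raise KeyError(index)
        return self._slots[index]

    def __iter__(self):
        return (self._slots[i] for i in self._active)

    def __len__(self):
        return len(self._active)


collection = SetCollection()
a = collection.add({1, 2})
b = collection.add({3})

collection.get(a).remove(1)  # inner removal is a plain set operation
collection.remove(b)         # outer removal only touches the index set
```

The master list only ever grows in this sketch, so a long-running program with many removals would eventually want to compact it or reuse freed slots.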