Time complexity of Python set operations?
What is the time complexity of each of Python's set operations in Big O notation?
I am using Python's set type for an operation on a large number of items. I want to know how each operation's performance will be affected by the size of the set. For example, add, and the test for membership:
myset = set()
myset.add('foo')
'foo' in myset
Googling around hasn't turned up any resources, but it seems reasonable that the time complexity for Python's set implementation would have been carefully considered.
If it exists, a link to something like this would be great. If nothing like this is out there, then perhaps we can work it out?
Extra marks for finding the time complexity of all set operations.
According to the Python wiki: Time complexity (https://wiki.python.org/moin/TimeComplexity), set is implemented as a hash table (https://en.wikipedia.org/wiki/Hash_table). So you can expect lookup, insert, and delete to be O(1) on average. If the hash table's load factor gets too high, you will run into collisions and O(n) behaviour.
P.S. For some reason they claim O(n) for the delete operation, which looks like a typo.
P.P.S. This is true for CPython; PyPy is a different story.
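A rough way to see that average O(1) behaviour yourself (my own timing sketch, not from the wiki): measure a membership test against sets of increasing size and check that the per-lookup cost stays roughly flat.
# Rough timing sketch (assumes CPython and the standard timeit module):
# membership-test time should stay roughly flat as the set grows,
# illustrating the average O(1) lookup.
import timeit

for n in (10_000, 100_000, 1_000_000):
    s = set(range(n))
    t = timeit.timeit('n - 1 in s', globals={'s': s, 'n': n}, number=100_000)
    print(f'set of {n:>9} items: {t:.3f}s for 100,000 lookups')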
The other answers do not talk about two crucial operations on sets: unions and intersections. In the worst case, a union takes O(n+m), whereas an intersection takes O(min(n, m)), provided there are not many elements in the sets with the same hash. A list of time complexities of common operations can be found here: https://wiki.python.org/moin/TimeComplexity
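As a toy illustration of those costs (my own example, not from the wiki page): union has to visit every element of both sets, while intersection only probes the larger set once per element of the smaller one.
# Toy illustration of the costs above: union touches every element of
# both sets; intersection probes the larger set once per element of the
# smaller one (on average).
small = set(range(100))
big = set(range(1_000_000))

u = small | big   # roughly O(len(small) + len(big))
i = small & big   # roughly O(min(len(small), len(big))) on average
print(len(u), len(i))   # 1000000 100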
The in operation should be independent of the size of the container, i.e. O(1), given an optimal hash function. This should be nearly true for Python strings. Hashing strings is always critical; Python should be clever there, so you can expect near-optimal results.
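One concrete detail behind "Python should be clever there": CPython caches a str object's hash after it is first computed, so repeated lookups of the same string object do not re-hash it. A small sketch of that behaviour (observed in CPython, not guaranteed by the language):
# Observed CPython behaviour (not a language guarantee): a str object's
# hash is computed once and cached, so repeated lookups of the same
# string skip the O(len) hashing step.
import timeit

long_key = 'x' * 10_000_000

# The very first hash() call walks the whole string: O(len(key)).
first = timeit.timeit('hash(long_key)', globals={'long_key': long_key}, number=1)

# The hash is now cached on the object, so building the set and doing
# repeated membership tests no longer pay the hashing cost.
s = {long_key}
later = timeit.timeit('long_key in s', globals={'long_key': long_key, 's': s}, number=1000)
print(f'first hash: {first:.6f}s, 1000 cached lookups: {later:.6f}s')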
The set type in Python is basically implemented as a hash table.
There are many great answers above; I just want to add a point that is missing:
The hash table starts with a predefined size, and adding elements beyond the current capacity triggers a resize of the table.
For the complexity part (hash table):
Amortized: O(1) for an individual add, because resizing happens infrequently.
Worst case: O(n) for an add that triggers a resize, since resizing rehashes all elements into a larger table, which is a linear operation. A small sketch of this is below.
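A CPython-specific sketch that makes the resizing visible: sys.getsizeof reports the set's current allocation, which jumps at certain element counts instead of growing on every add; that is exactly why add is only amortized O(1).
# CPython-specific sketch: sys.getsizeof reports the set's allocation,
# which jumps only when the underlying table is resized (and every
# element is rehashed into the larger table).
import sys

s = set()
last = sys.getsizeof(s)
for i in range(100):
    s.add(i)
    size = sys.getsizeof(s)
    if size != last:
        print(f'resize after adding element #{i + 1}: {last} -> {size} bytes')
        last = size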