Custom dictionary that allows deletion during iteration
UPDATED based on Lennart Regebro's answer
Suppose you iterate through a dictionary, and sometimes need to delete an element. The following is very efficient:
    remove = []
    for k, v in dict_.items():
        if condition(k, v):
            remove.append(k)
            continue
        # do other things you need to do in this loop
    for k in remove:
        del dict_[k]
The only overhead here is building the list of keys to remove; unless it grows large compared to the dictionary size, it's not an issue. However, this approach requires some extra coding, so it's not very popular.
The popular dict comprehension approach:
    dict_ = {k: v for k, v in dict_.items() if not condition(k, v)}
    for k, v in dict_.items():
        # do other things you need to do in this loop
        pass
results in a full dictionary copy, and so has the risk of a silly performance hit if dictionaries grow large or the containing function is called often.
A much better approach is to copy only the keys rather than the whole dictionary:
    for k in list(dict_.keys()):
        if condition(k, dict_[k]):
            del dict_[k]
            continue
        # do other things you need to do in this loop
(Note that all code examples are in Python 3, so keys() and items() return views, not copies.)
In most cases, it won't hurt performance that much, since the time to check even the simplest condition (not to mention other stuff you're doing in the loop) is usually greater than the time to add one key to a list.
Still, I am wondering if it's possible to avoid even that with a custom dictionary that allows deletions while iterating:
    for k, v in dict_.items():
        if condition(k, v):
            del dict_[k]
            continue
        # do other things you need to do in this loop
Perhaps an iterator could always look ahead, so that when __next__ is called, the iterator knows where to go without even looking at the current element (it would only need to look at the element when it first gets to it). And if there is no next element, the iterator could just set a flag that would cause a StopIteration exception to be raised whenever __next__ is called again.
If the element the iterator tries to advance to turns out to be deleted, it's fine to raise an exception; there is no need to support deletions while multiple iterations are going on simultaneously.
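To make the look-ahead idea concrete, here is a minimal sketch, not a drop-in dict replacement. It assumes a toy mapping (the hypothetical `LookaheadDict`) that keeps its entries in a doubly linked list, and it uses a generator instead of a hand-written `__next__`. The iterator fetches the following node before handing out the current item, so the current key can be deleted safely; if the pre-fetched node is deleted, the next step raises.

```python
class LookaheadDict:
    """Toy mapping (hypothetical) whose items() iterator looks one node ahead."""

    class _Node:
        __slots__ = ("key", "value", "prev", "next", "alive")

    def __init__(self, items=()):
        # Sentinel node of a circular doubly linked list that records key order.
        self._root = root = self._Node()
        root.prev = root.next = root
        root.alive = True
        self._nodes = {}  # key -> node
        for key, value in items:
            self[key] = value

    def __setitem__(self, key, value):
        node = self._nodes.get(key)
        if node is None:
            # New key: append a node at the end of the linked list.
            node = self._Node()
            node.key, node.alive = key, True
            last = self._root.prev
            node.prev, node.next = last, self._root
            last.next = self._root.prev = node
            self._nodes[key] = node
        node.value = value

    def __getitem__(self, key):
        return self._nodes[key].value

    def __delitem__(self, key):
        node = self._nodes.pop(key)
        # Unlink the node and mark it so a waiting iterator can detect it.
        node.prev.next, node.next.prev = node.next, node.prev
        node.alive = False

    def __len__(self):
        return len(self._nodes)

    def items(self):
        node = self._root.next
        while node is not self._root:
            if not node.alive:
                raise RuntimeError("next element was deleted during iteration")
            upcoming = node.next  # look ahead before yielding the current item
            yield node.key, node.value
            node = upcoming


dict_ = LookaheadDict((i, i * i) for i in range(6))
for k, v in dict_.items():
    if k % 2 == 0:
        del dict_[k]  # deleting the current element is fine
        continue
    # do other things you need to do in this loop
print([k for k, _ in dict_.items()])  # [1, 3, 5]
```

Every entry here carries an extra node object and two links, which is exactly the kind of overhead the question worries about.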
Are there any problems with this approach?
One problem is that I'm not sure it can be done with no material overhead compared to the existing dict; otherwise, it would be faster to use the list(dict_) approach!
UPDATE:
I tried all the versions. I don't report the timings, since they are clearly very dependent on the exact situation. But it seems safe to say that in many cases, the fastest approach is likely to be list(dict_). After all, if you think about it, the copy is the fastest operation that grows linearly with the size of the list; almost any other overhead, as long as it's also proportional to the list size, is likely to be bigger.
I really like all the ideas, but since I have to select only one, I'm accepting the context manager solution, since it allows using the dictionary as either normal or "enhanced" with very small code changes.
As you note, you can store the items to delete somewhere and defer their deletion until later. The problem then becomes when to purge them and how to make sure that the purge method eventually gets called. The answer to this is a context manager which is also a subclass of dict.
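A minimal sketch of such a class and its usage might look like the following. The name `DeferredDeleteDict` and its attributes are assumptions made for illustration. Only `del d[key]` is deferred; `pop()`, `popitem()` and `clear()` are left untouched, and a key that is deleted and then reassigned inside the block is still removed on exit.

```python
class DeferredDeleteDict(dict):
    """dict subclass (hypothetical) that defers deletions inside a with block."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._deferring = False
        self._pending = set()  # keys waiting to be purged

    def __enter__(self):
        self._deferring = True
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Purge on the way out, even if the block raised.
        self._deferring = False
        for key in self._pending:
            super().__delitem__(key)
        self._pending.clear()
        return False  # never swallow exceptions

    def __delitem__(self, key):
        if self._deferring:
            if key not in self:
                raise KeyError(key)
            self._pending.add(key)  # remember it, delete later
        else:
            super().__delitem__(key)


dict_ = DeferredDeleteDict(a=1, b=2, c=3)
with dict_:
    for k, v in dict_.items():
        if v % 2:
            del dict_[k]  # recorded only; the dict keeps its size
            continue
        # do other things you need to do in this loop
print(dict_)  # {'b': 2}
```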
If you're not in a with block, of course, deletes are immediate; as this is a dict subclass, it works just like a regular dict outside of a context manager.
You could also implement this as a wrapper class for a dictionary.
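The wrapper variant could be sketched along these lines. `DeferredDeleter` is a hypothetical name, and only the handful of methods needed for the loop are forwarded. The wrapped dict itself is an ordinary dict, so no flag is checked on its own deletes.

```python
class DeferredDeleter:
    """Context manager (hypothetical) that wraps a dict and queues deletions."""

    def __init__(self, target):
        self._target = target
        self._pending = []

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Apply the queued deletions to the real dictionary.
        for key in self._pending:
            self._target.pop(key, None)  # tolerate keys queued twice
        self._pending.clear()
        return False

    def __delitem__(self, key):
        if key not in self._target:
            raise KeyError(key)
        self._pending.append(key)

    def __getitem__(self, key):
        return self._target[key]

    def __iter__(self):
        return iter(self._target)

    def items(self):
        return self._target.items()


dict_ = {"a": 1, "b": 2, "c": 3}
with DeferredDeleter(dict_) as view:
    for k, v in view.items():
        if v % 2:
            del view[k]  # queued on the wrapper, dict_ is untouched for now
print(dict_)  # {'b': 2}
```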
It's even possible to make the wrapper class fully functional as a dictionary, if you want, though that's a fair bit more code.
Performance-wise, this is admittedly not such a win, but I like it from a programmer-friendliness standpoint. The second method should be very slightly faster since it's not testing a flag on each delete.
What you need to do is to not modify the list of keys you are iterating over. You can do this in three ways:
1. Make a copy of the keys in a separate list and iterate over that. You can then safely delete the keys in the dictionary during iteration. This is the easiest, and fastest, unless the dictionary is huge, in which case you should start thinking about using a database in any case.
2. Make a copy not of the keys you are iterating over, but of the keys you are going to delete. In other words, don't delete these keys while iterating; instead, add them to a list, then delete the keys in that list once you have finished iterating. This is slightly more complicated than 1. but much less than 3. It is also fast. This is what you do in your first example.
3. The only way to avoid making some sort of new list is, as you suggest, to make a special dictionary. But that requires that when you delete keys, it does not actually delete them but only marks them as deleted, and then deletes them for real only once you call a purge method. This requires quite a lot of implementation, there are edge cases, and you'll trip yourself up by forgetting to purge, etc. And iterating over the dictionary must still include the deleted keys, which will bite you at some point. So I wouldn't recommend this.
Also, however you implement this in Python, you are likely to just once again end up with a list of things to delete, so it's likely to just be a complicated and error-prone version of 2. If you implement it in C, you could probably get away without the copying by adding the flags directly into the hash-key structure. But as mentioned, the problems really overshadow the benefits.
You can accomplish this by iterating over a static list of the key/value pairs of the dictionary, instead of iterating over a dictionary view.
Basically, iterating over list(dict_.items()) instead of dict_.items() will work.
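As a small illustration of iterating over the snapshot, assuming a made-up `condition` and sample data:

```python
def condition(key, value):
    # Hypothetical predicate: drop entries with even values.
    return value % 2 == 0


dict_ = {"a": 1, "b": 2, "c": 3, "d": 4}

for k, v in list(dict_.items()):  # snapshot of the pairs, taken once
    if condition(k, v):
        del dict_[k]  # safe: the loop walks the list, not the dict
        continue
    # do other things you need to do in this loop

print(dict_)  # {'a': 1, 'c': 3}
```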
Python 3.2 has such a dict in the stdlib.
Iteration is performed over a linked list; see the __iter__() method implementation. The deletion is safe (in Python 3.2) even though items are weak references.
Naive implementation for Python 2.x and 3.x:
When iterating over keys, items or values, it sets the flag self._iterating. In __delitem__ it checks whether the item can be deleted right away, and otherwise stores the key in a temporary queue. At the end of the iteration it deletes all pending keys.
It's a very naive implementation, and I wouldn't recommend using it in production code.
EDIT
Added support for Python 3 and improvements from @jsbueno comments.
Python 3 run on Ideone.com
__iter__ and __delitem__ and other special methods need to collaborate to keep a list of items to be removed while an iteration happens. When there are no current iterations, __delitem__ can just delete an item, but when at least one iteration is happening, it should just add the key to be deleted into a list. When the last active iteration finishes, it should actually delete things. This is somewhat inefficient if there are a lot of keys to remove, and will, of course, blow up if there's always at least one iteration going on.
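The collaboration described above might be sketched as follows. `IterSafeDict`, `_active` and `_pending` are hypothetical names. Only iteration through `__iter__` (that is, `for k in d`) is covered; the `keys()`, `items()` and `values()` views of a dict subclass do not go through `__iter__` and would need the same treatment. The purge relies on the generator's `finally` clause, which CPython runs promptly when a loop is left early with `break`; other implementations may delay it.

```python
class IterSafeDict(dict):
    """dict subclass (hypothetical) that queues deletions while iterated."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._active = 0    # number of iterations currently in progress
        self._pending = []  # keys to delete once the last one finishes

    def __iter__(self):
        self._active += 1
        try:
            for key in super().__iter__():
                yield key
        finally:
            self._active -= 1
            if self._active == 0:
                self._purge()

    def _purge(self):
        pending, self._pending = self._pending, []
        for key in pending:
            super().pop(key, None)  # tolerate keys queued more than once

    def __delitem__(self, key):
        if self._active:
            if key not in self:
                raise KeyError(key)
            self._pending.append(key)  # defer until iteration is over
        else:
            super().__delitem__(key)


dict_ = IterSafeDict(a=1, b=2, c=3)
for k in dict_:
    if dict_[k] != 2:
        del dict_[k]  # queued; the underlying dict keeps its size
        continue
    # do other things you need to do in this loop
print(dict_)  # {'b': 2}
```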
This could work as a compromise between the two examples - two lines longer than the second one, but shorter and slightly faster than the first. Python 2:
Split into a function and it's down to one line each call (whether this is more readable or not is your call):
Regardless of where the code is stored, you'll have to store the keys needing deletion somewhere. The only way around that is using generator expressions, which will explode the moment you delete a key for the first time.
A slightly different approach; sometimes deletion is overrated. While iterating, you can overwrite the value in the dictionary and assign it to None. This does not "change" the overall structure, it just re-points one element to None. This can be done safely while iterating. If you really must, you could delete the Nones afterwards (assuming you never previously stored Nones) or just have your code tolerate retrieving Nones as if the key wasn't there in the first place.
The whole conversation here really revolves around the size of the dictionary and the expected ratio of the elements you want to delete. Get a handle on that, and one of the solutions presented in these answers will be the "right one" for your specific use case.
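The overwrite-with-None idea from this answer can be sketched like this, again with a made-up `condition` and sample data, and assuming None is never a legitimate value:

```python
def condition(key, value):
    # Hypothetical predicate: drop negative values.
    return value < 0


dict_ = {"a": 1, "b": -2, "c": 3, "d": -4}

for k, v in dict_.items():
    if condition(k, v):
        dict_[k] = None  # existing key: only the value changes, not the size
        continue
    # do other things you need to do in this loop

# Optional clean-up pass once iteration is over.
dict_ = {k: v for k, v in dict_.items() if v is not None}
print(dict_)  # {'a': 1, 'c': 3}
```

The clean-up pass builds a new dictionary, so it only pays off when the code can otherwise live with the None placeholders.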