Python - remove items from a list
# I have 3 lists:
L1 = [1, 2, 3, 4, 5, 6, 7, 8, 9]
L2 = [4, 7, 8]
L3 = [5, 2, 9]
# I want to create another that is L1 minus L2's members and L3's members, so:
L4 = (L1 - L2) - L3 # Of course this isn't going to work
I'm wondering, what is the "correct" way to do this. I can do it many different ways, but Python's style guide says there should be only 1 correct way of doing each thing. I've never known what this was.
Here are some tries:
Now that I have had a moment to think, I realize that the L2 + L3 thing creates a temporary list that immediately gets thrown away. So an even better way is:

Update: I see some extravagant claims being thrown around about performance, and I want to assert that my solution was already as fast as possible. Creating intermediate results, whether they be intermediate lists or intermediate iterators that then have to be called into repeatedly, will always be slower than simply giving L2 and L3 for the set to iterate over directly, like I have done here. All other alternatives (that I can think of) will necessarily be slower than this. Doing the loops ourselves, for example, rather than letting the set() constructor do them, adds expense. Using iterators, with all of the state-saving and callbacks they involve, will obviously be even more expensive.

So I believe that the answer I gave last night is still far and away (for values of "far and away" greater than around 5µsec, obviously) the best, unless the questioner will have duplicates in L1 and wants them removed once each for every time the duplicate appears in one of the other lists.
update::: post contains a reference to false allegations of inferior performance of sets compared to frozensets. I maintain that it's still sensible to use a frozenset in this instance, even though there's no need to hash the set itself, just because it's more correct semantically. Though, in practice, I might not bother typing the extra 6 characters. I'm not feeling motivated to go through and edit the post, so just be advised that the "allegations" link links to some incorrectly run tests. The gory details are hashed out in the comments. :::update
The second chunk of code posted by Brandon Craig Rhodes is quite good, but as he didn't respond to my suggestion about using a frozenset (well, not when I started writing this, anyway), I'm going to go ahead and post it myself.
The whole basis of the undertaking at hand is to check if each of a series of values (L1) is in another set of values; that set of values is the contents of L2 and L3. The use of the word "set" in that sentence is telling: even though L2 and L3 are lists, we don't really care about their list-like properties, like the order their values are in or how many of each they contain. We just care about the set (there it is again) of values they collectively contain.

If that set of values is stored as a list, you have to go through the list elements one by one, checking each one. It's relatively time-consuming, and it's bad semantics: again, it's a "set" of values, not a list. So Python has these neat set types that hold a bunch of unique values, and can quickly tell you if some value is in them or not. This works in pretty much the same way that Python's dict type works when you're looking up a key.

The difference between sets and frozensets is that sets are mutable, meaning that they can be modified after creation. Documentation on both types is here.
Since the set we need to create, the union of the values stored in L2 and L3, is not going to be modified once created, it's semantically appropriate to use an immutable data type. This also allegedly has some performance benefits. Well, it makes sense that it would have some advantage; otherwise, why would Python have frozenset as a builtin?

update... Brandon has answered this question: the real advantage of frozen sets is that their immutability makes it possible for them to be hashable, allowing them to be dictionary keys or members of other sets.

I ran some informal timing tests comparing the speed of creation of and lookup on relatively large (3000-element) frozen and mutable sets; there wasn't much difference. This conflicts with the above link, but supports what Brandon says about them being identical except for the aspect of mutability. ...update
Now, because frozensets are immutable, they don't have an update method. Brandon used the set.update method to avoid creating and then discarding a temporary list en route to set creation; I'm going to take a different approach.

This generator expression makes items an iterator over, consecutively, the contents of L2 and L3. Not only that, but it does it without creating a whole list-full of intermediate objects. Using nested for expressions in generators is a bit confusing, but I manage to keep it sorted out by remembering that they nest in the same order that they would if you wrote actual for loops. That generator function is equivalent to the generator expression that we assigned to items, except that it's a parametrized function definition instead of a direct assignment to a variable.

Anyway, enough digression. The big deal with generators is that they don't actually do anything. Well, at least not right away: they just set up work to be done later, when the generator expression is iterated. This is formally referred to as being lazy. We're going to iterate it (well, I am, anyway) by passing items to the frozenset function, which iterates over it and returns a frosty cold frozenset.

You could actually combine the last two lines by putting the generator expression right inside the call to frozenset. This neat syntactical trick works as long as the iterator created by the generator expression is the only parameter to the function you're calling; otherwise you have to write it in its usual separate set of parentheses, just like you were passing a tuple as an argument to the function.
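The code discussed in this answer was stripped in extraction. A reconstruction consistent with the description (the name `items` comes from the text; the generator-function name and loop variables are assumed):

```python
L2 = [4, 7, 8]
L3 = [5, 2, 9]

# Generator expression: iterates over L2 then L3, consecutively,
# without building an intermediate list
items = (value for lst in (L2, L3) for value in lst)

# An equivalent generator function, parametrized over the lists;
# the for clauses nest in the same order as these explicit loops
def chain_lists(lists):
    for lst in lists:
        for value in lst:
            yield value

# Nothing happens until the generator is iterated ("lazy");
# frozenset() consumes it and returns an immutable set
unwanted = frozenset(items)

# The combined form: no extra parentheses needed, since the
# generator expression is the call's only argument
unwanted = frozenset(value for lst in (L2, L3) for value in lst)
```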
Now we can build a new list in the same way that Brandon did, with a list comprehension. These use the same syntax as generator expressions, and do basically the same thing, except that they are eager instead of lazy (again, these are actual technical terms), so they get right to work iterating over the items and creating a list from them.
This is equivalent to passing a generator expression to list, but more idiomatic.
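The list-comprehension step itself was also lost; a sketch of both forms, assuming an `unwanted` frozenset built as described above:

```python
L1 = [1, 2, 3, 4, 5, 6, 7, 8, 9]
unwanted = frozenset([4, 7, 8, 5, 2, 9])

# Eager list comprehension: builds the list immediately
L4 = [x for x in L1 if x not in unwanted]

# Equivalent but less idiomatic: generator expression passed to list()
L4_alt = list(x for x in L1 if x not in unwanted)
```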
So this will create the list L4, containing the elements of L1 which weren't in either L2 or L3, maintaining the order that they were originally in and the number of them that there were.

If you just want to know which values are in L1 but not in L2 or L3, it's much easier: you just create that set. You can make a list out of it, as does st0le, but that might not really be what you want. If you really do want the set of values that are only found in L1, you might have a very good reason to keep that set as a set, or indeed a frozenset.
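For the set-only variant, a sketch (again assuming the `unwanted` frozenset from earlier; the result name is invented):

```python
L1 = [1, 2, 3, 4, 5, 6, 7, 8, 9]
unwanted = frozenset([4, 7, 8, 5, 2, 9])

# Just the set of values found only in L1; order and duplicate
# counts are lost, which may be exactly what you want
only_in_L1 = frozenset(L1) - unwanted
```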
...Annnnd, now for something completely different:
Assuming your individual lists won't contain duplicates... use set and difference:
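The code for this answer was stripped; the set-difference approach it names presumably looked something like:

```python
L1 = [1, 2, 3, 4, 5, 6, 7, 8, 9]
L2 = [4, 7, 8]
L3 = [5, 2, 9]

# Set difference drops the unwanted values in one shot;
# note that order and duplicates are lost in the conversion
L4 = list(set(L1) - set(L2) - set(L3))
```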
This may be less pythonesque than the list-comprehension answer, but has a simpler look to it:
The advantage here is that we preserve order of the list, and if there are duplicate elements, we remove only one for each time it appears in l2.
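The code block was lost in extraction; an approach matching this description (order preserved, only one occurrence removed per appearance in the other lists) might look like:

```python
L1 = [1, 2, 3, 4, 5, 6, 7, 8, 9]
L2 = [4, 7, 8]
L3 = [5, 2, 9]

L4 = list(L1)          # copy, so L1 itself is untouched
for x in L2 + L3:
    if x in L4:
        L4.remove(x)   # removes only the first occurrence of x
```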
Doing such operations on lists can hamper your program's performance very quickly. What happens is that with each remove, list operations do a fresh malloc and move elements around. This can be expensive if you have a very huge list. So I would suggest this -
I am assuming your list has unique elements. Otherwise you need to maintain a list of the duplicate values in your dict. Anyway, for the data you provided, here it is -
METHOD 1
METHOD 2
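Both code blocks were stripped in extraction. The following dict-based sketches are consistent with the description but are assumed, not the originals:

```python
L1 = [1, 2, 3, 4, 5, 6, 7, 8, 9]
L2 = [4, 7, 8]
L3 = [5, 2, 9]

# METHOD 1 (assumed): build a dict keyed on L1, delete unwanted keys;
# dict.pop avoids the malloc-and-shift cost of list.remove
d = dict.fromkeys(L1)      # insertion order preserved (Python 3.7+)
for x in L2 + L3:
    d.pop(x, None)
L4 = list(d)

# METHOD 2 (assumed): use the dict only as a fast membership table
unwanted = dict.fromkeys(L2 + L3)
L4 = [x for x in L1 if x not in unwanted]
```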
If all that looks like too much code, then you could try using set. But this way your list will lose all duplicate elements.
I think intuited's answer is way too long for such a simple problem, and Python already has a builtin function to chain two lists as a generator.
The procedure is as follows:

- Use itertools.chain to chain L2 and L3 without creating a memory-consuming copy.
- Build a set from the chained values; since membership testing (x in someset) is O(1), this will be very fast.

And now the code:
This should be one of the fastest, simplest, and least memory-consuming solutions.