Python - remove items from a list
# I have 3 lists:
L1 = [1, 2, 3, 4, 5, 6, 7, 8, 9]
L2 = [4, 7, 8]
L3 = [5, 2, 9]
# I want to create another that is L1 minus L2's members and L3's members, so:
L4 = (L1 - L2) - L3 # Of course this isn't going to work
I'm wondering, what is the "correct" way to do this. I can do it many different ways, but Python's style guide says there should be only 1 correct way of doing each thing. I've never known what this was.
Here are some tries:
Now that I have had a moment to think, I realize that the L2 + L3 thing creates a temporary list that immediately gets thrown away. So an even better way is:

Update: I see some extravagant claims being thrown around about performance, and I want to assert that my solution was already as fast as possible. Creating intermediate results, whether they be intermediate lists or intermediate iterators that then have to be called into repeatedly, will always be slower than simply giving L2 and L3 for the set to iterate over directly, like I have done here. All other alternatives (that I can think of) will necessarily be slower than this. Doing the loops ourselves, for example, rather than letting the set() constructor do them, adds expense. Using iterators, with all of the state-saving and callbacks they involve, will obviously be even more expensive.

So I believe that the answer I gave last night is still far and away (for values of "far and away" greater than around 5µsec, obviously) the best, unless the questioner will have duplicates in L1 and wants them removed once each for every time the duplicate appears in one of the other lists.
update::: post contains a reference to false allegations of inferior performance of sets compared to frozensets. I maintain that it's still sensible to use a frozenset in this instance, even though there's no need to hash the set itself, just because it's more correct semantically. Though, in practice, I might not bother typing the extra 6 characters. I'm not feeling motivated to go through and edit the post, so just be advised that the "allegations" link links to some incorrectly run tests. The gory details are hashed out in the comments. :::update
The second chunk of code posted by Brandon Craig Rhodes is quite good, but as he didn't respond to my suggestion about using a frozenset (well, not when I started writing this, anyway), I'm going to go ahead and post it myself.
The whole basis of the undertaking at hand is to check if each of a series of values (L1) is in another set of values; that set of values is the contents of L2 and L3. The use of the word "set" in that sentence is telling: even though L2 and L3 are lists, we don't really care about their list-like properties, like the order their values are in or how many of each they contain. We just care about the set (there it is again) of values they collectively contain.

If that set of values is stored as a list, you have to go through the list elements one by one, checking each one. It's relatively time-consuming, and it's bad semantics: again, it's a "set" of values, not a list. So Python has these neat set types that hold a bunch of unique values, and can quickly tell you if some value is in them or not. This works in pretty much the same way that Python's dict type works when you're looking up a key.

The difference between sets and frozensets is that sets are mutable, meaning that they can be modified after creation. Documentation on both types is here.
Since the set we need to create, the union of the values stored in L2 and L3, is not going to be modified once created, it's semantically appropriate to use an immutable data type. This also allegedly has some performance benefits. Well, it makes sense that it would have some advantage; otherwise, why would Python have frozenset as a builtin?

update... Brandon has answered this question: the real advantage of frozen sets is that their immutability makes it possible for them to be hashable, allowing them to be dictionary keys or members of other sets.

I ran some informal timing tests comparing the speed of creation of and lookup on relatively large (3000-element) frozen and mutable sets; there wasn't much difference. This conflicts with the above link, but supports what Brandon says about them being identical except for the aspect of mutability. ...update
Now, because frozensets are immutable, they don't have an update method. Brandon used the set.update method to avoid creating and then discarding a temporary list en route to set creation; I'm going to take a different approach.

This generator expression makes items an iterator over, consecutively, the contents of L2 and L3. Not only that, but it does it without creating a whole list-full of intermediate objects. Using nested for expressions in generators is a bit confusing, but I manage to keep it sorted out by remembering that they nest in the same order that they would if you wrote actual for loops. That generator function is equivalent to the generator expression that we assigned to items, except that it's a parametrized function definition instead of a direct assignment to a variable.

Anyway, enough digression. The big deal with generators is that they don't actually do anything. Well, at least not right away: they just set up work to be done later, when the generator expression is iterated. This is formally referred to as being lazy. We're going to iterate it (well, I am, anyway) by passing items to the frozenset function, which iterates over it and returns a frosty cold frozenset.

You could actually combine the last two lines by putting the generator expression right inside the call to frozenset. This neat syntactical trick works as long as the iterator created by the generator expression is the only parameter to the function you're calling; otherwise you have to write it in its usual separate set of parentheses, just like you were passing a tuple as an argument to the function.
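The code discussed in this answer was stripped in extraction. A reconstruction consistent with the description (the name `items` comes from the text; the generator-function name and loop variables are assumed):

```python
L2 = [4, 7, 8]
L3 = [5, 2, 9]

# Generator expression: iterates over L2 then L3, consecutively,
# without building an intermediate list
items = (value for lst in (L2, L3) for value in lst)

# An equivalent generator function, parametrized over the lists;
# the for clauses nest in the same order as these explicit loops
def chain_lists(lists):
    for lst in lists:
        for value in lst:
            yield value

# Nothing happens until the generator is iterated ("lazy");
# frozenset() consumes it and returns an immutable set
unwanted = frozenset(items)

# The combined form: no extra parentheses needed, since the
# generator expression is the call's only argument
unwanted = frozenset(value for lst in (L2, L3) for value in lst)
```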
Now we can build a new list in the same way that Brandon did, with a list comprehension. These use the same syntax as generator expressions, and do basically the same thing, except that they are eager instead of lazy (again, these are actual technical terms), so they get right to work iterating over the items and creating a list from them.
This is equivalent to passing a generator expression to list, but more idiomatic.
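The list-comprehension step itself was also lost; a sketch of both forms, assuming an `unwanted` frozenset built as described above:

```python
L1 = [1, 2, 3, 4, 5, 6, 7, 8, 9]
unwanted = frozenset([4, 7, 8, 5, 2, 9])

# Eager list comprehension: builds the list immediately
L4 = [x for x in L1 if x not in unwanted]

# Equivalent but less idiomatic: generator expression passed to list()
L4_alt = list(x for x in L1 if x not in unwanted)
```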
So this will create the list L4, containing the elements of L1 which weren't in either L2 or L3, maintaining the order that they were originally in and the number of them that there were.

If you just want to know which values are in L1 but not in L2 or L3, it's much easier: you just create that set. You can make a list out of it, as does st0le, but that might not really be what you want. If you really do want the set of values that are only found in L1, you might have a very good reason to keep that set as a set, or indeed a frozenset.
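For the set-only variant, a sketch (again assuming the `unwanted` frozenset from earlier; the result name is invented):

```python
L1 = [1, 2, 3, 4, 5, 6, 7, 8, 9]
unwanted = frozenset([4, 7, 8, 5, 2, 9])

# Just the set of values found only in L1; order and duplicate
# counts are lost, which may be exactly what you want
only_in_L1 = frozenset(L1) - unwanted
```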
...Annnnd, now for something completely different:
Assuming your individual lists won't contain duplicates... use set and difference:
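The code for this answer was stripped; the set-difference approach it names presumably looked something like:

```python
L1 = [1, 2, 3, 4, 5, 6, 7, 8, 9]
L2 = [4, 7, 8]
L3 = [5, 2, 9]

# Set difference drops the unwanted values in one shot;
# note that order and duplicates are lost in the conversion
L4 = list(set(L1) - set(L2) - set(L3))
```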
This may be less pythonesque than the list-comprehension answer, but has a simpler look to it:
The advantage here is that we preserve order of the list, and if there are duplicate elements, we remove only one for each time it appears in l2.
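The code block was lost in extraction; an approach matching this description (order preserved, only one occurrence removed per appearance in the other lists) might look like:

```python
L1 = [1, 2, 3, 4, 5, 6, 7, 8, 9]
L2 = [4, 7, 8]
L3 = [5, 2, 9]

L4 = list(L1)          # copy, so L1 itself is untouched
for x in L2 + L3:
    if x in L4:
        L4.remove(x)   # removes only the first occurrence of x
```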
Doing such operations on lists can hamper your program's performance very quickly. What happens is that with each remove, list operations do a fresh malloc and move elements around. This can be expensive if you have a very huge list. So I would suggest this -
I am assuming your list has unique elements. Otherwise you need to maintain a list of the duplicate values in your dict. Anyway, for the data you provided, here it is -
METHOD 1
METHOD 2
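Both code blocks were stripped in extraction. The following dict-based sketches are consistent with the description but are assumed, not the originals:

```python
L1 = [1, 2, 3, 4, 5, 6, 7, 8, 9]
L2 = [4, 7, 8]
L3 = [5, 2, 9]

# METHOD 1 (assumed): build a dict keyed on L1, delete unwanted keys;
# dict.pop avoids the malloc-and-shift cost of list.remove
d = dict.fromkeys(L1)      # insertion order preserved (Python 3.7+)
for x in L2 + L3:
    d.pop(x, None)
L4 = list(d)

# METHOD 2 (assumed): use the dict only as a fast membership table
unwanted = dict.fromkeys(L2 + L3)
L4 = [x for x in L1 if x not in unwanted]
```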
If all that looks like too much code, then you could try using set. But this way your list will lose all duplicate elements.
I think intuited's answer is way too long for such a simple problem, and Python already has a builtin function to chain two lists as a generator.
The procedure is as follows:

- Use itertools.chain to chain L2 and L3 without creating a memory-consuming copy.
- Build a set from the chained values; since membership testing (x in someset) is O(1), this will be very fast.

And now the code:
This should be one of the fastest, simplest, and least memory-consuming solutions.