Filtering lists

Posted on 2024-08-08 22:59:48

I want to filter out repeated elements in my list. For instance, given

foo = ['a','b','c','a','b','d','a','d']

I am only interested in:

['a','b','c','d']

What would be an efficient way to achieve this?
Cheers


Comments (10)

还在原地等你 2024-08-15 22:59:48

Use list(set(foo)) if you are using Python 2.5 or greater, but note that this doesn't preserve the original order.

伪心 2024-08-15 22:59:48

Cast foo to a set, if you don't care about element order.
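
To illustrate the trade-off, here is a minimal sketch. A set drops duplicates but makes no promise about iteration order, so only its contents can be relied on. As an alternative that is not part of the answer above, dict.fromkeys keeps first-seen order on Python 3.7+, where dictionaries are guaranteed to preserve insertion order.

```python
foo = ['a', 'b', 'c', 'a', 'b', 'd', 'a', 'd']

# A set removes duplicates, but its iteration order is arbitrary.
unordered = list(set(foo))

# Sorting gives a deterministic result (not necessarily the original order).
assert sorted(unordered) == ['a', 'b', 'c', 'd']

# Alternative (assumes Python 3.7+): dict keys keep insertion order,
# so the first occurrence of each element determines its position.
ordered = list(dict.fromkeys(foo))
print(ordered)  # ['a', 'b', 'c', 'd']
```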

雨轻弹 2024-08-15 22:59:48

Since there isn't an order-preserving answer that uses a list comprehension, I propose the following:

>>> temp = set()
>>> [c for c in foo if c not in temp and (temp.add(c) or True)]
['a', 'b', 'c', 'd']

which could also be written as follows (the list() call is needed on Python 3, where filter() returns an iterator):

>>> temp = set()
>>> list(filter(lambda c: c not in temp and (temp.add(c) or True), foo))
['a', 'b', 'c', 'd']

Depending on how many elements are in foo, repeated hash lookups in a set may be faster than repeated linear searches through a temporary list.

c not in temp checks that c has not been seen yet. Because temp.add(c) returns None, the or True part makes the whole condition true, so c is emitted to the output list in the same step in which it is added to the set.
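
The trick can be checked step by step. The sketch below uses the name seen in place of temp (a renaming for illustration only) and relies on two facts: set.add() returns None, and and short-circuits, so add() is never called for an element that is already in the set.

```python
foo = ['a', 'b', 'c', 'a', 'b', 'd', 'a', 'd']

seen = set()

# set.add() mutates the set in place and returns None, which is falsy,
# so `seen.add(x) or True` always evaluates to True.
assert seen.add('x') is None
seen.clear()  # start from an empty set again

# For a repeated element, `c not in seen` is False and the right-hand
# side of `and` is skipped entirely.
result = [c for c in foo if c not in seen and (seen.add(c) or True)]
print(result)  # ['a', 'b', 'c', 'd']
```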

强辩 2024-08-15 22:59:48

>>> bar = []
>>> for i in foo:
...     if i not in bar:
...         bar.append(i)
...
>>> bar
['a', 'b', 'c', 'd']

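This is the most straightforward way to remove duplicates from the list while keeping the order of first occurrence (even though "order" is an inherently ill-defined concept here). Note that i not in bar scans the whole list each time, so the loop takes quadratic time on large inputs.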

微凉徒眸意 2024-08-15 22:59:48

If you care about order, a readable way is the following:

def filter_unique(a_list):
    characters = set()
    result = []
    for c in a_list:
        if c not in characters:
            characters.add(c)
            result.append(c)
    return result

Depending on your requirements for speed, maintainability, and space consumption, you may find the above unsuitable. In that case, specify your requirements and we can try to do better :-)
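
One requirement worth knowing about: the function above stores elements in a set, so they must be hashable. The sketch below repeats the function (with the set renamed to seen for illustration) and shows both a normal call and the TypeError raised for unhashable elements such as lists.

```python
def filter_unique(a_list):
    seen = set()
    result = []
    for item in a_list:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result

print(filter_unique([3, 1, 3, 2, 1]))  # [3, 1, 2]

# Lists are unhashable, so the set membership test raises TypeError.
error = None
try:
    filter_unique([[1], [1]])
except TypeError as exc:
    error = exc
```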

述情 2024-08-15 22:59:48

If you write a function to do this, I would use a generator; this case is a natural fit for one.

def unique(iterable):
    yielded = set()
    for item in iterable:
        if item not in yielded:
            yield item
            yielded.add(item)
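
A short usage sketch for the generator above. Because it is lazy, it can be materialised with list() or consumed partially; the endless itertools.cycle stream is only an illustrative input and is not part of the original answer.

```python
from itertools import cycle, islice

def unique(iterable):
    yielded = set()
    for item in iterable:
        if item not in yielded:
            yield item
            yielded.add(item)

foo = ['a', 'b', 'c', 'a', 'b', 'd', 'a', 'd']
print(list(unique(foo)))  # ['a', 'b', 'c', 'd']

# Lazy evaluation: take the first three distinct items of an endless
# stream. islice stops asking for more once it has three items.
first_three = list(islice(unique(cycle('abca')), 3))
print(first_three)  # ['a', 'b', 'c']
```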

回眸一遍 2024-08-15 22:59:48

Inspired by Francesco's answer, rather than making our own filter()-type function, let's make the builtin do some work for us:

def unique(a, s=set()):
    if a not in s:
        s.add(a)
        return True
    return False

Usage:

uniq = filter(unique, orig)

This may or may not perform faster than an answer that implements all of the work in pure Python. Benchmark and see. Of course, this only works once, because the default set s is created a single time and remembers everything it has seen across calls, but it demonstrates the concept. The ideal solution is, of course, to use a class:

class Unique(set):
    def __call__(self, a):
        if a not in self:
            self.add(a)
            return True
        return False

Now we can use it as much as we want:

uniq = filter(Unique(), orig)

Once again, we may (or may not) have thrown performance out the window: the gains of using a built-in function may be offset by the overhead of a class. I just thought it was an interesting idea.
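
The difference between the two versions can be seen in a small sketch. It wraps filter() in list(), which is an addition for Python 3, where filter() returns an iterator. The function version keeps its state in a default argument that is shared by every call, while each Unique() instance starts empty.

```python
def unique(a, s=set()):
    # The default set is created once, when the function is defined,
    # and is shared by all later calls.
    if a not in s:
        s.add(a)
        return True
    return False

class Unique(set):
    def __call__(self, a):
        if a not in self:
            self.add(a)
            return True
        return False

orig = ['a', 'b', 'c', 'a', 'b', 'd', 'a', 'd']

first = list(filter(unique, orig))   # ['a', 'b', 'c', 'd']
second = list(filter(unique, orig))  # []: every element was already seen

# A fresh instance per pass gives a fresh, empty set each time.
fresh_1 = list(filter(Unique(), orig))  # ['a', 'b', 'c', 'd']
fresh_2 = list(filter(Unique(), orig))  # ['a', 'b', 'c', 'd']
```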

心如荒岛 2024-08-15 22:59:48

This is what you want if you need a sorted list at the end:

>>> foo = ['a','b','c','a','b','d','a','d']
>>> bar = sorted(set(foo))
>>> bar
['a', 'b', 'c', 'd']

旧话新听 2024-08-15 22:59:48
import numpy as np
np.unique(foo)  # returns a sorted NumPy array of the unique elements, not a list

趁年轻赶紧闹 2024-08-15 22:59:48

You could do a sort of ugly list comprehension hack.

[l[i] for i in range(len(l)) if l.index(l[i]) == i]
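
Why the hack works, as a minimal sketch: l.index(x) returns the position of the first occurrence of x, so an element is kept only at the position where it first appears. The version below uses enumerate instead of range(len(l)), which is a stylistic variation and not the original author's code. Each index() call scans the list, so this takes quadratic time.

```python
l = ['a', 'b', 'c', 'a', 'b', 'd', 'a', 'd']

# Keep x only where its position equals the index of its first occurrence.
deduped = [x for i, x in enumerate(l) if l.index(x) == i]
print(deduped)  # ['a', 'b', 'c', 'd']
```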