Python equivalent of filter() getting two output lists (i.e. partition of a list)
Let's say I have a list, and a filtering function. Using something like
>>> filter(lambda x: x > 10, [1,4,12,7,42])
[12, 42]
I can get the elements matching the criterion. Is there a function I could use that would output two lists, one of elements matching, one of the remaining elements? I could call the filter()
function twice, but that's kinda ugly :)
Edit: the order of elements should be conserved, and I may have identical elements multiple times.
14 Answers
Try this:
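The code block was not preserved in this copy of the answer; a sketch of a straightforward helper that does what the question asks (the name partition is my own) would be:

def partition(pred, iterable):
    # Single pass: each element lands in exactly one of the two lists.
    trues = []
    falses = []
    for item in iterable:
        if pred(item):
            trues.append(item)
        else:
            falses.append(item)
    return trues, falses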
Usage:
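Given the sketch above, usage would look like:

>>> trues, falses = partition(lambda x: x > 10, [1, 4, 12, 7, 42])
>>> trues
[12, 42]
>>> falses
[1, 4, 7]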
There is also an implementation suggestion in itertools recipes:
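The recipe itself is not reproduced in this copy; from memory, the partition recipe in the Python 3 itertools documentation is essentially:

from itertools import filterfalse, tee

def partition(pred, iterable):
    'Use a predicate to partition entries into false entries and true entries'
    # partition(is_odd, range(10)) --> 0 2 4 6 8  and  1 3 5 7 9
    t1, t2 = tee(iterable)
    return filterfalse(pred, t1), filter(pred, t2)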
The recipe comes from the Python 3.x documentation. In Python 2.x, filterfalse is called ifilterfalse.
And a little uglier but faster version of the above code:
This is the second edit, but I think it matters:
The second and the third are as quick as the iterative one above but use less code.
TL;DR
The accepted, most voted answer [1] by Mark Byers is the simplest and the fastest.

Benchmarking the different approaches

The different approaches that had been suggested can be classified broadly in three categories:
1. straightforward list manipulation via lis.append, returning a 2-tuple of lists,
2. lis.append mediated by a functional approach, returning a 2-tuple of lists,
3. the canonical recipe given in the itertools documentation, returning a 2-tuple of, loosely speaking, generators.
Here follows a vanilla implementation of the three techniques: first the functional approach, then itertools, and eventually two different implementations of direct list manipulation, the alternative being the False-is-zero, True-is-one trick. Note that this is Python 3 (hence reduce comes from functools) and that the OP requests a tuple like (positives, negatives), but my implementations all return (negatives, positives)…
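The answer's code cells were not carried over in this copy; under the ordering described above, and with function names of my own choosing, the four implementations would look roughly like this:

from functools import reduce
from itertools import filterfalse, tee

# Functional approach: reduce plus the "mutate() or object" idiom.
def partition_reduce(pred, lst):
    return reduce(lambda acc, x: acc[pred(x)].append(x) or acc, lst, ([], []))

# Canonical itertools recipe: two lazy views over a tee'd iterable.
def partition_tee(pred, lst):
    t1, t2 = tee(lst)
    return filterfalse(pred, t1), filter(pred, t2)

# Direct list manipulation, plain if/else.
def partition_append(pred, lst):
    negatives, positives = [], []
    for x in lst:
        if pred(x):
            positives.append(x)
        else:
            negatives.append(x)
    return negatives, positives

# Direct list manipulation, indexing the result pair with the predicate
# (False is zero, True is one); pred must return a bool here.
def partition_index(pred, lst):
    results = ([], [])
    for x in lst:
        results[pred(x)].append(x)
    return results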
We need a predicate to apply to our lists and lists (again, loosely speaking) on which to operate.
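The concrete predicate and input lists of the original benchmark are not shown here; hypothetical stand-ins, only to make the later calls concrete, could be:

# Hypothetical predicate and inputs, not the original benchmark data.
pred = lambda n: n % 2 == 0
small = list(range(100))
large = list(range(100_000))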
To overcome the problem in testing the itertools approach that was reported by joeln on Oct 31 '13 at 6:17, I have thought of a void loop that just instantiates all the couples of elements in the two iterables returned by the different partition functions.
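The original consumer function is not preserved either; one way to write such a void loop over the two returned iterables, given the sketches above, is:

def consume(pair):
    # Walk both iterables to the end, discarding the elements, so that
    # lazy (generator-based) partitions also pay their full cost.
    negatives, positives = pair
    for _ in negatives:
        pass
    for _ in positives:
        pass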
First we use two fixed lists to have an idea of the overhead implied (using the very convenient IPython magic %timeit). Next we use the different implementations, one after the other.
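The timing transcripts are missing from this copy; in an IPython session the calls would have this shape, using the names assumed in the sketches above:

%timeit consume(partition_append(pred, large))
%timeit consume(partition_index(pred, large))
%timeit consume(partition_reduce(pred, large))
%timeit consume(partition_tee(pred, large))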
Comments
The plainest of the approaches is also the fastest one.
Using the x[p(n)] trick is, ehm, useless because at every step you have to index a data structure, giving you a slight penalty; it's however nice to know if you want to persuade a survivor of a declining culture at pythonizing.
The functional approach, which is operatively equivalent to the alternative append implementation, is ~50% slower, possibly due to the fact that we have an extra function call (in addition to the predicate evaluation) for each list element.
The itertools approach has the (customary) advantages that ❶ no potentially large list is instantiated and ❷ the input list is not entirely processed if you break out of the consumer loop, but when we use it, it is slower because of the need to apply the predicate on both ends of the tee.
Aside
I've fallen in love with the object.mutate() or object idiom that was exposed by Marii in their answer showing a functional approach to the problem; I'm afraid that, sooner or later, I'm going to abuse it.
Footnotes
[1] Accepted and most voted as of today, Sep 14 2017; but of course I have the highest hopes for this answer of mine!
I think groupby might be more relevant here:
http://docs.python.org/library/itertools.html#itertools.groupby
For example, splitting a list into odd and even numbers (or could be an arbitrary number of groups):
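The example code was stripped from this copy; a sketch of the groupby idea (sorting by the key first, since groupby only groups consecutive items) could be:

from itertools import groupby

def split_by(key, iterable):
    # Stable sort by the key, then group; order inside each group is kept.
    return {k: list(g) for k, g in groupby(sorted(iterable, key=key), key=key)}

groups = split_by(lambda x: x % 2, [1, 4, 12, 7, 42])
# {0: [4, 12, 42], 1: [1, 7]}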
You can look at the django.utils.functional.partition solution. In my opinion it's the most elegant solution presented here.
This part is not documented; only the source code can be found at https://docs.djangoproject.com/en/dev/_modules/django/utils/functional/
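The snippet is not reproduced in this copy; from memory, Django's helper is only a few lines, roughly:

def partition(predicate, values):
    # Returns ([items where predicate is falsy], [items where it is truthy]),
    # using the fact that False indexes 0 and True indexes 1.
    results = ([], [])
    for item in values:
        results[predicate(item)].append(item)
    return results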
If you don't have duplicate elements in your list you can definitely use a set:
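The code is missing in this copy; the set-difference idea would be something like:

data = [1, 4, 12, 7, 42]
matching = set(filter(lambda x: x > 10, data))
rest = set(data) - matching
# Order and duplicates are lost, hence the "no duplicates" caveat above.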
or you can do it with a list comprehension:
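Again the code is not preserved; one reading of the note below about deducing the rest from the first filter() result is:

matching = [x for x in data if x > 10]
rest = [x for x in data if x not in matching]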
N.B.: it's not a function, but just by knowing the first filter() result you can deduce the elements that didn't match your filter criterion.
I just had exactly this requirement. I'm not keen on the itertools recipe since it involves two separate passes through the data. Here's my implementation:
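The implementation itself did not survive in this copy; a single-pass version matching the description, selecting the target list with a conditional expression, would be for instance:

def partition(pred, iterable):
    # One pass; the predicate is evaluated exactly once per element.
    matching, rest = [], []
    for item in iterable:
        (matching if pred(item) else rest).append(item)
    return matching, rest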
The existing answers either partition an iterable into two lists, or inefficiently partition it into two generators. Here is an implementation that efficiently partitions an iterable into two generators, i.e. the predicate function is called at most once for each element in the iterable. One instance where you might want to use this version is if you need to partition a very large (or even infinite) iterable with an expensive-to-compute predicate.
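The implementation is not preserved in this copy; a buffered, lazy partition along the lines described (one predicate call per element, with per-side buffers served first) could look like:

from collections import deque

def partition(pred, iterable):
    # Lazily split `iterable` into (true_items, false_items) generators;
    # `pred` is called at most once per element.
    seq = iter(iterable)
    true_buf, false_buf = deque(), deque()

    def side(my_buf, other_buf, wanted):
        while True:
            if my_buf:                        # serve buffered items first
                yield my_buf.popleft()
                continue
            try:
                item = next(seq)
            except StopIteration:
                return
            if bool(pred(item)) == wanted:
                yield item                    # belongs to this side
            else:
                other_buf.append(item)        # stash it for the other side

    return side(true_buf, false_buf, True), side(false_buf, true_buf, False)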
Basically, this steps through each item in the iterator, checks the predicate, and either yields it, if the corresponding generator is being consumed, or puts it in the buffer for the other iterable. Additionally, each generator will first pull items from its buffer before checking the original iterable. Note that each partition has its own buffer which grows each time the other partition is iterated, so this implementation may not be suitable for use cases where one partition is iterated much more than the other.
example use case:
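For instance, splitting an infinite stream with the sketch above:

import itertools

evens, odds = partition(lambda n: n % 2 == 0, itertools.count())
print([next(evens) for _ in range(5)])   # [0, 2, 4, 6, 8]
print([next(odds) for _ in range(5)])    # [1, 3, 5, 7, 9]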
Everyone seems to think that their solution is the best, so I decided to use timeit to test all of them. I used "def is_odd(x): return x & 1" as my predicate function, and "xrange(1000)" as the iterable. Here is my version of Python:
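The interpreter banner and the individual timing numbers were not preserved in this copy; the general shape of such a timeit run, with the stated predicate and iterable (Python 3 spelling, so range instead of xrange), and assuming one of the partition implementations sketched earlier is in scope, would be:

import timeit

def is_odd(x):
    return x & 1

# Repeat this call for each candidate partition implementation.
print(timeit.timeit(lambda: partition(is_odd, range(1000)), number=1000))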
And here are the results of my testing:
Those are all comparable to each other. Now, let's try using the example given in the Python documentation.
This seems to be a bit faster.
The itertools example code beats all comers by a factor of at least 100! The moral is, don't keep re-inventing the wheel.
Plenty of good answers already. I like to use this:
Concise code for appending to target list
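The snippet was stripped from this copy; one concise pattern of that sort, choosing the target list by the predicate's boolean value, might be:

buckets = {True: [], False: []}
for x in [1, 4, 12, 7, 42]:
    buckets[x > 10].append(x)
# buckets[True] == [12, 42], buckets[False] == [1, 4, 7]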
The three top voted answers to an equivalent question propose to use itertools.tee() (as already covered here) and two even simpler approaches as well.

The collections.defaultdict method is an excellent helper for sorting operations. This approach will also work for any number of categories generated by the filter_key function.
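The code is not included in this copy; with a filter_key function as named in the text, the defaultdict version would be roughly:

from collections import defaultdict

def partition_by(filter_key, iterable):
    # One bucket per distinct key; a boolean key gives exactly two buckets.
    buckets = defaultdict(list)
    for item in iterable:
        buckets[filter_key(item)].append(item)
    return buckets

groups = partition_by(lambda x: x > 10, [1, 4, 12, 7, 42])
# groups[True] == [12, 42], groups[False] == [1, 4, 7]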