Get the difference between two lists with unique entries
I have two lists in Python:
temp1 = ['One', 'Two', 'Three', 'Four']
temp2 = ['One', 'Two']
Assuming the elements in each list are unique, I want to create a third list with items from the first list which are not in the second list:
temp3 = ['Three', 'Four']
Is there a fast way to do this without loops and explicit membership checks?
To get the elements which are in temp1 but not in temp2 (assuming the elements in each list are unique), take the difference of the two sets: list(set(temp1) - set(temp2)).
Beware that it is asymmetric: set([1, 2]) - set([2, 3]) gives set([1]), where you might expect/want it to equal set([1, 3]). If you do want set([1, 3]) as your answer, you can use set([1, 2]).symmetric_difference(set([2, 3])).
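As a minimal sketch of the set-based approach described above (the variable name either_side is only illustrative, and the result is sorted here because sets carry no order of their own):

```python
temp1 = ['One', 'Two', 'Three', 'Four']
temp2 = ['One', 'Two']

# Asymmetric: items of temp1 that are absent from temp2.
# Sets are unordered, so sort if a stable order matters.
temp3 = sorted(set(temp1) - set(temp2))

# Symmetric: items that appear in exactly one of the two inputs.
either_side = {1, 2}.symmetric_difference({2, 3})

print(temp3)        # ['Four', 'Three']
print(either_side)  # {1, 3}
```

Note that the original order of temp1 is lost once the data passes through a set.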
The existing solutions all offer either one or the other of: performance better than O(n*m), or preservation of the input list's order.
But so far no solution has both. If you want both, try this:
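One way this could look, as a sketch rather than the answer's exact code: build a set from the second list once, then filter the first list against it. The function name ordered_difference is an assumption made for illustration.

```python
def ordered_difference(first, second):
    """Return items of `first` not in `second`, keeping the order of `first`."""
    exclude = set(second)  # built once; membership tests are O(1) on average
    return [item for item in first if item not in exclude]


temp1 = ['One', 'Two', 'Three', 'Four']
temp2 = ['One', 'Two']
print(ordered_difference(temp1, temp2))  # ['Three', 'Four']
```

Only the second list is converted to a set, which is why no "unnecessary" set of the first list is ever built.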
Performance test
Results:
As well as preserving order, the method I presented is also (slightly) faster than set subtraction because it doesn't require construction of an unnecessary set. The performance difference would be more noticeable if the first list is considerably longer than the second and if hashing is expensive. Here's a second test demonstrating this:
Results:
This can be done using the Python XOR operator.
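A minimal sketch of the XOR idea, assuming the lists are first converted to sets (the ^ operator is not defined for plain lists). Note that this yields the symmetric difference, so items unique to temp2 would be included as well:

```python
temp1 = ['One', 'Two', 'Three', 'Four']
temp2 = ['One', 'Two']

# ^ on sets is the symmetric difference: items in exactly one of the sets.
xor_diff = set(temp1) ^ set(temp2)

print(sorted(xor_diff))  # ['Four', 'Three']
```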
You could use a list comprehension:
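For example, a comprehension along these lines would work (a sketch using the question's own lists):

```python
temp1 = ['One', 'Two', 'Three', 'Four']
temp2 = ['One', 'Two']

# Keeps the order of temp1. Each `not in temp2` test scans the list,
# so the cost grows with len(temp1) * len(temp2).
temp3 = [item for item in temp1 if item not in temp2]

print(temp3)  # ['Three', 'Four']
```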
Try this:
In case you want the difference recursively, I have written a package for Python:
https://github.com/seperman/deepdiff
Installation
Install from PyPI:
Example usage
Importing
Same object returns empty
Type of an item has changed
Value of an item has changed
Item added and/or removed
String difference
String difference 2
Type change
List difference
List difference 2:
List difference ignoring order or duplicates: (with the same dictionaries as above)
List that contains dictionary:
Sets:
Named Tuples:
Custom objects:
Object attribute added:
The difference between two lists (say list1 and list2) can be found using the following simple function.
or
By using the above function, the difference can be found using diff(temp2, temp1) or diff(temp1, temp2). Both will give the result ['Four', 'Three'] (the order of the elements may vary, since sets are unordered). You don't have to worry about the order of the lists or which list is given first.
Python doc reference
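A diff function with the behaviour described above might look like the following sketch. The two variants are assumptions reconstructed from the description (a symmetric result regardless of argument order), and diff_short is a name invented here:

```python
def diff(list1, list2):
    """Symmetric difference: items found in exactly one of the two lists."""
    union = set(list1) | set(list2)
    common = set(list1) & set(list2)
    return list(union - common)


def diff_short(list1, list2):
    """Same result via the built-in set method."""
    return list(set(list1).symmetric_difference(list2))


temp1 = ['One', 'Two', 'Three', 'Four']
temp2 = ['One', 'Two']
print(sorted(diff(temp1, temp2)))        # ['Four', 'Three']
print(sorted(diff_short(temp2, temp1)))  # ['Four', 'Three']
```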
The simplest way is to use set().difference(set()).
The answer is set([1]), which can also be printed as a list.
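For illustration, assume the inputs list_a = [1, 2, 3] and list_b = [2, 3]. These are hypothetical values chosen so that the result matches the set([1]) quoted above; Python 3 prints that set as {1}.

```python
list_a = [1, 2, 3]  # assumed example input
list_b = [2, 3]     # assumed example input

only_in_a = set(list_a).difference(set(list_b))

print(only_in_a)        # {1}
print(list(only_in_a))  # [1]
```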
I'll toss this in, since none of the present solutions yield a tuple:
Alternatively:
Like the other non-tuple-yielding answers in this direction, it preserves order.
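A tuple-producing version could be sketched in two equivalent ways; both are assumptions about what the answer showed, built from a generator expression and from filter():

```python
temp1 = ['One', 'Two', 'Three', 'Four']
temp2 = ['One', 'Two']
exclude = set(temp2)

# Generator expression fed straight into tuple(); order of temp1 is kept.
as_tuple = tuple(x for x in temp1 if x not in exclude)

# The same result using filter() with a predicate.
as_tuple_filtered = tuple(filter(lambda x: x not in exclude, temp1))

print(as_tuple)           # ('Three', 'Four')
print(as_tuple_filtered)  # ('Three', 'Four')
```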
If you are really looking into performance, then use numpy!
Here is the full notebook as a gist on github with comparison between list, numpy, and pandas.
https://gist.github.com/denfromufa/2821ff59b02e9482be15d27f2bbd4451
I wanted something that would take two lists and could do what diff in bash does. Since this question pops up first when you search for "python diff two lists" and is not very specific, I will post what I came up with.
Using SequenceMatcher from difflib you can compare two lists like diff does. None of the other answers will tell you the position where the difference occurs, but this one does. Some answers give the difference in only one direction. Some reorder the elements. Some don't handle duplicates. But this solution gives you a true difference between two lists:
This outputs:
Of course, if your application makes the same assumptions the other answers make, you will benefit from them the most. But if you are looking for true diff functionality, then this is the only way to go.
For example, none of the other answers could handle:
But this one does:
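A minimal sketch of a positional diff built on difflib.SequenceMatcher follows. The helper name positional_diff and its output shape are assumptions made for illustration, not the answer's original code:

```python
from difflib import SequenceMatcher


def positional_diff(a, b):
    """Yield (tag, a_start, a_items, b_start, b_items) for each changed region."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != 'equal':  # tags are 'replace', 'delete', 'insert' or 'equal'
            yield tag, i1, a[i1:i2], j1, b[j1:j2]


temp1 = ['One', 'Two', 'Three', 'Four']
temp2 = ['One', 'Two']
for change in positional_diff(temp1, temp2):
    print(change)  # ('delete', 2, ['Three', 'Four'], 2, [])
```

Because the opcodes carry indices into both lists, duplicates and reordered items are reported where they occur instead of being collapsed as they would be in a set.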
Here's a Counter answer for the simplest case.
This is shorter than the one above that does two-way diffs because it only does exactly what the question asks: generate a list of what's in the first list but not the second.
Alternatively, depending on your readability preferences, it makes for a decent one-liner:
Output:
Note that you can remove the list(...) call if you are just iterating over it.
Because this solution uses counters, it handles quantities properly versus the many set-based answers. For example, on this input:
The output is:
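The Counter approach can be sketched as below. The input with repeated items is an assumed example chosen to show how quantities are handled; the names lst1, lst2 and diff are illustrative.

```python
from collections import Counter

lst1 = ['One', 'Two', 'Two', 'Two', 'Three', 'Three', 'Four']  # assumed input
lst2 = ['One', 'Two']                                          # assumed input

# Counter subtraction keeps only positive counts: one 'Two' is cancelled
# by lst2 and the remaining copies survive.
diff = list((Counter(lst1) - Counter(lst2)).elements())

print(diff)  # ['Two', 'Two', 'Three', 'Three', 'Four'] on Python 3.7+
```

A set-based difference on the same input would drop every 'Two', since sets do not track how many times an item occurs.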
This could be even faster than Mark's list comprehension:
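One candidate for such a variant, offered as an assumption rather than the answer's confirmed code, is itertools.filterfalse with the set's bound __contains__ method as the predicate. Whether it is actually faster depends on the data and the interpreter, so measure before relying on it.

```python
from itertools import filterfalse

temp1 = ['One', 'Two', 'Three', 'Four']
temp2 = ['One', 'Two']

# filterfalse keeps the items for which the predicate is false, i.e. the
# items of temp1 that are NOT contained in the set built from temp2.
temp3 = list(filterfalse(set(temp2).__contains__, temp1))

print(temp3)  # ['Three', 'Four']
```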
Here is a modified version of @SuperNova's answer
Single-line version of arulmr's solution:
This is another solution:
Let's say we have two lists, list1 = [1, 3, 5, 7, 9] and list2 = [1, 2, 3, 4, 5].
We can see from the above two lists that items 1, 3, 5 exist in list2 and items 7, 9 do not. On the other hand, items 1, 3, 5 exist in list1 and items 2, 4 do not.
What is the best solution to return a new list containing items 7, 9 and 2, 4?
All the answers above find the solution, so what's the most efficient one?
versus
Using timeit we can see the results
returns
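A comparison like the one described could be set up as in this sketch. The two function names and the list contents are assumptions inferred from the description above, and no timings are quoted because they vary by machine and Python version.

```python
import timeit

list1 = [1, 3, 5, 7, 9]  # assumed from the description above
list2 = [1, 2, 3, 4, 5]  # assumed from the description above


def loop_difference(a, b):
    """Two-way difference via list membership tests; keeps order."""
    result = [x for x in a if x not in b]
    result += [x for x in b if x not in a]
    return result


def set_difference(a, b):
    """Two-way difference via sets; result order is not guaranteed."""
    return list(set(a).symmetric_difference(b))


print(loop_difference(list1, list2))  # [7, 9, 2, 4]
for func in (loop_difference, set_difference):
    seconds = timeit.timeit(lambda: func(list1, list2), number=10_000)
    print(f"{func.__name__}: {seconds:.4f}s")
```

With inputs this small, the difference mostly reflects constant overhead; the set-based version tends to pull ahead as the lists grow.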
If you need to remove from list a all values that are present in list b:
list_diff([1,2,2], [1])
Result: [2,2]
or
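A list_diff consistent with the call and result shown above could be sketched in two forms, an explicit loop and a comprehension. The second name, list_diff_short, is invented here to keep both in one block.

```python
def list_diff(a, b):
    """Return the items of `a` that do not occur in `b`, duplicates included."""
    result = []
    for item in a:
        if item not in b:
            result.append(item)
    return result


def list_diff_short(a, b):
    """The same logic as a list comprehension."""
    return [item for item in a if item not in b]


print(list_diff([1, 2, 2], [1]))        # [2, 2]
print(list_diff_short([1, 2, 2], [1]))  # [2, 2]
```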
You could use a naive method if the elements of the difflist are sorted and sets.
or with native set methods:
Naive solution: 0.0787101593292
Native set solution: 0.998837615564
If you run into TypeError: unhashable type: 'list', you need to turn the inner lists or sets into tuples first, since tuples are hashable.
See also: How to compare a list of lists/sets in python?
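The tuple conversion mentioned above can be sketched as follows; the nested lists are assumed example data:

```python
list_of_lists1 = [[1, 2], [3, 4]]  # assumed example data
list_of_lists2 = [[3, 4], [5, 6]]  # assumed example data

# Lists are unhashable, so set(list_of_lists1) would raise TypeError.
# Tuples are hashable, so convert each inner list first.
set1 = set(map(tuple, list_of_lists1))
set2 = set(map(tuple, list_of_lists2))

# Convert back to lists if the caller expects the original element type.
only_in_first = [list(t) for t in set1 - set2]

print(only_in_first)  # [[1, 2]]
```

If the inner items are sets rather than lists, frozenset plays the same role as tuple here.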
I am a little late to the game for this, but you can compare the performance of some of the above-mentioned code with this. Two of the fastest contenders are:
I apologize for the elementary level of coding.
Here are a few simple, order-preserving ways of diffing two lists of strings.
Code
An unusual approach using pathlib:
This assumes both lists contain strings with equivalent beginnings. See the docs for more details. Note, it is not particularly fast compared to set operations.
A straightforward implementation using itertools.zip_longest:
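Both ideas can be sketched as below. This is an illustration under stated assumptions: PurePosixPath is used so the result does not depend on the operating system, the strings must not contain path separators, and the second list must be a prefix of the first.

```python
from itertools import zip_longest
from pathlib import PurePosixPath

temp1 = ['One', 'Two', 'Three', 'Four']
temp2 = ['One', 'Two']

# pathlib: treat each list as path components and strip the common prefix.
# relative_to() raises ValueError if temp2 is not a prefix of temp1.
remainder = PurePosixPath(*temp1).relative_to(PurePosixPath(*temp2))
print(list(remainder.parts))  # ['Three', 'Four']

# zip_longest: compare position by position, padding the shorter list
# with None, and keep the items of temp1 that differ from their partner.
positional = [x for x, y in zip_longest(temp1, temp2) if x != y]
print(positional)  # ['Three', 'Four']
```

Both variants compare by position, so they are not general set differences: an item that exists in both lists at different indices is still reported by the zip_longest version.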
I prefer converting to sets and then using the difference() function. The full code is:
Output:
It's the easiest to understand, and moreover, if you work with large data in the future, converting it to sets will remove duplicates when duplicates are not required. Hope it helps ;-)
I know this question got great answers already but I wish to add the following method using numpy.
If you want something more like a changeset, you could use Counter:
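One possible reading of "changeset", sketched under the assumption that it means reporting what was added and what was removed, with counts. The function name changeset and the sample lists are invented for illustration.

```python
from collections import Counter


def changeset(old, new):
    """Return (added, removed) Counters describing how to turn `old` into `new`."""
    old_counts, new_counts = Counter(old), Counter(new)
    added = new_counts - old_counts    # items to add, with how many of each
    removed = old_counts - new_counts  # items to remove, with how many of each
    return added, removed


added, removed = changeset(['One', 'Two', 'Two'], ['Two', 'Three'])
print(dict(added))    # {'Three': 1}
print(dict(removed))  # {'One': 1, 'Two': 1}
```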
We can calculate the union minus the intersection of the lists:
This can be solved with one line.
The question is: given two lists (temp1 and temp2), return their difference in a third list (temp3).
Here is a simple way to distinguish two lists (whatever the contents are); you can get the result as shown below:
Hope this helps.
You can loop through the first list and, for every item that isn't in the second list, add it to the third list. E.g.:
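Spelled out as a plain loop, this could look like the following sketch:

```python
temp1 = ['One', 'Two', 'Three', 'Four']
temp2 = ['One', 'Two']

temp3 = []
for item in temp1:
    # item comes from temp1 by construction, so only the temp2 check is needed
    if item not in temp2:
        temp3.append(item)

print(temp3)  # ['Three', 'Four']
```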
If the lists are of objects and not primitive types, this is one way of doing it.
The code is more explicit and gives out a copy.
This may not be an efficient implementation, but clean for smaller lists of objects.