Uniqueness for a list of lists
I am curious what would be an efficient way of uniquifying data objects like this:
testdata = [ ['9034968', 'ETH'], ['14160113', 'ETH'], ['9034968', 'ETH'], ['11111', 'NOT'], ['9555269', 'NOT'], ['15724032', 'ETH'], ['15481740', 'ETH'], ['15481757', 'ETH'], ['15481724', 'ETH'], ['10307528', 'ETH'], ['15481757', 'ETH'], ['15481724', 'ETH'], ['15481740', 'ETH'], ['15379365', 'ETH'], ['11111', 'NOT'], ['9555269', 'NOT'], ['15379365', 'ETH'] ]
For each data pair, the numeric string on the left plus the type on the right determines the uniqueness of a data element. The return value should be a list of lists, in the same form as testdata, but with only the unique values kept.
You can use a set. Lists are not hashable, so each inner list has to be converted to a tuple before it can go into the set.
You can also see this page, which benchmarks a variety of methods that either preserve or don't preserve order.
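The answer's original code did not survive on this page. A minimal sketch of the set approach, assuming the inner lists are converted to tuples first, might look like this (the variable names unique_pairs and unique_data are illustrative, and the sample is a shortened version of testdata):

```python
# Shortened sample in the same shape as testdata from the question.
testdata = [
    ['9034968', 'ETH'], ['14160113', 'ETH'], ['9034968', 'ETH'],
    ['11111', 'NOT'], ['9555269', 'NOT'], ['11111', 'NOT'],
]

# Lists are unhashable, so each pair becomes a tuple before entering the set.
unique_pairs = set(tuple(pair) for pair in testdata)

# Convert back to a list of lists. A set does not preserve input order.
unique_data = [list(pair) for pair in unique_pairs]
```

Note that the order of unique_data is arbitrary; if the original order matters, a different approach is needed.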
I tried @Mark's answer and got an error. Converting the list and each of its elements into a tuple made it work. I'm not sure whether this is the best way, though.
Of course, the same thing can be expressed using a list comprehension instead.
I am using Python 2.6.2.
Update
@Mark has since changed his answer. His current answer uses tuples and will work. So will mine :)
Update 2
Thanks to @Mark. I have changed my answer to return a list of lists rather than a list of tuples.
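The code from this answer is missing here, so the following is only a reconstruction of the idea. It assumes Python 3 rather than the 2.6.2 mentioned above, which is why the map result is wrapped in list(). It shows the error from putting lists directly into a set, then the map-based fix and the list-comprehension equivalent.

```python
testdata = [['9034968', 'ETH'], ['11111', 'NOT'], ['9034968', 'ETH']]

# Putting lists straight into a set fails because lists are unhashable.
try:
    set(testdata)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# map-based version: tuples go into the set, lists come back out.
unique_via_map = list(map(list, set(map(tuple, testdata))))

# The same thing expressed as a list comprehension.
unique_via_comprehension = [list(t) for t in set(tuple(pair) for pair in testdata)]
```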
Use unique in numpy to solve this. Note that the axis keyword needs to be specified; otherwise the list is first flattened. Alternatively, use vstack.
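The original snippets for this answer are not preserved, so this is a sketch under assumptions: it requires NumPy to be installed, and the vstack variant (a set of tuples stacked back into rows) is a guess at what was meant. The set is wrapped in list() because vstack expects a sequence of arrays.

```python
import numpy as np

testdata = [['9034968', 'ETH'], ['11111', 'NOT'],
            ['9034968', 'ETH'], ['9555269', 'NOT']]

# axis=0 treats each row as one item and returns the unique rows (sorted).
unique_rows = np.unique(testdata, axis=0)

# Without axis, the input is flattened and unique individual strings come back.
flattened = np.unique(testdata)

# Assumed vstack variant: deduplicate with a set of tuples, then stack the rows.
stacked = np.vstack(list({tuple(row) for row in testdata}))

# Back to a plain list of lists, as the question asks for.
unique_as_lists = unique_rows.tolist()
```

One design consideration: np.unique returns the rows in sorted order, not in their original order, and the result is an array of strings until tolist() is called.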
Expanding a bit on @Mark Byers' solution, you can also get what you need with a single list comprehension and a type conversion.
Also, if you don't like list comprehensions, since many people find them confusing, you can do the same in a for loop.
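Since the answer's code is not shown here, the two forms could be sketched roughly as follows. The names unique_one_liner, unique_tuples and unique_loop are illustrative and not taken from the original answer.

```python
testdata = [['9034968', 'ETH'], ['14160113', 'ETH'],
            ['9034968', 'ETH'], ['11111', 'NOT']]

# Single comprehension: tuple() for hashability, set() to drop duplicates,
# list() to turn each unique tuple back into a list.
unique_one_liner = [list(item) for item in set(tuple(pair) for pair in testdata)]

# The same steps spelled out with explicit for loops.
unique_tuples = set()
for pair in testdata:
    unique_tuples.add(tuple(pair))

unique_loop = []
for item in unique_tuples:
    unique_loop.append(list(item))
```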
Options for preserving order (Python 3.7+)
There are two basic variants: one in which the inner lists become tuples, and one in which the inner lists stay as lists (credits).
In case the new list elements are a function of the old ones, either a walrus operator (:=) can be used, or we can turn the inner lists into tuples and then back into lists.
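The snippets that went with these options are missing from this page. The sketch below is one plausible way to write each option, not the answer's own code. It relies on dict preserving insertion order (Python 3.7+), the walrus variant needs Python 3.8+, and normalize is a hypothetical transformation invented only for illustration.

```python
testdata = [['9034968', 'ETH'], ['14160113', 'ETH'], ['9034968', 'ETH'],
            ['11111', 'NOT'], ['14160113', 'ETH']]

# Inner lists become tuples: dict keys keep first-seen order.
as_tuples = list(dict.fromkeys(map(tuple, testdata)))

# Inner lists stay as lists: remember seen tuples, keep the first occurrence.
seen = set()
as_lists = []
for pair in testdata:
    key = tuple(pair)
    if key not in seen:
        seen.add(key)
        as_lists.append(pair)

# Tuples and back to lists again.
round_trip = [list(t) for t in dict.fromkeys(map(tuple, testdata))]


def normalize(pair):
    """Hypothetical transform: pad the id to 8 digits, lowercase the type."""
    return [pair[0].zfill(8), pair[1].lower()]


# New elements are a function of the old ones: the walrus operator lets the
# comprehension compute each new element once and reuse it.
seen_new = set()
transformed = [
    new
    for pair in testdata
    if (key := tuple(new := normalize(pair))) not in seen_new
    and not seen_new.add(key)  # add() returns None, so this is always True
]
```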
If you have a list of objects, then you can modify @Mark Byers' answer accordingly, where testdata is a list of objects, each of which has a list testList as an attribute.
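The modified code is not preserved here. As a rough sketch, assuming a hypothetical Record class that carries the testList attribute, the set-of-tuples idea carries over like this. The second half, which keeps the first object for each distinct testList, is an extension beyond what the answer describes.

```python
class Record:
    """Hypothetical object type with a list stored in the testList attribute."""

    def __init__(self, testList):
        self.testList = testList


testdata = [
    Record(['9034968', 'ETH']),
    Record(['11111', 'NOT']),
    Record(['9034968', 'ETH']),
]

# Unique inner lists, pulled from the attribute of each object.
unique_lists = [list(t) for t in set(tuple(obj.testList) for obj in testdata)]

# Extension: keep the first object seen for each distinct testList.
first_by_key = {}
for obj in testdata:
    first_by_key.setdefault(tuple(obj.testList), obj)
unique_objects = list(first_by_key.values())
```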
I was about to post my own take on this until I noticed that @pyfunc had already come up with something similar. I'll post my take on this problem anyway in case it's helpful.
Basically, you concatenate the elements of each inner list into a single string using a list comprehension, so that you end up with a flat list of strings. That list is then easy to turn into a set, which removes the duplicates. Then you simply split each string again on the other end to get back lists in the original form.
I don't know how this compares in terms of performance, but I think it's a simple and easy-to-understand solution.
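The steps described above could be sketched as follows. The separator SEP is an assumption: it must be a character that never occurs in the data, otherwise the split will not restore the original pairs.

```python
SEP = '|'  # assumed separator; must not appear in any of the strings

testdata = [['9034968', 'ETH'], ['14160113', 'ETH'],
            ['9034968', 'ETH'], ['11111', 'NOT']]

# Join each pair into a single string, e.g. '9034968|ETH'.
joined = [SEP.join(pair) for pair in testdata]

# Strings are hashable, so a set removes the duplicates directly.
unique_joined = set(joined)

# Split each string again to recover the original list-of-lists shape.
unique_data = [item.split(SEP) for item in unique_joined]
```

This only works when every element is a string; other types would have to be converted on the way in and restored on the way out.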