Uniqueness for a list of lists

Posted on 2024-09-24 00:38:59

I am curious what would be an efficient way of uniquifying such data objects:

testdata = [ ['9034968', 'ETH'], ['14160113', 'ETH'], ['9034968', 'ETH'], ['11111', 'NOT'], ['9555269', 'NOT'], ['15724032', 'ETH'], ['15481740', 'ETH'], ['15481757', 'ETH'], ['15481724', 'ETH'], ['10307528', 'ETH'], ['15481757', 'ETH'], ['15481724', 'ETH'], ['15481740', 'ETH'], ['15379365', 'ETH'], ['11111', 'NOT'], ['9555269', 'NOT'], ['15379365', 'ETH']
]

For each data pair, the numeric string on the left PLUS the type on the right together determine whether a data element is unique. The returned value should be a list of lists, like testdata, but with only the unique values kept.

挽心 2024-10-01 00:38:59

You can use a set:

unique_data = [list(x) for x in set(tuple(x) for x in testdata)]

You can also see this page, which benchmarks a variety of methods, some of which preserve order and some of which don't.
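The set-based version above does not keep the original order. If order matters, one minimal sketch (assuming the first occurrence of each pair should win; the function name `unique_in_order` is hypothetical) tracks already-seen keys in a set while walking the list once:

```python
def unique_in_order(rows):
    """Return rows with duplicates removed, keeping the first occurrence of each."""
    seen = set()
    result = []
    for row in rows:
        key = tuple(row)  # lists are unhashable, so hash a tuple copy instead
        if key not in seen:
            seen.add(key)
            result.append(row)  # keep the original inner list object
    return result


testdata = [['9034968', 'ETH'], ['14160113', 'ETH'], ['9034968', 'ETH'],
            ['11111', 'NOT'], ['9555269', 'NOT'], ['11111', 'NOT']]
print(unique_in_order(testdata))
# [['9034968', 'ETH'], ['14160113', 'ETH'], ['11111', 'NOT'], ['9555269', 'NOT']]
```

This is a single pass over the data, with average constant-time set lookups per row.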

放飞的风筝 2024-10-01 00:38:59

I tried @Mark's answer and got an error. Converting each inner list into a tuple made it work, but I'm not sure if this is the best way.

list(map(list, set(map(tuple, testdata))))

Of course the same thing can be expressed using a list comprehension instead.

[list(i) for i in set(tuple(i) for i in testdata)]

I am using Python 2.6.2.

Update

@Mark has since changed his answer. His current answer uses tuples and will work. So will mine :)

Update 2

Thanks to @Mark. I have changed my answer to return a list of lists rather than a list of tuples.
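The error mentioned at the start of this answer is most likely the one Python raises when lists are put into a set: lists are mutable and therefore unhashable, while tuples of strings are hashable. A small demonstration, using a shortened copy of the data:

```python
testdata = [['9034968', 'ETH'], ['9034968', 'ETH'], ['11111', 'NOT']]

try:
    set(testdata)  # fails: list elements cannot be hashed
except TypeError as exc:
    print(exc)  # unhashable type: 'list'

# Converting each inner list to a tuple makes the elements hashable
unique = set(map(tuple, testdata))
print(sorted(unique))  # [('11111', 'NOT'), ('9034968', 'ETH')]
```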

傲影 2024-10-01 00:38:59

Use NumPy's unique function to solve this:

import numpy as np

np.unique(np.array(testdata), axis=0)

Note that the axis keyword must be specified; otherwise the array is flattened first.

Alternatively, use vstack. Recent NumPy versions require a sequence such as a list (not a set) as its argument:

np.vstack(list({tuple(row) for row in testdata}))
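`np.unique` returns the unique rows sorted, as a NumPy array. If you need the original order and a plain list of lists back, one sketch (assuming NumPy 1.13 or later, where the axis argument was added) uses `return_index=True` to recover where each unique row first appears:

```python
import numpy as np

testdata = [['9034968', 'ETH'], ['14160113', 'ETH'], ['9034968', 'ETH'],
            ['11111', 'NOT'], ['14160113', 'ETH']]

arr = np.array(testdata)  # 2-D array of strings, one row per pair
# first_idx[i] is the position in arr of the first occurrence of the i-th sorted unique row
_, first_idx = np.unique(arr, axis=0, return_index=True)
# Sorting those positions restores the order of first appearance
result = arr[np.sort(first_idx)].tolist()
print(result)  # [['9034968', 'ETH'], ['14160113', 'ETH'], ['11111', 'NOT']]
```

`tolist()` converts the NumPy string array back into ordinary Python lists of `str`.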

上课铃就是安魂曲 2024-10-01 00:38:59

Expanding a bit on @Mark Byers' solution, you can also use just one generator expression and a type conversion to get what you need (the result is a list of tuples):

testdata = list(set(tuple(x) for x in testdata))

Also, if you don't like comprehensions (many people find them confusing), you can do the same thing with a for loop:

for i, e in enumerate(testdata):
    testdata[i] = tuple(e)
testdata = list(set(testdata))
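Both versions above return tuples in an arbitrary order. A different technique, not taken from the answers here, avoids hashing altogether: lists compare element by element, so sorting places duplicates next to each other and `itertools.groupby` can collapse each run. The result keeps inner lists as lists, sorted by string value (not numerically):

```python
from itertools import groupby

testdata = [['9034968', 'ETH'], ['14160113', 'ETH'], ['9034968', 'ETH'],
            ['11111', 'NOT'], ['9555269', 'NOT'], ['11111', 'NOT']]

# Sorting puts equal inner lists next to each other;
# groupby then yields one key per run of equal lists
unique_sorted = [key for key, _ in groupby(sorted(testdata))]
print(unique_sorted)
# [['11111', 'NOT'], ['14160113', 'ETH'], ['9034968', 'ETH'], ['9555269', 'NOT']]
```

The trade-off is an O(n log n) sort instead of the average O(n) of the set-based approaches.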

琴流音 2024-10-01 00:38:59

Options for preserving order (Python 3.7+)

Inner lists become tuples:

list(dict.fromkeys(map(tuple, testdata)))

list({tuple(x): 1 for x in testdata})

Inner lists stay as lists (credits):

list({tuple(x): x for x in testdata}.values())

If the new list elements are a function f of the old ones, you can either use the walrus operator := (Python 3.8+)

list({tuple(y:=f(x)): y for x in testdata}.values())

or we can turn inner lists into tuples and then back to lists

list(map(list, {tuple(x): 1 for x in testdata}))

list(map(list, dict.fromkeys(map(tuple, testdata))))
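To make the walrus variant concrete, here is a sketch with a hypothetical normalizing function `f` (not part of the original answer) that trims whitespace from the number and upper-cases the type, so rows that differ only in formatting count as duplicates. It needs Python 3.8+, where dict comprehensions evaluate the key before the value:

```python
def f(row):
    # Hypothetical normalization: trim the number, upper-case the type
    return [row[0].strip(), row[1].upper()]


testdata = [['9034968', 'ETH'], [' 9034968', 'eth'], ['11111', 'NOT'], ['11111 ', 'not']]

# y holds the transformed row and its tuple form is the dict key.
# A later duplicate overwrites the value but keeps the key's original position.
result = list({tuple(y := f(x)): y for x in testdata}.values())
print(result)  # [['9034968', 'ETH'], ['11111', 'NOT']]
```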

故笙诉离歌 2024-10-01 00:38:59

testdata = [ ['9034968', 'ETH'], ['14160113', 'ETH'], ['9034968', 'ETH'], ['11111', 'NOT'], ['9555269', 'NOT'], ['15724032', 'ETH'], ['15481740', 'ETH'], ['15481757', 'ETH'], ['15481724', 'ETH'], ['10307528', 'ETH'], ['15481757', 'ETH'], ['15481724', 'ETH'], ['15481740', 'ETH'], ['15379365', 'ETH'], ['11111', 'NOT'], ['9555269', 'NOT'], ['15379365', 'ETH']]
# Concatenate each pair into a single string, e.g. '9034968ETH'
concatData = [x[0] + x[1] for x in testdata]
print(concatData)
# Built-in set (the old sets module was removed in Python 3)
uniqueSet = set(concatData)
# Split each string back into [number, type]; assumes the type is always 3 characters
uniqueList = [[t[:-3], t[-3:]] for t in uniqueSet]
print(uniqueList)
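The concatenation trick works for the question's data only because every type there is exactly three characters long. With less regular data, different pairs can collapse into the same string, whereas tuples keep the two fields apart. A small illustration with made-up pairs (not from the question):

```python
pairs = [['1', '23'], ['12', '3']]  # hypothetical pairs, deliberately ambiguous

joined = {p[0] + p[1] for p in pairs}
print(joined)  # {'123'}: two different pairs became one string

as_tuples = {tuple(p) for p in pairs}
print(len(as_tuples))  # 2: the tuples stay distinct
```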

画离情绘悲伤 2024-10-01 00:38:59

If you have a list of objects, you can modify @Mark Byers' answer to:

unique_data = [list(x) for x in set(tuple(x.testList) for x in testdata)]

where testdata is a list of objects, each of which has a list attribute named testList.
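If you want to keep the objects themselves rather than just their lists, a sketch along the same lines keys a dict by the tuple and keeps the first object seen for each key. The class `Record` and its `note` field are hypothetical; only the `testList` attribute name comes from the answer. It relies on dicts preserving insertion order (Python 3.7+):

```python
class Record:
    """Hypothetical container with a list attribute named testList."""

    def __init__(self, testList, note=""):
        self.testList = testList
        self.note = note


testdata = [Record(['9034968', 'ETH'], 'first'),
            Record(['9034968', 'ETH'], 'duplicate'),
            Record(['11111', 'NOT'], 'other')]

unique_objects = {}
for obj in testdata:
    # setdefault only stores obj if its key has not been seen yet
    unique_objects.setdefault(tuple(obj.testList), obj)

result = list(unique_objects.values())
print([r.note for r in result])  # ['first', 'other']
```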

不必你懂 2024-10-01 00:38:59

I was about to post my own take on this until I noticed that @pyfunc had already come up with something similar. I'll post my take on this problem anyway in case it's helpful.

testdata =[ ['9034968', 'ETH'], ['14160113', 'ETH'], ['9034968', 'ETH'], ['11111', 'NOT'], ['9555269', 'NOT'], ['15724032', 'ETH'], ['15481740', 'ETH'], ['15481757', 'ETH'], ['15481724', 'ETH'], ['10307528', 'ETH'], ['15481757', 'ETH'], ['15481724', 'ETH'], ['15481740', 'ETH'], ['15379365', 'ETH'], ['11111', 'NOT'], ['9555269', 'NOT'], ['15379365', 'ETH']
]
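# Join each pair into one hashable string; assumes "%" never occurs in the data
flatdata = [p[0] + "%" + p[1] for p in testdata]
# A set keeps one copy of each string
flatdata = list(set(flatdata))
# Split each string back into a [number, type] list
testdata = [p.split("%") for p in flatdata]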
print(testdata)

Basically, you use a list comprehension to concatenate the two fields of each pair into a single string, so that you have a list of strings. Unlike lists, strings are hashable, so this list can be turned into a set, which removes the duplicates. Then you simply split each string back apart to get lists in the original format.

I don't know how this compares in terms of performance, but I think it's a simple, easy-to-understand solution.
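To check performance on your own data, a rough sketch using the standard `timeit` module compares this string-joining approach with the tuple-based set from the first answer. The helper names are made up for the example, and no timings are claimed here, since they depend on the machine and the data:

```python
import timeit

testdata = [['9034968', 'ETH'], ['14160113', 'ETH'], ['9034968', 'ETH'],
            ['11111', 'NOT'], ['9555269', 'NOT'], ['15379365', 'ETH']] * 1000


def via_strings(rows):
    # Join each pair with a separator, dedupe the strings, split back apart
    return [s.split("%") for s in set(r[0] + "%" + r[1] for r in rows)]


def via_tuples(rows):
    # Hash tuple copies of the pairs directly
    return [list(t) for t in set(tuple(r) for r in rows)]


# Both approaches must agree (order aside) before timing means anything
assert sorted(via_strings(testdata)) == sorted(via_tuples(testdata))

for func in (via_strings, via_tuples):
    seconds = timeit.timeit(lambda: func(testdata), number=100)
    print(f"{func.__name__}: {seconds:.3f} s for 100 runs")
```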
