Removing duplicates from a list of objects with Python

Posted on 2024-10-01 16:24:21

I've got a list of objects and I've got a db table full of records. My list of objects has a title attribute and I want to remove any objects with duplicate titles from the list (leaving the original).

Then I want to check if my list of objects has any duplicates of any records in the database and if so, remove those items from list before adding them to the database.

I have seen solutions for removing duplicates from a list like this: myList = list(set(myList)), but I'm not sure how to do that with a list of objects.

I need to maintain the order of my list of objects too. I was also thinking maybe I could use difflib to check for differences in the titles.

Comments (8)

寒冷纷飞旳雪 2024-10-08 16:24:21

set(list_of_objects) will only remove duplicates if Python knows what counts as a duplicate; that is, you'll need to define what makes an object unique.

In order to do that, you'll need to make the object hashable. You need to define both the __hash__ and __eq__ methods; here is how:

http://docs.python.org/glossary.html#term-hashable

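Though if you only compare objects with == and never put them in a set, defining the __eq__ method alone is enough. (In Python 3, a class that defines __eq__ without __hash__ becomes unhashable.)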

EDIT: How to implement the __eq__ method:

As I mentioned, you'll need to know what makes your object unique. Suppose we have a Book with attributes author_name and title whose combination is unique (so we can have many books authored by Stephen King, and many books named The Shining, but only one book named The Shining by Stephen King). Then the implementation is as follows:

def __eq__(self, other):
    # Two books are equal when both the author and the title match.
    return (self.author_name == other.author_name
            and self.title == other.title)

Similarly, this is how I sometimes implement the __hash__ method:

def __hash__(self):
    return hash(('title', self.title,
                 'author_name', self.author_name))

You can check that if you create a list of 2 books with the same author and title, the book objects will be equal (with the == operator). Also, when set() is used, it will remove one book.

EDIT: This is an old answer of mine, and the paragraph above originally also claimed that the two books would be the same under the is operator. That was an error: objects with the same hash() won't give True when compared with is. Hashability matters, however, if you intend to use the objects as elements of a set or as keys in a dictionary.
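
Putting the two methods together, a minimal sketch might look like the following. The Book constructor and the sample data are assumptions made for illustration, and dict.fromkeys is used here in place of set() only because it keeps the first occurrence and the original order, which the question asks for.

```python
class Book:
    def __init__(self, author_name, title):
        self.author_name = author_name
        self.title = title

    def __eq__(self, other):
        if not isinstance(other, Book):
            return NotImplemented
        return (self.author_name, self.title) == (other.author_name, other.title)

    def __hash__(self):
        # Hash exactly the fields that __eq__ compares.
        return hash((self.author_name, self.title))


books = [
    Book("Stephen King", "The Shining"),
    Book("Stephen King", "It"),
    Book("Stephen King", "The Shining"),  # duplicate of the first book
]

# dict.fromkeys keeps the first object seen for each key, in list order.
unique_books = list(dict.fromkeys(books))

equal = books[0] == books[2]        # equality comes from __eq__
same_object = books[0] is books[2]  # identity is unaffected by __hash__
```

Here equal is True while same_object is False, which is the distinction the edit above corrects.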

暖树树初阳… 2024-10-08 16:24:21

Since the objects aren't hashable by title, you can't use a set on them directly. The titles themselves should be hashable, though.

Here's the first part.

seen_titles = set()
new_list = []
for obj in myList:
    if obj.title not in seen_titles:
        new_list.append(obj)
        seen_titles.add(obj.title)

You're going to need to describe what database/ORM etc. you're using for the second part though.
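
Purely as an illustration of the second part, here is a rough sketch that assumes SQLite through the standard-library sqlite3 module. The table name records, its title column, and the sample objects are all hypothetical; a different database or an ORM would need its own query.

```python
import sqlite3
from types import SimpleNamespace

# Hypothetical schema: a table "records" with a single "title" column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (title TEXT)")
conn.execute("INSERT INTO records (title) VALUES (?)", ("b",))

# Stand-in for the already de-duplicated list from the first part.
new_list = [
    SimpleNamespace(title="a"),
    SimpleNamespace(title="b"),
    SimpleNamespace(title="c"),
]

# Load the titles already stored, then keep only the objects not yet present.
existing_titles = {row[0] for row in conn.execute("SELECT title FROM records")}
to_insert = [obj for obj in new_list if obj.title not in existing_titles]

conn.executemany(
    "INSERT INTO records (title) VALUES (?)",
    [(obj.title,) for obj in to_insert],
)
conn.commit()
```

For a very large table it may be cheaper to query only for the titles in the list rather than loading every title into memory.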

西瓜 2024-10-08 16:24:21

If you can't (or won't) define __eq__ for the objects, you can use a dict-comprehension to achieve the same end:

unique = list({item.attribute: item for item in mylist}.values())

Note that this keeps the last instance for a given key: e.g. for mylist = [Item(attribute=1, tag='first'), Item(attribute=1, tag='second'), Item(attribute=2, tag='third')] you get [Item(attribute=1, tag='second'), Item(attribute=2, tag='third')]. You can get around this by building the dict from mylist[::-1] (if the full list is present), though the unique items then come out in the order of the reversed list.
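
If the first instance should win and the original order should be kept, one possible sketch is to fill the dict with setdefault instead of a comprehension. The Item dataclass below is a hypothetical stand-in for the objects in the example above.

```python
from dataclasses import dataclass


@dataclass
class Item:  # hypothetical class matching the example above
    attribute: int
    tag: str


mylist = [
    Item(attribute=1, tag='first'),
    Item(attribute=1, tag='second'),
    Item(attribute=2, tag='third'),
]

first_seen = {}
for item in mylist:
    # setdefault only stores the item if the key is not present yet,
    # so the first instance for each attribute is the one that is kept.
    first_seen.setdefault(item.attribute, item)

unique = list(first_seen.values())  # keeps the 'first' and 'third' items
```

This relies on dicts preserving insertion order, which is guaranteed from Python 3.7 onwards.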

八巷 2024-10-08 16:24:21

This seems pretty minimal (the de-duplicated objects end up in new_dict.values(), first occurrence kept):

new_dict = dict()
for obj in myList:
    if obj.title not in new_dict:
        new_dict[obj.title] = obj
月下凄凉 2024-10-08 16:24:21

For non-hashable types you can use a dictionary comprehension to remove duplicate objects based on a field that all of the objects share. This is particularly useful for Pydantic, whose models are not hashable by default:

{ row.title : row for row in rows }.values()

Note that this considers duplicates solely based on row.title, and keeps the last matched object for each row.title. This means that if your rows may have the same title but different values in other attributes, all but the last of them will be silently dropped.

e.g. [{"title": "test", "myval": 1}, {"title": "test", "myval": 2}] ==> [{"title": "test", "myval": 2}]

If you wanted to match against multiple fields in row, you could extend this further:

{ f"{row.title}\0{row.value}" : row for row in rows }.values()

The null character \0 is used as a separator between fields. This assumes that the null character isn't used in either row.title or row.value.
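
As an alternative sketch that avoids the separator assumption altogether, a tuple of the fields can serve as the key, provided each field value is itself hashable. The rows below are hypothetical stand-ins built with SimpleNamespace.

```python
from types import SimpleNamespace

# Hypothetical rows; any object with .title and .value attributes would do.
rows = [
    SimpleNamespace(title="test", value=1),
    SimpleNamespace(title="test", value=2),
    SimpleNamespace(title="test", value=1),  # same title and value as the first row
]

# A tuple key compares field by field, so no separator character is needed.
unique_rows = list({(row.title, row.value): row for row in rows}.values())
```

As with the single-field version, the last matching object is the one kept for each key.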

我们只是彼此的过ke 2024-10-08 16:24:21

Both __hash__ and __eq__ are needed for this.

__hash__ is needed to add an object to a set, since Python's sets are implemented as hash tables. By default, immutable objects like numbers, strings, and tuples (of hashable elements) are hashable.

However, hash collisions (two distinct objects hashing to the same value) are inevitable, due to the pigeonhole principle. So, two objects cannot be distinguished only using their hash, and the user must specify their own __eq__ function. Thus, the actual hash function the user provides is not crucial, though it is best to try to avoid hash collisions for performance (see What's a correct and good way to implement __hash__()?).
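
To illustrate that point, here is a small sketch with a deliberately poor hash function. The Title class is hypothetical; every instance collides, yet the set still separates distinct objects correctly because it falls back on __eq__.

```python
class Title:  # hypothetical class for illustration
    def __init__(self, text):
        self.text = text

    def __eq__(self, other):
        if not isinstance(other, Title):
            return NotImplemented
        return self.text == other.text

    def __hash__(self):
        # Deliberately terrible: every object gets the same hash value.
        return 1


titles = [Title("a"), Title("b"), Title("a")]
unique = set(titles)
unique_count = len(unique)  # 2: "a" and "b", despite every hash colliding
```

The result is still correct, but every lookup has to compare against the colliding entries, which is why a hash built from the compared fields is preferable.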

晚风撩人 2024-10-08 16:24:21

I recently ended up using the code below. It is similar to other answers in that it iterates over the list and records the keys it has seen, dropping any item whose key was already seen, but it updates the original list in place rather than leaving the result in a separate list.

seen = set()
kept = []
for obj in objList:
    key = obj["key-property"]
    if key not in seen:
        seen.add(key)
        kept.append(obj)
# Slice assignment updates the original list in place; calling
# objList.remove() while iterating over objList would skip elements.
objList[:] = kept
写给空气的情书 2024-10-08 16:24:21

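It's quite easy, friends:

a = [5,6,7,32,32,32,32,32,32,32,32]

a = list(set(a))

print(a)

This prints the unique values 5, 6, 7 and 32, though not necessarily in the original order, because a set is unordered.

That's it! :)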
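
Since the question asks for the order to be maintained, a possible variant of the same one-liner uses dict.fromkeys instead of set. This is only a sketch for plain hashable values like the integers above; objects would still need a key such as their title.

```python
a = [5, 6, 7, 32, 32, 32, 32, 32, 32, 32, 32]

# dict keys are unique and keep insertion order (Python 3.7+),
# so this drops repeats while preserving the first occurrence of each value.
a = list(dict.fromkeys(a))

print(a)  # [5, 6, 7, 32]
```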
