Migrating a Django model to a unique_together constraint

Posted 2024-12-17 16:26:18

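I have a model with three fields:

from django.db import models

class MyModel(models.Model):
    # on_delete is required since Django 2.0; CASCADE is an assumption here
    a = models.ForeignKey(A, on_delete=models.CASCADE)
    b = models.ForeignKey(B, on_delete=models.CASCADE)
    c = models.ForeignKey(C, on_delete=models.CASCADE)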

I want to enforce a unique constraint across these fields and found Django's unique_together, which seems to be the solution. However, I already have an existing database, and it contains many duplicates. Since unique_together is enforced at the database level, I need to deduplicate the rows first and then run the migration.

Is there a good way to remove the duplicates (rows that share the same (A, B, C)) so that I can run the migration that adds the unique_together constraint?
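
To illustrate why the rows have to be deduplicated first, here is a minimal sketch that uses Python's built-in sqlite3 module rather than Django. The table name mymodel, the index name, and the helper try_add_unique_index are made up for the illustration. It assumes the migration ends up asking the database for a unique index over (a, b, c), which the database refuses while duplicates exist.

```python
import sqlite3


def try_add_unique_index(rows):
    """Load `rows` of (a, b, c) into a scratch table, then try to add a
    unique index over the three columns.

    Returns True if the index was created, False if duplicates blocked it.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE mymodel ("
            "id INTEGER PRIMARY KEY, a INTEGER, b INTEGER, c INTEGER)"
        )
        conn.executemany(
            "INSERT INTO mymodel (a, b, c) VALUES (?, ?, ?)", rows
        )
        try:
            conn.execute(
                "CREATE UNIQUE INDEX mymodel_abc_uniq ON mymodel (a, b, c)"
            )
            return True
        except sqlite3.IntegrityError:
            # SQLite rejects the index because two rows share (a, b, c).
            return False
    finally:
        conn.close()
```

With distinct rows the index is created; with a repeated (a, b, c) triple the call returns False, which is the situation the migration would run into.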

鱼窥荷 2024-12-24 16:26:18

If you are happy to keep one of the duplicates arbitrarily, I think the following might do the trick. It is perhaps not the most efficient approach, but it is simple enough, and I guess you only need to run it once. Please verify all of this yourself on some test data in case I've done something silly, since you are about to delete a bunch of data.

First we find the groups of objects that are duplicates of each other. For each group, we (arbitrarily) pick a "master" that we are going to keep. Our chosen method is to pick the one with the lowest pk:

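from django.db.models import Count, Min

# Group rows by (a, b, c); for every group with more than one row,
# take the lowest pk as the "master" to keep.
master_pks = (
    MyModel.objects.values('a', 'b', 'c')
    .annotate(Min('pk'), count=Count('pk'))
    .filter(count__gt=1)
    .values_list('pk__min', flat=True)
)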

We then loop over each master and delete all of its duplicates:

masters = MyModel.objects.in_bulk(list(master_pks))

for master in masters.values():
    # Delete every row sharing this master's (a, b, c), except the master.
    MyModel.objects.filter(
        a=master.a, b=master.b, c=master.c
    ).exclude(pk=master.pk).delete()
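
Since this deletes data, it may help to preview the plan first. The following is only a sketch of the same "keep the lowest pk per group" rule in plain Python, with no Django involved. The helper name plan_dedup is made up here, and it assumes you can feed it (pk, a, b, c) tuples, for example from something like MyModel.objects.values_list('pk', 'a', 'b', 'c').

```python
from collections import defaultdict


def plan_dedup(rows):
    """Given an iterable of (pk, a, b, c) tuples, return (keep, delete):
    the pks that would survive and the pks that would be removed when
    the lowest pk in each (a, b, c) group is kept.
    """
    groups = defaultdict(list)
    for pk, a, b, c in rows:
        groups[(a, b, c)].append(pk)

    keep, delete = [], []
    for pks in groups.values():
        master = min(pks)  # the "master" is the lowest pk in the group
        keep.append(master)
        delete.extend(pk for pk in pks if pk != master)
    return sorted(keep), sorted(delete)
```

Comparing the length of the delete list with what you expect is a cheap sanity check before running the real deletion.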

傲影 2024-12-24 16:26:18

I want to add a slightly improved answer that deletes all the duplicates with a single delete() call, instead of looping and deleting each duplicate group separately. This will be much faster if you have a lot of records.

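from django.db.models import Count, Min

# pk of the row to keep in each (a, b, c) group: the lowest one.
non_dupe_pks = list(
    MyModel.objects.values('a', 'b', 'c')
    .annotate(Min('pk'), count=Count('pk'))
    .order_by()  # clear any default ordering before grouping
    .values_list('pk__min', flat=True)
)

# Every row that is not a keeper is a duplicate.
dupes = MyModel.objects.exclude(pk__in=non_dupe_pks)
dupes.delete()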

It's important to add order_by() to the first query; otherwise the model's default ordering might interfere with the aggregation.

You can comment out the last line and use dupes.count() to check that the query works as expected.
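
For intuition, the same idea can be written as one SQL statement. This is a rough sketch using Python's built-in sqlite3 module, not the SQL that Django actually generates. The table name mymodel, the index name, and the helper dedupe_and_index are made up for the illustration.

```python
import sqlite3


def dedupe_and_index(rows):
    """Load `rows` of (a, b, c) into a scratch table, delete every row
    that is not the lowest id of its (a, b, c) group, add a unique
    index, and return the surviving rows as (id, a, b, c) tuples.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE mymodel ("
            "id INTEGER PRIMARY KEY, a INTEGER, b INTEGER, c INTEGER)"
        )
        conn.executemany(
            "INSERT INTO mymodel (a, b, c) VALUES (?, ?, ?)", rows
        )
        # Keep MIN(id) per group; delete everything else in one statement.
        conn.execute(
            "DELETE FROM mymodel WHERE id NOT IN "
            "(SELECT MIN(id) FROM mymodel GROUP BY a, b, c)"
        )
        # With the duplicates gone, the unique index can be created.
        conn.execute(
            "CREATE UNIQUE INDEX mymodel_abc_uniq ON mymodel (a, b, c)"
        )
        return conn.execute(
            "SELECT id, a, b, c FROM mymodel ORDER BY id"
        ).fetchall()
    finally:
        conn.close()
```

In this sketch the unique index is created only after the delete, mirroring the order of "deduplicate first, then migrate" from the question.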
