Deleting duplicate records based on secondary fields

Published 2025-02-13 10:31:00


I have a use case where I am working with objects that appear as such:

{
  "data": {
    "uuid": "0001-1234-5678-9101"
  },
  "organizationId": 10192432,
  "lastCheckin": "2022-03-19T08:23:02.435+00:00"
}

Due to some old bugs in our application, we've accumulated many duplicates for these items in the database. The origin of the duplicates has been resolved in an upcoming release, but I need to ensure that prior to the release there are no such duplicates because the release includes a unique constraint on the "data.uuid" property.

I am trying to delete records based on the following criteria:

Any duplicate record based on "data.uuid" WHERE lastCheckin is NOT the most recent OR organizationId is missing.

Unfortunately, I am rather new to MongoDB and do not know how to express this in a query. I have tried aggregation to obtain the duplicate records and, while I've been able to do so, I have so far been unable to exclude the records in each duplicate group containing the most recent "lastCheckin" value, or even to include "organizationId" as part of the aggregation. Here's what I came up with:

db.collection.aggregate([
  { $group: {
    _id: "$data.uuid",
    "count": {
      "$sum": 1
    }
  }},
  { $match: {
    "_id": { "$ne": null },
    "count": { "$gt": 1 }
  }},
  { $project: {
    "uuid": "$_id",
    "_id": 0
  }}
])

The above was mangled together based on various other stackoverflow posts describing the aggregation of duplicates. I am not sure whether this is the right way to approach this problem. One immediate problem I can identify is that simply getting the "data.uuid" property, without any additional criteria for identifying the invalid duplicates, makes it hard to envision a single query that can delete the invalid records without also removing the valid ones.
Thanks for any help.


Answers (1)

岁月蹉跎了容颜 2025-02-20 10:31:00


I am not sure if this is possible via a single query, but this is how I would approach it: first sort the documents by lastCheckin, then group them by data.uuid, like this:

db.collection.aggregate([
  {
    $sort: {
      lastCheckin: -1
    }
  },
  {
    $group: {
      _id: "$data.uuid",
      "docs": {
        "$push": "$$ROOT"
      }
    }
  }
]);

Playground link.

Once you have these results, you can filter out the documents you want to delete according to your criteria and collect their _ids. The docs array in each group will be sorted by lastCheckin in descending order, so filtering should be easy.
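The filtering step can be sketched in plain JavaScript (the same logic works inside mongosh). This is a minimal sketch, assuming each group's docs array is already sorted by lastCheckin descending as produced by the pipeline above, and reading the criteria literally (within a duplicate group, delete everything that is not the most recent, and the most recent too if organizationId is missing); the `collectIdsToDelete` name and the sample `_id` values are illustrative, not part of the original answer:

```javascript
// Given the $group output (docs sorted by lastCheckin descending),
// collect the _ids of every document to delete.
function collectIdsToDelete(groups) {
  const idsToDelete = [];
  for (const group of groups) {
    if (group.docs.length < 2) continue; // not a duplicate group
    const [mostRecent, ...older] = group.docs;
    // Every older duplicate is deleted ("lastCheckin is NOT the most recent").
    for (const doc of older) idsToDelete.push(doc._id);
    // The most recent one is deleted too if its organizationId is missing.
    if (mostRecent.organizationId == null) idsToDelete.push(mostRecent._id);
  }
  return idsToDelete;
}

// Example with two groups (hypothetical _ids):
const groups = [
  {
    _id: "0001-1234-5678-9101",
    docs: [
      { _id: 1, organizationId: 10192432, lastCheckin: "2022-03-19T08:23:02.435+00:00" },
      { _id: 2, organizationId: 10192432, lastCheckin: "2022-03-18T07:00:00.000+00:00" }
    ]
  },
  {
    _id: "0002-9999-0000-1234",
    docs: [
      { _id: 3, lastCheckin: "2022-03-17T00:00:00.000+00:00" }, // no organizationId
      { _id: 4, organizationId: 10192432, lastCheckin: "2022-03-16T00:00:00.000+00:00" }
    ]
  }
];

console.log(collectIdsToDelete(groups)); // → [ 2, 4, 3 ]
```

Note that under this literal reading, a group whose most recent document lacks organizationId loses all of its documents; if you instead want to keep the newest document that does have an organizationId, adjust the filter accordingly.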

Finally, delete the documents, using this query:

db.collection.deleteMany({ _id: { $in: [ /* array of _ids collected above */ ] } });