Deleting duplicates one by one based on a secondary field
I have a use case where I am working with objects that look like this:
{
"data": {
"uuid": "0001-1234-5678-9101"
},
"organizationId": 10192432,
"lastCheckin": "2022-03-19T08:23:02.435+00:00"
}
Due to some old bugs in our application, we've accumulated many duplicates of these items in the database. The cause of the duplicates has been fixed in an upcoming release, but I need to ensure that no such duplicates exist prior to the release, because the release includes a unique constraint on the "data.uuid" property.
I am trying to delete records based on the following criteria:
Any duplicate record based on "data.uuid" WHERE lastCheckin is NOT the most recent OR organizationId is missing.
Unfortunately, I am rather new to using MongoDB and do not know how to express this in a query. I have tried an aggregation to obtain the duplicate records and, while I've been able to do so, I have so far been unable to exclude the records in each duplicate group containing the most recent "lastCheckin" value, or even to include "organizationId" as part of the aggregation. Here's what I came up with:
db.collection.aggregate([
  // Group documents by data.uuid and count how many share each value
  { $group: {
    _id: "$data.uuid",
    "count": { "$sum": 1 }
  }},
  // $match must run after $group: the "count" field only exists once grouped
  { $match: {
    "_id": { "$ne": null },
    "count": { "$gt": 1 }
  }},
  // Expose the duplicated uuid values
  { $project: {
    "uuid": "$_id",
    "_id": 0
  }}
])
The above was strung together from various other Stack Overflow posts describing how to aggregate duplicates. I am not sure whether this is the right way to approach the problem. One immediate issue I can identify is that fetching only the "data.uuid" property, without any additional criteria for identifying the invalid duplicates, makes it hard to envision a single query that can delete the invalid records without also removing the valid ones.
Thanks for any help.
1 Answer
I am not sure if this is possible via a single query, but this is how I would approach it: first sort the documents by lastCheckin and then group them by data.uuid, like this: Playground link.
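The Playground link itself does not survive on this page, so the following is a minimal sketch of what that sort-then-group pipeline might look like, using the collection and field names from the question (the pipeline variable name is illustrative):

const pipeline = [
  // Sort first so each group's pushed documents arrive newest-first
  { $sort: { lastCheckin: -1 } },
  // Group by data.uuid, keeping the fields needed to apply the criteria
  { $group: {
    _id: "$data.uuid",
    docs: {
      $push: {
        _id: "$_id",
        organizationId: "$organizationId",
        lastCheckin: "$lastCheckin"
      }
    },
    count: { $sum: 1 }
  }},
  // Only groups with more than one document contain duplicates
  { $match: { count: { $gt: 1 } } }
];

db.collection.aggregate(pipeline)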
Once you have these results, you can filter out the documents which, according to your criteria, you want to delete, and collect their _id values. The documents in each group will be sorted by lastCheckin in descending order, so filtering should be easy. Finally, delete the documents, using this query:
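The final delete query also did not survive on this page; below is a sketch of that last step in the mongosh shell, reusing the pipeline sketched above. idsToDelete is an illustrative name, and the filter encodes the question's criteria literally: delete anything that is not the most recent check-in, or that is missing organizationId.

const idsToDelete = [];

// Walk each duplicate group; docs[0] holds the most recent lastCheckin
// because of the $sort stage in the pipeline above.
db.collection.aggregate(pipeline).forEach(group => {
  group.docs.forEach((doc, index) => {
    // Flag anything that is not the most recent check-in, or that is
    // missing organizationId, per the criteria in the question.
    if (index > 0 || doc.organizationId == null) {
      idsToDelete.push(doc._id);
    }
  });
});

// Remove every flagged duplicate in a single call
db.collection.deleteMany({ _id: { $in: idsToDelete } });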