合并记录

发布于 2024-11-04 20:21:20 字数 402 浏览 0 评论 0原文

我想合并一组记录

(id) /     (referencedid)

1         10  
1         11  
2         11  
2         10  
3         10  
3         11  
3         12

查询的结果应该是

1         10  
1         11  
3         10  
3         11  
3         12  

所以,因为 id=1 和 id=2 具有相同的一组相应的引用 ID {10,11},它们将被合并。但 id=3 对应的referenceid不一样,因此不会合并。

完成这项工作的好方法是什么?

I want to consolidate a set of records

(id) /     (referencedid)

1         10  
1         11  
2         11  
2         10  
3         10  
3         11  
3         12

The result of query should be

1         10  
1         11  
3         10  
3         11  
3         12  

So, since id=1 and id=2 has same set of corresponding referenceids {10,11} they would be consolidated. But id=3 s corresponding referenceids are not the same, hence wouldnt be consolidated.

What would be good way to get this done?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

多彩岁月 2024-11-11 20:21:20
Select id, referenceid
From MyTable
Where Id In (
                Select Min( Z.Id ) As Id
                From    (
                        Select Z1.id, Group_Concat( Z1.referenceid ) As signature
                        From    (
                                Select id, referenceid
                                From MyTable
                                Order By id, referenceid
                                ) As Z1
                        Group By Z1.id
                        ) As Z
                Group By Z.Signature
                )
Select id, referenceid
From MyTable
Where Id In (
                Select Min( Z.Id ) As Id
                From    (
                        Select Z1.id, Group_Concat( Z1.referenceid ) As signature
                        From    (
                                Select id, referenceid
                                From MyTable
                                Order By id, referenceid
                                ) As Z1
                        Group By Z1.id
                        ) As Z
                Group By Z.Signature
                )
孤蝉 2024-11-11 20:21:20
-- generate count of elements for each distinct id
with Counts as (
    select
        id,
        count(1) as ReferenceCount
    from
        tblReferences R
    group by
        R.id
)
-- generate every pairing of two different id's, along with
-- their counts, and how many are equivalent between the two
,Pairings as (
    select
        R1.id as id1
        ,R2.id as id2
        ,C1.ReferenceCount as count1
        ,C2.ReferenceCount as count2
        ,sum(case when R1.referenceid = R2.referenceid then 1 else 0 end) as samecount
    from
        tblReferences R1 join Counts C1 on R1.id = C1.id
    cross join
        tblReferences R2 join Counts C2 on R2.id = C2.id
    where
        R1.id < R2.id
    group by
        R1.id, C1.ReferenceCount, R2.id, C2.ReferenceCount
)
-- generate the list of ids that are safe to remove by picking
-- out any id's that have the same number of matches, and same
-- size of list, which means their reference lists are identical.
-- since id2 > id, we can safely remove id2 as a copy of id, and
-- the smallest id of which all id2 > id are copies will be left
,RemovableIds as (
    select
        distinct id2 as id
    from
        Pairings P
    where
        P.count1 = P.count2 and P.count1 = P.samecount
)
-- validate the results by just selecting to see which id's
-- will be removed.  can also include id in the query above
-- to see which id was identified as the copy

select id from RemovableIds R

-- comment out `select` above and uncomment `delete` below to
-- remove the records after verifying they are correct!

--delete from tblReferences where id in (select id from RemovableIds) R
-- generate count of elements for each distinct id
with Counts as (
    select
        id,
        count(1) as ReferenceCount
    from
        tblReferences R
    group by
        R.id
)
-- generate every pairing of two different id's, along with
-- their counts, and how many are equivalent between the two
,Pairings as (
    select
        R1.id as id1
        ,R2.id as id2
        ,C1.ReferenceCount as count1
        ,C2.ReferenceCount as count2
        ,sum(case when R1.referenceid = R2.referenceid then 1 else 0 end) as samecount
    from
        tblReferences R1 join Counts C1 on R1.id = C1.id
    cross join
        tblReferences R2 join Counts C2 on R2.id = C2.id
    where
        R1.id < R2.id
    group by
        R1.id, C1.ReferenceCount, R2.id, C2.ReferenceCount
)
-- generate the list of ids that are safe to remove by picking
-- out any id's that have the same number of matches, and same
-- size of list, which means their reference lists are identical.
-- since id2 > id, we can safely remove id2 as a copy of id, and
-- the smallest id of which all id2 > id are copies will be left
,RemovableIds as (
    select
        distinct id2 as id
    from
        Pairings P
    where
        P.count1 = P.count2 and P.count1 = P.samecount
)
-- validate the results by just selecting to see which id's
-- will be removed.  can also include id in the query above
-- to see which id was identified as the copy

select id from RemovableIds R

-- comment out `select` above and uncomment `delete` below to
-- remove the records after verifying they are correct!

--delete from tblReferences where id in (select id from RemovableIds) R
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文