递归SQL以消除重复项?
我有一个SQL,我需要在ID之间进行作业,但前提是确切的“匹配合作伙伴”。
原始表:
id_orig | id_new |
---|---|
1 | 2 |
3 | 5 |
4 | 10 |
98 | 200 |
100 | 100 |
100 | 300 |
首先我需要找到一种方法,以更新所有对,其中ID_new引用完全一个ID_ORIG:
select ID_orig,id_orig,id_orig,id_orig,id_orig,id_new,id_new,从mytable comparetify计数(*)(iD_new分区)= 1
这将导致以下结果集(1):
id_orig | id_new |
---|---|
1 | 2 |
3 | 5 |
4 | 10 |
100 | 300 |
问题是,当我现在查看“其余的”集合“(在这种情况下,这意味着每个ID,这不是第一个结果集的一部分),有新行,现在完全有一个“匹配合作伙伴”。
WITH cte as (
select id_orig, id_new
from mytable
QUALIFY COUNT(*) OVER (PARTITION BY id_new) = 1
)
select id_orig, id_new
from mytable
where id_orig not in (select id_orig from cte)
and id_new not in (select id_new from cte)
QUALIFY COUNT(*) OVER (PARTITION BY id_new) = 1
结果集2:
ID_ORIG | ID_NEW |
---|---|
98 | 200 |
实际上,找到所有“匹配”合作伙伴还需要更多步骤,所以我的问题是:是否可以使用一个查询从结果集1和2中选择行?也许递归?
简要摘要我想实现的目标:
- 在集合中找到所有行,id_new是唯一的
- (这些ID在源表中标记为“准备就绪”,下次将被忽略。)
- 与步骤1相同:找到所有集合中的行,id_new是唯一的(但是从第一个结果集忽略ID ..等等)。
由于我不知道我需要多少次来运行这些步骤,直到找到每一行,我想知道是否可以选择一个查询的每一行。
I have a SQL, where I need to make assignments between ids, but only if there is exactling one "matching partner".
orig table:
id_orig | id_new |
---|---|
1 | 2 |
3 | 5 |
4 | 10 |
98 | 200 |
100 | 200 |
100 | 300 |
First I need to find a way, to update all the pairs, where an id_new references exactly one id_orig:
select id_orig, id_new from mytable QUALIFY COUNT(*) OVER (PARTITION BY id_new) = 1
this would result in the following resultset (1):
id_orig | id_new |
---|---|
1 | 2 |
3 | 5 |
4 | 10 |
100 | 300 |
The problem is, that when I now look at the "rest of the set" (which in this case means every id, that wasn't part of the first resultset), there are new rows, which now exactly have one "matching partner".
WITH cte as (
select id_orig, id_new
from mytable
QUALIFY COUNT(*) OVER (PARTITION BY id_new) = 1
)
select id_orig, id_new
from mytable
where id_orig not in (select id_orig from cte)
and id_new not in (select id_new from cte)
QUALIFY COUNT(*) OVER (PARTITION BY id_new) = 1
resultset 2:
id_orig | id_new |
---|---|
98 | 200 |
In reality, there could be many more steps needed to find all the "matching" partners, so my question is: is it possible to select the rows from resultset 1 and 2 with one query? Maybe recursively?
A brief summary of what I want to achieve:
- Find all rows in the set, where id_new is unique
- ( These ids are marked as "ready" in the source table, and will be ignored next time.)
- Same as step 1: find all rows in the set, where id_new is unique (but ignore id's from the first resultset.. and so on).
Since I don't know how many times I would need to run these steps, until every row is found, I was wondering, if it was possible to select every row with one query.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
我会尝试采用以下方法:
这应该返回第一个CTE与集合其余部分之间的非匹配字段中的
mytable
。I'd try with the following approach:
This should return the non-matching fields between the first CTE and the rest of the set in
mytable
.