递归SQL以消除重复项？

发布于 2025-01-26 05:43:28 字数 1668 浏览 2 评论 0原文

我有一个SQL，我需要在ID之间进行作业，但前提是确切的“匹配合作伙伴”。

原始表：

id_orig	id_new
1	2
3	5
4	10
98	200
100	100
100	300

首先我需要找到一种方法，以更新所有对，其中ID_new引用完全一个ID_ORIG：

select ID_orig，id_orig，id_orig，id_orig，id_orig，id_new，id_new，从mytable comparetify计数（*）（iD_new分区）= 1

这将导致以下结果集（1）：

id_orig	id_new
1	2
3	5
4	10
100	300

问题是，当我现在查看“其余的”集合“（在这种情况下，这意味着每个ID，这不是第一个结果集的一部分），有新行，现在完全有一个“匹配合作伙伴”。

WITH cte as (
select id_orig, id_new
from mytable
QUALIFY COUNT(*) OVER (PARTITION BY id_new) = 1
) 
select id_orig, id_new
from mytable
where id_orig not in (select id_orig from cte)
and id_new not in (select id_new from cte)
QUALIFY COUNT(*) OVER (PARTITION BY id_new) = 1

结果集2：

ID_ORIG	ID_NEW
98	200

实际上，找到所有“匹配”合作伙伴还需要更多步骤，所以我的问题是：是否可以使用一个查询从结果集1和2中选择行？也许递归？

简要摘要我想实现的目标：

在集合中找到所有行，id_new是唯一的
（这些ID在源表中标记为“准备就绪”，下次将被忽略。）
与步骤1相同：找到所有集合中的行，id_new是唯一的（但是从第一个结果集忽略ID ..等等）。

由于我不知道我需要多少次来运行这些步骤，直到找到每一行，我想知道是否可以选择一个查询的每一行。

原文

I have a SQL, where I need to make assignments between ids, but only if there is exactling one "matching partner".

orig table:

id_orig	id_new
1	2
3	5
4	10
98	200
100	200
100	300

First I need to find a way, to update all the pairs, where an id_new references exactly one id_orig:

select id_orig, id_new from mytable QUALIFY COUNT(*) OVER (PARTITION BY id_new) = 1

this would result in the following resultset (1):

id_orig	id_new
1	2
3	5
4	10
100	300

The problem is, that when I now look at the "rest of the set" (which in this case means every id, that wasn't part of the first resultset), there are new rows, which now exactly have one "matching partner".

WITH cte as (
select id_orig, id_new
from mytable
QUALIFY COUNT(*) OVER (PARTITION BY id_new) = 1
) 
select id_orig, id_new
from mytable
where id_orig not in (select id_orig from cte)
and id_new not in (select id_new from cte)
QUALIFY COUNT(*) OVER (PARTITION BY id_new) = 1

resultset 2:

id_orig	id_new
98	200

In reality, there could be many more steps needed to find all the "matching" partners, so my question is: is it possible to select the rows from resultset 1 and 2 with one query? Maybe recursively?

A brief summary of what I want to achieve:

Find all rows in the set, where id_new is unique
( These ids are marked as "ready" in the source table, and will be ignored next time.)
Same as step 1: find all rows in the set, where id_new is unique (but ignore id's from the first resultset.. and so on).

Since I don't know how many times I would need to run these steps, until every row is found, I was wondering, if it was possible to select every row with one query.

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

许久 2025-02-02 05:43:28

我会尝试采用以下方法：

WITH cte AS (
  SELECT id_orig, id_new
  FROM mytable
  QUALIFY COUNT(*) OVER (PARTITION BY id_new) = 1
),
final AS (
  SELECT id_orig, id_new
  FROM mytable
  LEFT JOIN cte 
    ON mytable.id_orig = cte.id_orig AND mytable.id_new = cte.id_new
  WHERE cte.id_orig IS NULL AND cte.id_new IS NULL
)
SELECT * FROM final

这应该返回第一个CTE与集合其余部分之间的非匹配字段中的mytable。

I'd try with the following approach:

WITH cte AS (
  SELECT id_orig, id_new
  FROM mytable
  QUALIFY COUNT(*) OVER (PARTITION BY id_new) = 1
),
final AS (
  SELECT id_orig, id_new
  FROM mytable
  LEFT JOIN cte 
    ON mytable.id_orig = cte.id_orig AND mytable.id_new = cte.id_new
  WHERE cte.id_orig IS NULL AND cte.id_new IS NULL
)
SELECT * FROM final

This should return the non-matching fields between the first CTE and the rest of the set in mytable.

回复收藏 0 原文

~没有更多了~