递归SQL以消除重复项?

发布于 2025-01-26 05:43:28 字数 1668 浏览 2 评论 0原文

我有一个SQL,我需要在ID之间进行作业,但前提是确切的“匹配合作伙伴”。

原始表:

id_origid_new
12
35
410
98200
100100
100300

首先我需要找到一种方法,以更新所有对,其中ID_new引用完全一个ID_ORIG:

select ID_orig,id_orig,id_orig,id_orig,id_orig,id_new,id_new,从mytable comparetify计数(*)(iD_new分区)= 1

这将导致以下结果集(1):

id_origid_new
12
35
410
100300

问题是,当我现在查看“其余的”集合“(在这种情况下,这意味着每个ID,这不是第一个结果集的一部分),有新行,现在完全有一个“匹配合作伙伴”。

WITH cte as (
select id_orig, id_new
from mytable
QUALIFY COUNT(*) OVER (PARTITION BY id_new) = 1
) 
select id_orig, id_new
from mytable
where id_orig not in (select id_orig from cte)
and id_new not in (select id_new from cte)
QUALIFY COUNT(*) OVER (PARTITION BY id_new) = 1

结果集2:

ID_ORIGID_NEW
98200

实际上,找到所有“匹配”合作伙伴还需要更多步骤,所以我的问题是:是否可以使用一个查询从结果集1和2中选择行?也许递归?

简要摘要我想实现的目标:

  1. 在集合中找到所有行,id_new是唯一的
  2. (这些ID在源表中标记为“准备就绪”,下次将被忽略。)
  3. 与步骤1相同:找到所有集合中的行,id_new是唯一的(但是从第一个结果集忽略ID ..等等)。

由于我不知道我需要多少次来运行这些步骤,直到找到每一行,我想知道是否可以选择一个查询的每一行。

I have a SQL, where I need to make assignments between ids, but only if there is exactling one "matching partner".

orig table:

id_origid_new
12
35
410
98200
100200
100300

First I need to find a way, to update all the pairs, where an id_new references exactly one id_orig:

select id_orig, id_new from mytable QUALIFY COUNT(*) OVER (PARTITION BY id_new) = 1

this would result in the following resultset (1):

id_origid_new
12
35
410
100300

The problem is, that when I now look at the "rest of the set" (which in this case means every id, that wasn't part of the first resultset), there are new rows, which now exactly have one "matching partner".

WITH cte as (
select id_orig, id_new
from mytable
QUALIFY COUNT(*) OVER (PARTITION BY id_new) = 1
) 
select id_orig, id_new
from mytable
where id_orig not in (select id_orig from cte)
and id_new not in (select id_new from cte)
QUALIFY COUNT(*) OVER (PARTITION BY id_new) = 1

resultset 2:

id_origid_new
98200

In reality, there could be many more steps needed to find all the "matching" partners, so my question is: is it possible to select the rows from resultset 1 and 2 with one query? Maybe recursively?

A brief summary of what I want to achieve:

  1. Find all rows in the set, where id_new is unique
  2. ( These ids are marked as "ready" in the source table, and will be ignored next time.)
  3. Same as step 1: find all rows in the set, where id_new is unique (but ignore id's from the first resultset.. and so on).

Since I don't know how many times I would need to run these steps, until every row is found, I was wondering, if it was possible to select every row with one query.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

许久 2025-02-02 05:43:28

我会尝试采用以下方法:

WITH cte AS (
  SELECT id_orig, id_new
  FROM mytable
  QUALIFY COUNT(*) OVER (PARTITION BY id_new) = 1
),
final AS (
  SELECT id_orig, id_new
  FROM mytable
  LEFT JOIN cte 
    ON mytable.id_orig = cte.id_orig AND mytable.id_new = cte.id_new
  WHERE cte.id_orig IS NULL AND cte.id_new IS NULL
)
SELECT * FROM final

这应该返回第一个CTE与集合其余部分之间的非匹配字段中的mytable

I'd try with the following approach:

WITH cte AS (
  SELECT id_orig, id_new
  FROM mytable
  QUALIFY COUNT(*) OVER (PARTITION BY id_new) = 1
),
final AS (
  SELECT id_orig, id_new
  FROM mytable
  LEFT JOIN cte 
    ON mytable.id_orig = cte.id_orig AND mytable.id_new = cte.id_new
  WHERE cte.id_orig IS NULL AND cte.id_new IS NULL
)
SELECT * FROM final

This should return the non-matching fields between the first CTE and the rest of the set in mytable.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文