Oracle中的两个删除语句以删除重复项

发布于 2025-01-23 06:36:36 字数 539 浏览 2 评论 0 原文

我们有一张超过55k行的表,重复了一个识别名称。该名称可能会有所不同,每个名称的重复项数量可能会有所不同。因此,我将这两个脚本应用于练习中,从表中删除重复记录。有区别吗?脚本有任何问题吗?输出似乎是一样的。

DELETE FROM RDSUSER.A_JOB 
WHERE ROWID IN (SELECT ROWID 
                FROM (SELECT ROWID
                     , ROW_NUMBER() OVER(PARTITION BY JOB_NAME ORDER BY JOB_NAME) DUP 
                      FROM RDSUSER.A_JOB) 
                WHERE DUP > 1);

DELETE FROM RDSUSER.A_JOB JOB
WHERE ROWID > (SELECT MIN(ROWID) 
               FROM A_JOB
               WHERE JOB.JOB_NAME = A_JOB.JOB_NAME);

We have a table with over 55k rows that have an identifying name duplicated. The name can vary and the number of duplicates with each name can vary. So I applied these 2 scripts for practice deleting duplicate records from a table. Is there a difference? Are there any issues with the scripts? The output seemed the same.

DELETE FROM RDSUSER.A_JOB 
WHERE ROWID IN (SELECT ROWID 
                FROM (SELECT ROWID
                     , ROW_NUMBER() OVER(PARTITION BY JOB_NAME ORDER BY JOB_NAME) DUP 
                      FROM RDSUSER.A_JOB) 
                WHERE DUP > 1);

DELETE FROM RDSUSER.A_JOB JOB
WHERE ROWID > (SELECT MIN(ROWID) 
               FROM A_JOB
               WHERE JOB.JOB_NAME = A_JOB.JOB_NAME);

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

故人爱我别走 2025-01-30 06:36:36

有区别吗?

是的。

DELETE FROM RDSUSER.A_JOB 
WHERE ROWID IN (SELECT ROWID 
                FROM (SELECT ROWID
                     , ROW_NUMBER() OVER(PARTITION BY JOB_NAME ORDER BY JOB_NAME) DUP 
                      FROM RDSUSER.A_JOB) 
                WHERE DUP > 1);

Will job_name 通过job_name 订购 ,并且由于分区中的所有 job_name 将与 条款是非确定性的,分区中的行将有效地随机编号,并且不能保证将分区中的行保留/删除。

这意味着,如果您运行查询,然后回滚再次运行查询,则第二次已删除的行可能会不同(例如,如果您在并行系统上运行查询)。

DELETE FROM RDSUSER.A_JOB JOB
WHERE ROWID > (SELECT MIN(ROWID) 
               FROM A_JOB
               WHERE JOB.JOB_NAME = A_JOB.JOB_NAME);

将始终保留每个 job_name 的最小 rowID ,并且要保留/删除的行选择将是确定性的。

这意味着,如果您运行查询,然后回滚更改并第二次运行删除,则将删除一组相同的行。


如果要查询相同的函数,则可以使用:

DELETE FROM RDSUSER.A_JOB 
WHERE ROWID IN (SELECT ROWID 
                FROM (
                  SELECT ROW_NUMBER() OVER(PARTITION BY JOB_NAME ORDER BY ROWID)
                           AS DUP 
                  FROM   RDSUSER.A_JOB
                ) 
                WHERE DUP > 1);

随机排序的一个示例是:

CREATE TABLE a_job (
  id       NUMBER(5,0) GENERATED ALWAYS AS IDENTITY,
  job_name VARCHAR2(20)
);

INSERT INTO a_job (job_name)
SELECT 'a' FROM DUAL CONNECT BY LEVEL <= 3 UNION ALL
SELECT 'b' FROM DUAL CONNECT BY LEVEL <= 2 UNION ALL
SELECT 'c' FROM DUAL CONNECT BY LEVEL <= 5;

INSERT INTO a_job (job_name)
SELECT 'a' FROM DUAL CONNECT BY LEVEL <= 3;

后面:

DELETE FROM /*RDSUSER.*/A_JOB 
WHERE ROWID IN (SELECT ROWID 
                FROM (SELECT ROWID
                     , ROW_NUMBER() OVER(PARTITION BY JOB_NAME ORDER BY JOB_NAME) DUP 
                      FROM /*RDSUSER.*/A_JOB) 
                WHERE DUP > 1);

表可以包含:

id job_name
4 b
10 c
12 a

,但如果您回滚,然后:

DELETE FROM /*RDSUSER.*/A_JOB JOB
WHERE ROWID > (SELECT MIN(ROWID) 
               FROM A_JOB
               WHERE JOB.JOB_NAME = A_JOB.JOB_NAME);

然后输出可能是:

id job_name
1 a
4 b
6 c

db&lt;&gt;&gt; fiddle

Is there a difference?

Yes.

DELETE FROM RDSUSER.A_JOB 
WHERE ROWID IN (SELECT ROWID 
                FROM (SELECT ROWID
                     , ROW_NUMBER() OVER(PARTITION BY JOB_NAME ORDER BY JOB_NAME) DUP 
                      FROM RDSUSER.A_JOB) 
                WHERE DUP > 1);

Will PARTITION BY JOB_NAME and then ORDER BY JOB_NAME and since all the JOB_NAME within the partition are going to be identical then the ORDER BY clause is non-deterministic and the rows within the partition will be given an effectively random numbering for the rows and it is not guaranteed which rows in the partition will be kept/deleted.

This means that if you run the query and then ROLLBACK and run the query again then the set of deleted rows may be different the second time (for example, if you run the query on a parallel system).

DELETE FROM RDSUSER.A_JOB JOB
WHERE ROWID > (SELECT MIN(ROWID) 
               FROM A_JOB
               WHERE JOB.JOB_NAME = A_JOB.JOB_NAME);

Will always keep the minimum ROWID for each JOB_NAME and the selection of rows to be kept/deleted will be deterministic.

This means that if you run the query and then ROLLBACK the change and run the delete a second time then an identical set of rows will be deleted.


If you want the queries to function identically then you can use:

DELETE FROM RDSUSER.A_JOB 
WHERE ROWID IN (SELECT ROWID 
                FROM (
                  SELECT ROW_NUMBER() OVER(PARTITION BY JOB_NAME ORDER BY ROWID)
                           AS DUP 
                  FROM   RDSUSER.A_JOB
                ) 
                WHERE DUP > 1);

An example of the random ordering is:

CREATE TABLE a_job (
  id       NUMBER(5,0) GENERATED ALWAYS AS IDENTITY,
  job_name VARCHAR2(20)
);

INSERT INTO a_job (job_name)
SELECT 'a' FROM DUAL CONNECT BY LEVEL <= 3 UNION ALL
SELECT 'b' FROM DUAL CONNECT BY LEVEL <= 2 UNION ALL
SELECT 'c' FROM DUAL CONNECT BY LEVEL <= 5;

INSERT INTO a_job (job_name)
SELECT 'a' FROM DUAL CONNECT BY LEVEL <= 3;

Then after:

DELETE FROM /*RDSUSER.*/A_JOB 
WHERE ROWID IN (SELECT ROWID 
                FROM (SELECT ROWID
                     , ROW_NUMBER() OVER(PARTITION BY JOB_NAME ORDER BY JOB_NAME) DUP 
                      FROM /*RDSUSER.*/A_JOB) 
                WHERE DUP > 1);

The table may contain:

ID JOB_NAME
4 b
10 c
12 a

But if you ROLLBACK and then:

DELETE FROM /*RDSUSER.*/A_JOB JOB
WHERE ROWID > (SELECT MIN(ROWID) 
               FROM A_JOB
               WHERE JOB.JOB_NAME = A_JOB.JOB_NAME);

Then the output may be:

ID JOB_NAME
1 a
4 b
6 c

db<>fiddle here

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文