Oracle: identifying duplicates in a table with no index
When I try to create a unique index on a large table, I get a unique constraint error. The unique index in this case is a composite key of 4 columns.
Is there an efficient way to identify the duplicates other than:
select col1, col2, col3, col4, count(*)
from Table1
group by col1, col2, col3, col4
having count(*) > 1
The explain plan for the query above shows a full table scan with an extremely high cost; I just want to find out if there is another way.
Thanks!
Comments (5)
Try creating a non-unique index on these four columns first. That will take O(n log n) time, but will also reduce the time needed to perform the `select` to O(n log n). You're in a bit of a bind here -- any way you slice it, the entire table has to be read in at least once. The naïve algorithm runs in O(n²) time, unless the query optimizer is clever enough to build a temporary index/table.
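A sketch of that approach; the index name is illustrative, while the table and column names come from the question:

```sql
-- Non-unique index covering all four key columns (name is hypothetical)
CREATE INDEX ix_table1_dupcheck ON Table1 (col1, col2, col3, col4);

-- The duplicate-finding query can now typically be answered from the
-- index alone, without visiting the table rows
SELECT col1, col2, col3, col4, COUNT(*)
FROM   Table1
GROUP  BY col1, col2, col3, col4
HAVING COUNT(*) > 1;
```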
You can use the EXCEPTIONS INTO clause to trap the duplicated rows.
If you don't already have an EXCEPTIONS table, create one using the script Oracle provides:
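A sketch of what that script does (Oracle ships it as `utlexcpt.sql` under `$ORACLE_HOME/rdbms/admin`; the table it creates looks roughly like this):

```sql
-- Equivalent of Oracle's utlexcpt.sql: a holding table for constraint violations
CREATE TABLE exceptions (
    row_id     ROWID,
    owner      VARCHAR2(30),
    table_name VARCHAR2(30),
    constraint VARCHAR2(30)
);
```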
Now you can attempt to create a unique constraint like this:
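For example (the constraint name is hypothetical; the table and columns are from the question, and `exceptions` is assumed to be the EXCEPTIONS table created above):

```sql
ALTER TABLE Table1
  ADD CONSTRAINT uq_table1_4col UNIQUE (col1, col2, col3, col4)
  EXCEPTIONS INTO exceptions;
```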
This will fail, but now your EXCEPTIONS table contains a list of all the rows whose keys contain duplicates, identified by ROWID. That gives you a basis for deciding what to do with the duplicates (delete, renumber, whatever).
Edit
As others have noted you have to pay the cost of scanning the table once. This approach gives you a permanent set of the duplicated rows, and ROWID is the fastest way of accessing any given row.
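For instance, a sketch of pulling back the offending rows (assuming the EXCEPTIONS table above is named `exceptions`):

```sql
-- Fetch every row whose key collided, by direct ROWID lookup
SELECT t.*
FROM   Table1 t
WHERE  t.rowid IN (SELECT row_id FROM exceptions);
```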
Since there is no index on those columns, that query would have to do a full table scan - no other way to do it really, unless one or more of those columns is already indexed.
You could create the index as a non-unique index, then run the query to identify the duplicate rows (which should be very fast once the index is created). But I doubt that the combined time of creating the non-unique index and then running the query would be any less than just running the query without the index.
In fact, you need to look for a duplicate of every single row in the table. There is no way to do this efficiently without an index.
I don't think there is a quicker way, unfortunately.