SQL:如何根据两个字段查找重复项?
我在 Oracle 数据库表中有一些行,这些行对于两个字段的组合应该是唯一的,但表上没有设置唯一约束,因此我需要使用 SQL 自己查找违反约束的所有行。不幸的是,我微薄的 SQL 技能无法胜任这项任务。
我的表包含三列相关的列:entity_id、station_id 和 obs_year。对于每一行,station_id 和 obs_year 的组合应该是唯一的,我想通过使用 SQL 查询将它们清除来找出是否存在违反此规定的行。
我尝试过以下 SQL(由上一个问题建议),但它没有对我不起作用(我的 ORA-00918 列定义不明确):
SELECT
entity_id, station_id, obs_year
FROM
mytable t1
INNER JOIN (
SELECT entity_id, station_id, obs_year FROM mytable
GROUP BY entity_id, station_id, obs_year HAVING COUNT(*) > 1) dupes
ON
t1.station_id = dupes.station_id AND
t1.obs_year = dupes.obs_year
有人可以建议我做错了什么,和/或如何解决这个问题吗?
I have rows in an Oracle database table which should be unique for a combination of two fields but the unique constrain is not set up on the table so I need to find all rows which violate the constraint myself using SQL. Unfortunately my meager SQL skills aren't up to the task.
My table has three columns which are relevant: entity_id, station_id, and obs_year. For each row the combination of station_id and obs_year should be unique, and I want to find out if there are rows which violate this by flushing them out with an SQL query.
I have tried the following SQL (suggested by this previous question) but it doesn't work for me (I get ORA-00918 column ambiguously defined):
SELECT
entity_id, station_id, obs_year
FROM
mytable t1
INNER JOIN (
SELECT entity_id, station_id, obs_year FROM mytable
GROUP BY entity_id, station_id, obs_year HAVING COUNT(*) > 1) dupes
ON
t1.station_id = dupes.station_id AND
t1.obs_year = dupes.obs_year
Can someone suggest what I'm doing wrong, and/or how to solve this?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(10)
将初始选择中的 3 个字段更改为
Change the 3 fields in the initial select to be
重写您的查询
我认为不明确的列错误(ORA-00918)是因为您选择的列的名称同时出现在表和子查询中,但您没有指定是否需要它来自
dupes
或来自mytable
(别名为t1
)。Re-write of your query
I think the ambiguous column error (ORA-00918) was because you were
select
ing columns whose names appeared in both the table and the subquery, but you did not specifiy if you wanted it fromdupes
or frommytable
(aliased ast1
).我认为这里的很多解决方案都很麻烦且难以理解,因为我有 3 列主键约束并且需要查找重复项。所以这是一个选择
I thought a lot of the solutions here were cumbersome and tough to understand since I had a 3 column primary key constraint and needed to find the duplicates. So here's an option
您是否不能创建一个包含唯一约束的新表,然后逐行复制数据,忽略失败?
Could you not create a new table that includes the unique constraint, and then copy across the data row by row, ignoring failures?
您需要在主选择中指定列的表。另外,假设entity_id是mytable的唯一键并且与查找重复项无关,您不应该在dupes子查询中对其进行分组。
尝试:
You need to specify the table for the columns in the main select. Also, assuming entity_id is the unique key for mytable and is irrelevant to finding duplicates, you should not be grouping on it in the dupes subquery.
Try:
by Quassnoi 对于大型桌子来说是最有效的。
我对成本进行了这样的分析:
给出的成本为 1322341
给出的成本为 1271699
而
给出的成本为 1021984
该表未建立索引......
by Quassnoi is the most efficient for large tables.
I had this analysis of cost :
gave a cost of 1322341
gave a cost of 1271699
while
gave a cost of 1021984
The table was not indexed....
指定字段以在 SELECT 和 GROUP BY 上查找重复项。
它的工作原理是使用
GROUP BY
根据指定的列查找与任何其他行匹配的任何行。HAVING COUNT(*) > 1
表示我们只对查看出现次数超过 1 次的任何行感兴趣(因此是重复的)Specify the fields to find duplicates on both the SELECT and the GROUP BY.
It works by using
GROUP BY
to find any rows that match any other rows based on the specified Columns.The
HAVING COUNT(*) > 1
says that we are only interested in seeing any rows that occur more than 1 time (and are therefore duplicates)我很惊讶这里没有任何使用 CTE(通用表表达式)的答案,
这不仅会找到重复项(仅在名字和姓氏上),还会告诉您有多少个重复项。
I'm surprised there aren't any answers here that use a CTE (Common Table Expression)
Not only will this find duplicates (on first and last name only), it will tell you how many there are.