在比较同一表中的记录时如何不返回重复项（a：b和b：a）

发布于 2025-01-26 18:45:57 字数 1299 浏览 3 评论 0原文

我已经解决了这个问题已经有一段时间了，无法解决，这将非常感谢

我比较人表中的记录，以查看它们是否可能相同。为此，我使用的是使用语句来获取我需要的值并寻找匹配的

CREATE TABLE persons (
  serialno VARCHAR(20) NOT NULL,
  given VARCHAR(30) NOT NULL,
  family  VARCHAR(30) NOT NULL,
  dob  DATE NOT NULL,
  gender VARCHAR2(20 BYTE),
  address VARCHAR2(64 BYTE)
);

INSERT ALL 
    INTO persons ( serialno, given, family,dob,gender,address ) VALUES ( '001', 'Mick', 'Dundon','01/01/1970','Male','Main St' )
    INTO persons ( serialno, given, family, dob,gender,address) VALUES ( '002', 'Mick', 'Dundon','01/01/1970', 'Male','Montague St' )
    INTO persons ( serialno, given, family,dob,gender,address ) VALUES ( '003', 'Dave', 'Doyle', '13/10/1981','Male', 'Rathmines')
    INTO persons ( serialno, given, family,dob,gender,address ) VALUES ( '004', 'Jim', 'Morrison', '15/08/1956','Male','Newtown')
    INTO persons ( serialno, given, family, dob,gender,address) VALUES ( '005', 'Sam', 'Wise', '12/12/1992','Male','High St')
SELECT 1 FROM dual;

with rec as
(select serialno,given,family,dob,gender,address 
from persons)
select * 
from rec r1
join rec r2
on r1.given = r2.given
and r1.family = r2.family
and r1.gender = r2.gender
and r1.serialno <> r2.serialno

代码正常工作，除了我最终会重复，因为R1记录将在输出中进一步显示为R2，反之亦然。

有没有简单的方法可以避免这种重复？

原文

I have been stuck with this problem for a while now and can't resolve it, would greatly appreciate some guidance

I am comparing records in a persons table to see if they're possibly the same. To do this I am using a with statement to take the values I need and looking for matches

CREATE TABLE persons (
  serialno VARCHAR(20) NOT NULL,
  given VARCHAR(30) NOT NULL,
  family  VARCHAR(30) NOT NULL,
  dob  DATE NOT NULL,
  gender VARCHAR2(20 BYTE),
  address VARCHAR2(64 BYTE)
);

INSERT ALL 
    INTO persons ( serialno, given, family,dob,gender,address ) VALUES ( '001', 'Mick', 'Dundon','01/01/1970','Male','Main St' )
    INTO persons ( serialno, given, family, dob,gender,address) VALUES ( '002', 'Mick', 'Dundon','01/01/1970', 'Male','Montague St' )
    INTO persons ( serialno, given, family,dob,gender,address ) VALUES ( '003', 'Dave', 'Doyle', '13/10/1981','Male', 'Rathmines')
    INTO persons ( serialno, given, family,dob,gender,address ) VALUES ( '004', 'Jim', 'Morrison', '15/08/1956','Male','Newtown')
    INTO persons ( serialno, given, family, dob,gender,address) VALUES ( '005', 'Sam', 'Wise', '12/12/1992','Male','High St')
SELECT 1 FROM dual;

with rec as
(select serialno,given,family,dob,gender,address 
from persons)
select * 
from rec r1
join rec r2
on r1.given = r2.given
and r1.family = r2.family
and r1.gender = r2.gender
and r1.serialno <> r2.serialno

the code works fine except I end up with duplicates as the R1 record will appear further down in the output as R2, and vice versa.
Is there a simple way I can avoid this kind of duplication?

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

命硬 2025-02-02 18:45:57

您可以通过使用分析Count函数来获取所有重复项：

SELECT serialno, given, family, dob, gender, address
FROM   (
  SELECT serialno, given, family, dob, gender, address,
         COUNT(*)  OVER (PARTITION BY given, family, gender) AS num_matches
  FROM   persons
)
WHERE num_matches > 1;

如果您还想将值与相同的给定/family /性别组合和最小序列号，然后，您可以通过使用分析功能避免自我加入：

SELECT serialno, given, family, dob, gender, address,
       min_serialno, min_dob, min_address
FROM   (
  SELECT serialno,
         given,
         family,
         dob,
         gender,
         address,
         MIN(serialno) OVER (PARTITION BY given, family, gender) AS min_serialno,
         MIN(dob) KEEP (DENSE_RANK FIRST ORDER BY serialno)
           OVER (PARTITION BY given, family, gender) AS min_dob,
         MIN(address) KEEP (DENSE_RANK FIRST ORDER BY serialno)
           OVER (PARTITION BY given, family, gender) AS min_address
  FROM   persons
)
WHERE serialno > min_serialno;

如果在Oracle中，您想获得所有可能的组合，则可以通过使用层次查询来避免自我加入：

SELECT serialno, given, family, dob, gender, address,
       PRIOR serialno AS p_serialno,
       PRIOR dob      AS p_dob,
       PRIOR address  AS p_address
FROM   persons
WHERE  LEVEL = 2
CONNECT BY
       PRIOR gender   = gender
AND    PRIOR given    = given
AND    PRIOR family   = family
AND    PRIOR serialno < serialno

db＆lt;

You can get all the duplicates without a self-join by using the analytic COUNT function:

SELECT serialno, given, family, dob, gender, address
FROM   (
  SELECT serialno, given, family, dob, gender, address,
         COUNT(*)  OVER (PARTITION BY given, family, gender) AS num_matches
  FROM   persons
)
WHERE num_matches > 1;

If you also want to compare the values to the row with the same given/family/gender combination and the minimum serial number then, again you can avoid a self-join by using analytic functions:

SELECT serialno, given, family, dob, gender, address,
       min_serialno, min_dob, min_address
FROM   (
  SELECT serialno,
         given,
         family,
         dob,
         gender,
         address,
         MIN(serialno) OVER (PARTITION BY given, family, gender) AS min_serialno,
         MIN(dob) KEEP (DENSE_RANK FIRST ORDER BY serialno)
           OVER (PARTITION BY given, family, gender) AS min_dob,
         MIN(address) KEEP (DENSE_RANK FIRST ORDER BY serialno)
           OVER (PARTITION BY given, family, gender) AS min_address
  FROM   persons
)
WHERE serialno > min_serialno;

If, in Oracle, you want to get all possible combinations then you can avoid a self-join by using a hierarchical query:

SELECT serialno, given, family, dob, gender, address,
       PRIOR serialno AS p_serialno,
       PRIOR dob      AS p_dob,
       PRIOR address  AS p_address
FROM   persons
WHERE  LEVEL = 2
CONNECT BY
       PRIOR gender   = gender
AND    PRIOR given    = given
AND    PRIOR family   = family
AND    PRIOR serialno < serialno

db<>fiddle here

回复收藏 0 原文

~没有更多了~