Finding duplicate rows on multiple fields

Posted on 2024-12-07 04:41:41

I am using this query to find duplicates based on two fields:

SELECT 
    last_name, 
    first_name,
    middle_initial,
    COUNT(last_name) AS Duplicates,
    IF(rec_id = '', 1, 0) AS has_REC_ID 
FROM files
GROUP BY last_name, first_name
HAVING COUNT(last_name) > 1 AND COUNT(first_name) > 1;

Okay, what this returns is a set of rows with first, last, and middle names, a column called 'Duplicates' with a lot of 2s, and a column called has_REC_ID with mixed 1s and 0s.
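One likely reason has_REC_ID looks mixed: rec_id is not part of the GROUP BY, so each group reports the value from an arbitrary row. A minimal sketch of a grouped query that aggregates it instead is below. It uses Python's sqlite3 module purely so it can be run as-is; the sample rows and the `rows_with_rec_id` column name are made up for illustration, and the SELECT itself should carry over to MySQL.

```python
import sqlite3

# Hypothetical sample data; the column names follow the question's `files` table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE files (last_name TEXT, first_name TEXT, middle_initial TEXT, rec_id TEXT)"
)
conn.executemany(
    "INSERT INTO files VALUES (?, ?, ?, ?)",
    [
        ("Smith", "Ann", "J", "A100"),
        ("Smith", "Ann", "J", ""),
        ("Smith", "Bob", "K", "B200"),
        ("Jones", "Cy", "", ""),
    ],
)

# Group on both fields; COUNT(*) > 1 keeps only name pairs that occur more than once.
# SUM(rec_id <> '') counts how many rows in each group actually carry a rec_id.
duplicate_groups = conn.execute(
    """
    SELECT last_name, first_name,
           COUNT(*) AS duplicates,
           SUM(rec_id <> '') AS rows_with_rec_id
    FROM files
    GROUP BY last_name, first_name
    HAVING COUNT(*) > 1
    """
).fetchall()

print(duplicate_groups)  # [('Smith', 'Ann', 2, 1)]
```

A single COUNT(*) > 1 is enough in the HAVING clause, since every row in a group shares the same last_name and first_name.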

Ultimately, what I'm trying to do is find which rows have matching first and last names--and then for each of those pairs, find the one that has ('') as a value for rec_id, assign the rec_id value from the one that DOES have a rec_id, and then delete the record that had a rec_id in the first place.

So for starters I thought I would create a new column and do something like this:

UPDATE files a 
SET a.has_dup    -- new column
    = if(a.last_name IN (
                         SELECT b.last_name
                         FROM files b
                         GROUP BY b.last_name 
                         HAVING COUNT(b.last_name) > 1
                        )
      , 1, null);

But MySQL returns: "You can't specify target table 'a' for update in from clause"

I'll bet there's something much less ridiculous than the method I'm trying here. Can someone please help me figure out what that is?

UPDATE: I also tried:

UPDATE files a 
SET a.has_dup = 1
WHERE a.last_name IN (
                         SELECT b.last_name
                         FROM files b
                         GROUP BY b.last_name 
                         HAVING COUNT(b.last_name) > 1
                     );

...and got the same error message.

不顾 2024-12-14 04:41:41

You could:

1) Create a holding table

2) Populate the holding table with those rows that have a matching first and last name and have rec_id != ""

3) Delete the rows from the original table (files) that have a matching first and last name and have rec_id != ""

4) Update the rows in the original table that have a matching first and last name and have rec_id = "".

5) Drop the holding table

So something like:

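create table temp
(
first_name varchar(100) not null,
last_name varchar(100) not null,
rec_id int not null
);

-- save the rec_id of every name pair that occurs more than once
insert into temp (first_name, last_name, rec_id)
select f.first_name, f.last_name, f.rec_id
from files f
join (select first_name, last_name from files
      group by first_name, last_name having count(*) > 1) d
  on f.first_name = d.first_name and f.last_name = d.last_name
where f.rec_id != '';

-- remove the rows whose rec_id was just saved
delete f from files f
join temp t on f.first_name = t.first_name and f.last_name = t.last_name
where f.rec_id != '';

-- hand the saved rec_id to the remaining row of each pair
update files f
join temp t on f.first_name = t.first_name and f.last_name = t.last_name
set f.rec_id = t.rec_id
where f.rec_id = '';

drop table temp;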
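For anyone who wants to try the five steps end to end, here is a rough sketch, not a definitive implementation. It runs against SQLite through Python's sqlite3 module so it needs no server. The holding table is called `holding` here, the sample rows are invented, and the text-typed rec_id is an assumption based on the question comparing it to ''. Correlated subqueries stand in for MySQL's join syntax in DELETE and UPDATE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (first_name TEXT, last_name TEXT, rec_id TEXT)")
conn.executemany(
    "INSERT INTO files VALUES (?, ?, ?)",
    [
        ("Ann", "Smith", "A100"),  # duplicate pair: this row holds the rec_id
        ("Ann", "Smith", ""),      # duplicate pair: this row should receive it
        ("Bob", "Smith", "B200"),  # unique name, must be left alone
    ],
)

# 1) Create the holding table (hypothetical name).
conn.execute("CREATE TABLE holding (first_name TEXT, last_name TEXT, rec_id TEXT)")

# 2) Copy rows that have a rec_id and whose name pair occurs more than once.
conn.execute(
    """
    INSERT INTO holding
    SELECT f.first_name, f.last_name, f.rec_id
    FROM files f
    JOIN (SELECT first_name, last_name FROM files
          GROUP BY first_name, last_name HAVING COUNT(*) > 1) d
      ON f.first_name = d.first_name AND f.last_name = d.last_name
    WHERE f.rec_id <> ''
    """
)

# 3) Delete those same rows from the original table.
conn.execute(
    """
    DELETE FROM files
    WHERE rec_id <> ''
      AND EXISTS (SELECT 1 FROM holding h
                  WHERE h.first_name = files.first_name
                    AND h.last_name = files.last_name)
    """
)

# 4) Give the surviving rows (empty rec_id) the saved rec_id.
conn.execute(
    """
    UPDATE files
    SET rec_id = (SELECT h.rec_id FROM holding h
                  WHERE h.first_name = files.first_name
                    AND h.last_name = files.last_name)
    WHERE rec_id = ''
      AND EXISTS (SELECT 1 FROM holding h
                  WHERE h.first_name = files.first_name
                    AND h.last_name = files.last_name)
    """
)

# 5) Drop the holding table.
conn.execute("DROP TABLE holding")

remaining = conn.execute(
    "SELECT first_name, last_name, rec_id FROM files ORDER BY first_name"
).fetchall()
print(remaining)  # [('Ann', 'Smith', 'A100'), ('Bob', 'Smith', 'B200')]
```

This assumes each duplicated name has exactly one row with a rec_id. If both rows of a pair carry one, step 3 would delete both, so it is worth checking the data for that case first.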

长不大的小祸害 2024-12-14 04:41:41

From the documentation:

Currently, you cannot update a table and select from the same table in a subquery.

I can't think of a quick workaround to that.

Update

Apparently there is a "quick" workaround, though whether it performs well is another matter. It adds a layer of indirection by wrapping the table in a derived table (a subquery in the FROM clause), which MySQL can materialize as a temporary table:

UPDATE files a 
SET a.has_dup    -- new column
    = if(a.last_name IN (
                     SELECT b.last_name
                     FROM
                          (SELECT * FROM files)      -- new table target
                     b
                     GROUP BY b.last_name 
                     HAVING COUNT(b.last_name) > 1
                    ),
      1, null);
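To see what the wrapped subquery ends up flagging, here is a small sketch run through Python's sqlite3 module. SQLite does not have MySQL's restriction, so this only demonstrates the shape of the statement and its result, not the error going away; the sample rows are invented. As far as I know, newer MySQL versions may merge the derived table back into the outer query, in which case the error can reappear unless derived-table merging is switched off in optimizer_switch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (last_name TEXT, first_name TEXT, has_dup INTEGER)")
conn.executemany(
    "INSERT INTO files (last_name, first_name) VALUES (?, ?)",
    [("Smith", "Ann"), ("Smith", "Bob"), ("Jones", "Cy")],
)

# Flag every row whose last_name occurs more than once.
conn.execute(
    """
    UPDATE files
    SET has_dup = 1
    WHERE last_name IN (
        SELECT b.last_name
        FROM (SELECT last_name FROM files) AS b  -- derived table wrapping the target
        GROUP BY b.last_name
        HAVING COUNT(*) > 1
    )
    """
)

flags = conn.execute(
    "SELECT last_name, first_name, has_dup FROM files ORDER BY last_name, first_name"
).fetchall()
print(flags)  # [('Jones', 'Cy', None), ('Smith', 'Ann', 1), ('Smith', 'Bob', 1)]
```

Selecting only last_name inside the derived table, rather than `*`, keeps the intermediate result small.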

疑心病 2024-12-14 04:41:41

I don't have MySQL available to test, but I think this should work: (Edit: it fails)

UPDATE files
SET has_dup
    = if(last_name IN (
                         SELECT b.last_name
                         FROM files b
                         GROUP BY b.last_name 
                         HAVING COUNT(b.last_name) > 1
                      )
      , 1, null);

Edit: another attempt:

UPDATE files f, (SELECT b.last_name
                   FROM files b
               GROUP BY b.last_name 
                 HAVING COUNT(b.last_name) > 1
                ) as duplicates
   SET f.has_dup = 1
 WHERE f.last_name = duplicates.last_name
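Since the question is about matching on both first and last name, the same idea can be extended to two columns. The sketch below is only an illustration under assumptions: it runs on SQLite via Python's sqlite3 module, which has no multi-table UPDATE, so the join against the grouped derived table `d` (a hypothetical alias) is written as an EXISTS test. In MySQL the comma-join form above with a second condition on first_name should express the same thing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (last_name TEXT, first_name TEXT, has_dup INTEGER)")
conn.executemany(
    "INSERT INTO files (last_name, first_name) VALUES (?, ?)",
    [("Smith", "Ann"), ("Smith", "Ann"), ("Smith", "Bob"), ("Jones", "Cy")],
)

# Flag rows whose (last_name, first_name) pair occurs more than once.
conn.execute(
    """
    UPDATE files
    SET has_dup = 1
    WHERE EXISTS (
        SELECT 1
        FROM (SELECT last_name, first_name FROM files
              GROUP BY last_name, first_name
              HAVING COUNT(*) > 1) AS d
        WHERE d.last_name = files.last_name
          AND d.first_name = files.first_name
    )
    """
)

pair_flags = conn.execute(
    "SELECT last_name, first_name, has_dup FROM files ORDER BY last_name, first_name"
).fetchall()
print(pair_flags)
```

Here only the two Ann Smith rows are flagged; Bob Smith shares a last name but not the full pair, so has_dup stays NULL for him.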
