Finding duplicate rows on multiple fields
I am using this query to find duplicates based on two fields:
SELECT
    last_name,
    first_name,
    middle_initial,
    COUNT(last_name) AS Duplicates,
    IF(rec_id = '', 1, 0) AS has_REC_ID
FROM files
GROUP BY last_name, first_name
HAVING COUNT(last_name) > 1 AND COUNT(first_name) > 1;
Okay, what this returns is a set of rows with first, last, and middle names, a column called 'Duplicates' with a lot of 2s, and a column called has_REC_ID with mixed 1s and 0s.
Ultimately, what I'm trying to do is find which rows have matching first and last names. Then, for each of those pairs, I want to find the row that has an empty string ('') as its rec_id, assign it the rec_id value from the row that DOES have one, and then delete the row that had the rec_id in the first place.
So for starters I thought I would create a new column and do something like this:
UPDATE files a
SET a.has_dup -- new column
    = IF(a.last_name IN (
        SELECT b.last_name
        FROM files b
        GROUP BY b.last_name
        HAVING COUNT(b.last_name) > 1
    ), 1, NULL);
But MySQL returns: "You can't specify target table 'a' for update in from clause"
I'll bet there's something much less ridiculous than the method I'm trying here. Can someone please help me figure out what that is?
UPDATE: I also tried:
UPDATE files a
SET a.has_dup = 1
WHERE a.last_name IN (
    SELECT b.last_name
    FROM files b
    GROUP BY b.last_name
    HAVING COUNT(b.last_name) > 1
);
...and got the same error message.
Comments (3)
You could:
1) Create a holding table
2) Populate the holding table with those rows that have a matching first and last name and have rec_id != ""
3) Delete the rows from the original table (files) that have a matching first and last name and have rec_id != ""
4) Update the rows in the original table that have a matching first and last name and have rec_id = "".
5) Drop the holding table
So something like:
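A minimal sketch of the five steps above in MySQL. It assumes that `rec_id` is a string column where `''` marks a missing value (as in the question) and that each duplicate name has at most one non-empty `rec_id`. The table name `holding` is hypothetical, and the statements are untested against the asker's schema.

```sql
-- 1) + 2) Create and populate the holding table (name is hypothetical)
--    with rows that have a rec_id AND share first/last name with a
--    row whose rec_id is empty.
CREATE TEMPORARY TABLE holding AS
SELECT DISTINCT f.last_name, f.first_name, f.rec_id
FROM files f
JOIN files e
  ON e.last_name  = f.last_name
 AND e.first_name = f.first_name
 AND e.rec_id     = ''
WHERE f.rec_id <> '';

-- 3) Delete the rows that carried the rec_id from the original table.
DELETE f
FROM files f
JOIN holding h
  ON h.last_name  = f.last_name
 AND h.first_name = f.first_name
 AND h.rec_id     = f.rec_id;

-- 4) Copy the saved rec_id onto the matching rows that had none.
UPDATE files f
JOIN holding h
  ON h.last_name  = f.last_name
 AND h.first_name = f.first_name
SET f.rec_id = h.rec_id
WHERE f.rec_id = '';

-- 5) Drop the holding table.
DROP TEMPORARY TABLE holding;
```

Deleting before updating means two rows never hold the same `rec_id` at once, which matters if that column has a unique index. If a name can have more than one non-empty `rec_id`, step 4 picks one arbitrarily, so those groups would need to be handled separately.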
From the documentation:
I can't think of a quick workaround to that.
Update
Apparently, there is a "quick" workaround, but whether or not it's performant is another issue. It's all about adding a new layer of indirection by introducing a temporary table:
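One way this indirection is commonly written is to wrap the subquery in a derived table, so the UPDATE reads from a materialized intermediate result instead of directly from `files`. The sketch below mirrors the asker's second attempt; the alias `dup` is made up for illustration, and this is not necessarily the exact statement the answer had in mind.

```sql
UPDATE files a
SET a.has_dup = 1
WHERE a.last_name IN (
    -- The extra SELECT layer makes MySQL build an intermediate
    -- result first, so the target table is not read directly.
    SELECT dup.last_name
    FROM (
        SELECT b.last_name
        FROM files b
        GROUP BY b.last_name
        HAVING COUNT(*) > 1
    ) AS dup
);
```

Because the inner query uses GROUP BY, the optimizer should not be able to merge it back into the outer statement, which is what keeps the error from reappearing. On large tables the intermediate result may be slow to build, so it is worth checking the plan with EXPLAIN.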
I don't have a MySQL instance to test on, but I think this should work: (EDITED -> FAIL)
EDITED: Another try:
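As an illustration only (not the statement this answer originally posted), one formulation that avoids reading `files` in a subquery of the UPDATE is a multi-table update joined against a derived table of duplicated names. The alias `d` is hypothetical, and the grouping uses both name columns to match the question's definition of a duplicate.

```sql
UPDATE files a
JOIN (
    -- Names that occur more than once.
    SELECT last_name, first_name
    FROM files
    GROUP BY last_name, first_name
    HAVING COUNT(*) > 1
) AS d
  ON d.last_name  = a.last_name
 AND d.first_name = a.first_name
SET a.has_dup = 1;
```

Rows whose name appears only once are left untouched, so `has_dup` stays NULL for them, assuming the column was added without a default.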