Find differences between rows that have duplicates in one field
I am about to remove duplicates from my database using
delete from table
where id not in (
select min(id)
from table
group by foreign_key);
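On the sample data further down, this query keeps the lowest id per foreign_key whether or not that row carries any data, which is why the extra conditions matter. Below is a minimal sketch that runs the same DELETE on an in-memory SQLite database through Python's built-in sqlite3 module. The table name `records` is a hypothetical stand-in, since `table` is a reserved word in SQL.

```python
import sqlite3

# Sample rows from the question: (id, foreign_key, fieldA, fieldB).
ROWS = [
    (1, 1, None, None), (2, 1, "A1", "B1"), (3, 1, None, None),
    (4, 2, "A2", "B2"), (5, 2, "A3", "B2"), (6, 3, None, None),
    (7, 4, "A4", "B4"), (8, 5, "A5", None), (9, 5, None, "B5"),
    (10, 6, "A6", "B6"), (11, 6, "A7", "B6"),
    (12, 7, None, "B7"), (13, 7, None, "B7"),
]


def naive_dedupe():
    """Apply the question's DELETE to the sample data; return the survivors."""
    conn = sqlite3.connect(":memory:")
    # "records" is a hypothetical name; "table" itself is a reserved word.
    conn.execute(
        "CREATE TABLE records (id INTEGER PRIMARY KEY, foreign_key INTEGER,"
        " fieldA TEXT, fieldB TEXT)"
    )
    conn.executemany("INSERT INTO records VALUES (?, ?, ?, ?)", ROWS)
    # Keep only MIN(id) per foreign_key, exactly as in the question.
    conn.execute(
        "DELETE FROM records WHERE id NOT IN ("
        " SELECT MIN(id) FROM records GROUP BY foreign_key)"
    )
    rows = conn.execute("SELECT * FROM records ORDER BY id").fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in naive_dedupe():
        print(row)
```

The survivors are ids 1, 4, 6, 7, 8, 10 and 12: row 1 keeps NULL in both fields, so A1 and B1 from row 2 are lost, and row 8 keeps fieldB = NULL although row 9 held B5.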
However, I would like to do so with the following conditions:
- if any of the duplicate rows have a value in fieldA or fieldB:
  - if there is only one unique value in each field, keep that value;
  - if there is more than one unique value in a field, report this information along with the id and foreign_key so that the value can be fixed manually.

By default, fieldA and fieldB are NULL, but data have been entered in these fields in some cases.
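The conditions above amount to a per-field rule applied within each foreign_key group: ignore NULLs, keep the value if exactly one distinct value remains, leave NULL if none remains, and flag a conflict if several remain. A minimal sketch of that rule in plain Python, assuming NULL is represented as None; the function name `merge_field` is hypothetical.

```python
from typing import Iterable, Optional, Tuple


def merge_field(values: Iterable[Optional[str]]) -> Tuple[Optional[str], bool]:
    """Return (merged_value, conflict) for one field across duplicate rows.

    NULLs (None) are ignored. A single distinct value is kept; no values
    gives None; several values give None plus a conflict flag to report.
    """
    distinct = {v for v in values if v is not None}
    if len(distinct) == 1:
        return next(iter(distinct)), False
    return None, len(distinct) > 1


if __name__ == "__main__":
    print(merge_field([None, "A1", None]))  # fieldA of foreign_key 1 -> ('A1', False)
    print(merge_field(["A2", "A3"]))        # fieldA of foreign_key 2 -> (None, True)
    print(merge_field([None, None]))        # fieldA of foreign_key 7 -> (None, False)
```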
Here is some sample data:
| id | foreign_key | fieldA | fieldB |
|----+-------------+--------+--------|
| 1 | 1 | NULL | NULL |
| 2 | 1 | A1 | B1 |
| 3 | 1 | NULL | NULL |
| 4 | 2 | A2 | B2 |
| 5 | 2 | A3 | B2 |
| 6 | 3 | NULL | NULL |
| 7 | 4 | A4 | B4 |
| 8 | 5 | A5 | NULL |
| 9 | 5 | NULL | B5 |
| 10 | 6 | A6 | B6 |
| 11 | 6 | A7 | B6 |
| 12 | 7 | NULL | B7 |
| 13 | 7 | NULL | B7 |
What I want to keep is:
| id | foreign_key | fieldA | fieldB |
|----+-------------+--------+--------|
| 2 | 1 | A1 | B1 |
| 4 | 2 | NULL | B2 |
| 6 | 3 | NULL | NULL |
| 7 | 4 | A4 | B4 |
| 8 | 5 | A5 | B5 |
| 10 | 6 | NULL | B6 |
| 12 | 7 | NULL | B7 |
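One way to produce this result is to group by foreign_key and use COUNT(DISTINCT ...), which ignores NULLs, to test whether a field has exactly one value; MAX() then returns that value. The desired table does not keep MIN(id) for foreign_key 1 (it keeps id 2), so the sketch assumes the kept id is the lowest id among rows carrying any data, falling back to the lowest id. That rule reproduces every row above but is an inference, not something the question states. It runs on SQLite via Python's sqlite3 module with a hypothetical table name `records`. Because the merge writes NULL over conflicting values, the conflicts should be reported before it runs.

```python
import sqlite3

# Sample rows from the question: (id, foreign_key, fieldA, fieldB).
ROWS = [
    (1, 1, None, None), (2, 1, "A1", "B1"), (3, 1, None, None),
    (4, 2, "A2", "B2"), (5, 2, "A3", "B2"), (6, 3, None, None),
    (7, 4, "A4", "B4"), (8, 5, "A5", None), (9, 5, None, "B5"),
    (10, 6, "A6", "B6"), (11, 6, "A7", "B6"),
    (12, 7, None, "B7"), (13, 7, None, "B7"),
]


def load_sample():
    """Build an in-memory copy of the sample table (hypothetical name "records")."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY,"
                 " foreign_key INTEGER, fieldA TEXT, fieldB TEXT)")
    conn.executemany("INSERT INTO records VALUES (?, ?, ?, ?)", ROWS)
    conn.commit()
    return conn


MERGE_PLAN = """
SELECT
    -- Assumption: lowest id among rows with any data, else lowest id.
    COALESCE(MIN(CASE WHEN fieldA IS NOT NULL OR fieldB IS NOT NULL
                      THEN id END),
             MIN(id)) AS keep_id,
    -- COUNT(DISTINCT ...) skips NULLs, so 1 means exactly one real value.
    CASE WHEN COUNT(DISTINCT fieldA) = 1 THEN MAX(fieldA) END AS fieldA,
    CASE WHEN COUNT(DISTINCT fieldB) = 1 THEN MAX(fieldB) END AS fieldB
FROM records
GROUP BY foreign_key
"""


def merge_duplicates(conn):
    """Collapse each foreign_key group to one row inside one transaction."""
    plan = conn.execute(MERGE_PLAN).fetchall()
    with conn:  # commits on success, rolls back if anything raises
        conn.executemany(
            "UPDATE records SET fieldA = ?, fieldB = ? WHERE id = ?",
            [(a, b, keep_id) for keep_id, a, b in plan],
        )
        # A temp table avoids a huge NOT IN (?, ?, ...) parameter list.
        conn.execute("CREATE TEMP TABLE keep_ids (id INTEGER PRIMARY KEY)")
        conn.executemany("INSERT INTO keep_ids VALUES (?)",
                         [(keep_id,) for keep_id, _, _ in plan])
        conn.execute(
            "DELETE FROM records WHERE id NOT IN (SELECT id FROM keep_ids)")
        conn.execute("DROP TABLE keep_ids")
    return conn.execute("SELECT * FROM records ORDER BY id").fetchall()


if __name__ == "__main__":
    for row in merge_duplicates(load_sample()):
        print(row)
```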
And I would like this information to be returned:
foreign_key 2 has two distinct values of fieldA: A2 and A3
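A sketch of the report step under the same assumptions (SQLite via Python's sqlite3, hypothetical table name `records`). It selects every non-NULL value, with its row id, in groups where COUNT(DISTINCT ...) exceeds one, then formats one message per group and field. On the sample data it flags foreign_key 6 (A6 and A7) as well as foreign_key 2. Because it reads values row by row instead of through GROUP_CONCAT, a value containing a separator character cannot be split incorrectly.

```python
import sqlite3
from collections import defaultdict

# Sample rows from the question: (id, foreign_key, fieldA, fieldB).
ROWS = [
    (1, 1, None, None), (2, 1, "A1", "B1"), (3, 1, None, None),
    (4, 2, "A2", "B2"), (5, 2, "A3", "B2"), (6, 3, None, None),
    (7, 4, "A4", "B4"), (8, 5, "A5", None), (9, 5, None, "B5"),
    (10, 6, "A6", "B6"), (11, 6, "A7", "B6"),
    (12, 7, None, "B7"), (13, 7, None, "B7"),
]


def load_sample():
    """Build an in-memory copy of the sample table (hypothetical name "records")."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY,"
                 " foreign_key INTEGER, fieldA TEXT, fieldB TEXT)")
    conn.executemany("INSERT INTO records VALUES (?, ?, ?, ?)", ROWS)
    conn.commit()
    return conn


# Every non-NULL value (and its id) in groups where {col} has >1 distinct value.
CONFLICTS = """
SELECT foreign_key, {col}, id
FROM records
WHERE {col} IS NOT NULL
  AND foreign_key IN (SELECT foreign_key
                      FROM records
                      GROUP BY foreign_key
                      HAVING COUNT(DISTINCT {col}) > 1)
ORDER BY foreign_key, {col}, id
"""


def conflict_report(conn, columns=("fieldA", "fieldB")):
    """Return one message per (foreign_key, column) that needs a manual fix."""
    messages = []
    for col in columns:  # column names are fixed constants, never user input
        groups = defaultdict(lambda: defaultdict(list))  # fk -> value -> [ids]
        for fk, value, row_id in conn.execute(CONFLICTS.format(col=col)):
            groups[fk][value].append(row_id)
        for fk, by_value in groups.items():
            parts = [f"{value} (id {', '.join(map(str, ids))})"
                     for value, ids in by_value.items()]
            messages.append(f"foreign_key {fk} has {len(by_value)} distinct "
                            f"values of {col}: " + " and ".join(parts))
    return messages


if __name__ == "__main__":
    for line in conflict_report(load_sample()):
        print(line)
```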
I've got to run off right now, but here's a query to start with:
On the test data, this returns:
Query for rows to keep:
Query for rows that need operator intervention:
GROUP_CONCAT may not be suitable, depending on the format of the data stored in the columns. In combination with #A and #B, however, you could detect when it's not suitable, so it shouldn't be a big problem. It may also have too big a performance impact, but I can't think of another aggregate function that could be used in the same way (a GROUP_COALESCE would be nice).
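To illustrate the concern: when a value itself contains the separator, splitting a GROUP_CONCAT(DISTINCT ...) result yields more pieces than COUNT(DISTINCT ...) reports, so comparing the two flags exactly the groups where GROUP_CONCAT is unsafe. For the single-value case, MAX() combined with COUNT(DISTINCT ...) = 1 behaves like the hypothetical GROUP_COALESCE mentioned above, with no string parsing at all. A minimal sketch on SQLite (whose group_concat uses ',' by default) through Python's sqlite3; the table `notes` and its values are hypothetical.

```python
import sqlite3

# Hypothetical rows: foreign_key 1 has one value that contains a comma.
ROWS = [
    (1, "Smith, J."), (1, None), (1, "Smith, J."),
    (2, "A2"), (2, "A3"),
]

QUERY = """
SELECT foreign_key,
       COUNT(DISTINCT fieldA)        AS n_distinct,
       GROUP_CONCAT(DISTINCT fieldA) AS joined,
       MAX(fieldA)                   AS any_value
FROM notes
GROUP BY foreign_key
ORDER BY foreign_key
"""


def load(rows):
    """Create the hypothetical "notes" table in memory."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE notes (foreign_key INTEGER, fieldA TEXT)")
    conn.executemany("INSERT INTO notes VALUES (?, ?)", rows)
    return conn


def unsafe_groups(conn):
    """foreign_keys where splitting GROUP_CONCAT on ',' would miscount values."""
    flagged = []
    for fk, n_distinct, joined, _ in conn.execute(QUERY):
        pieces = joined.split(",") if joined is not None else []
        if len(pieces) != n_distinct:  # a value contained the separator
            flagged.append(fk)
    return flagged


def single_values(conn):
    """GROUP_COALESCE stand-in: the one value per group, else None."""
    return {fk: (value if n == 1 else None)
            for fk, n, _, value in conn.execute(QUERY)}


if __name__ == "__main__":
    conn = load(ROWS)
    print(unsafe_groups(conn))
    print(single_values(conn))
```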