Finding differences between rows that have duplicates in one field

Posted on 2024-10-18 17:33:13

I am about to remove duplicates from my database using

delete from table 
  where id not in (
    select min(id) 
      from table 
      group by foreign_key);

However, I would like to do so under the following conditions, which apply when any of the duplicate rows has a value in fieldA or fieldB:

- if a field has only one distinct non-NULL value among the duplicates, keep that value
- if a field has more than one distinct value among the duplicates, report this along with the id and foreign_key so that the value can be fixed manually.

By default, fieldA and fieldB are NULL, but data have been entered in these fields in some cases.

Here is some sample data:

| id | foreign_key | fieldA | fieldB |
|----+-------------+--------+--------|
|  1 |           1 | NULL   | NULL   |
|  2 |           1 | A1     | B1     |
|  3 |           1 | NULL   | NULL   |
|  4 |           2 | A2     | B2     |
|  5 |           2 | A3     | B2     |
|  6 |           3 | NULL   | NULL   |
|  7 |           4 | A4     | B4     |
|  8 |           5 | A5     | NULL   |
|  9 |           5 | NULL   | B5     |
| 10 |           6 | A6     | B6     |
| 11 |           6 | A7     | B6     |
| 12 |           7 | NULL   | B7     |
| 13 |           7 | NULL   | B7     |

What I want to keep is:

| id | foreign_key | fieldA | fieldB |
|----+-------------+--------+--------|
|  2 |           1 | A1     | B1     |
|  4 |           2 | NULL   | B2     |
|  6 |           3 | NULL   | NULL   |
|  7 |           4 | A4     | B4     |
|  8 |           5 | A5     | B5     |
| 10 |           6 | NULL   | B6     |
| 12 |           7 | NULL   | B7     |

And I would like this information to be returned:

foreign_key 2 has two distinct values of fieldA: A2 and A3

独享拥抱 2024-10-25 17:33:13

I've got to run off right now, but here's a query to start with:

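-- MIN(id) names a definite row per group; a bare id is rejected under ONLY_FULL_GROUP_BY
SELECT MIN(id) AS id, foreign_key,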
    group_concat(DISTINCT fieldA) as A, count(DISTINCT fieldA) as `#A`,
    group_concat(DISTINCT fieldB) as B, count(DISTINCT fieldB) as `#B`
  FROM t1
  GROUP BY foreign_key
;

On the test data, this returns:

| id | foreign_key | A     | #A | B    | #B |
+----+-------------+-------+----+------+----+
|  1 |           1 | A1    |  1 | B1   |  1 |
|  4 |           2 | A2,A3 |  2 | B2   |  1 |
|  6 |           3 | NULL  |  0 | NULL |  0 |
|  7 |           4 | A4    |  1 | B4   |  1 |
|  8 |           5 | A5    |  1 | B5   |  1 |
| 10 |           6 | A6,A7 |  2 | B6   |  1 |
| 12 |           7 | NULL  |  0 | B7   |  1 |

Query for rows to keep:

-- groups with at most one distinct non-NULL value in each field
SELECT MIN(id) AS id, foreign_key,
    group_concat(DISTINCT fieldA) as A, count(DISTINCT fieldA) as `#A`, 
    group_concat(DISTINCT fieldB) as B, count(DISTINCT fieldB) as `#B`
  FROM t1
  GROUP BY foreign_key
  HAVING `#A` < 2 AND `#B` < 2
;
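
The query above drops the conflicting groups altogether, whereas the desired result in the question keeps one row for every foreign_key and blanks only the conflicting field. One possible sketch of that, assuming the table is t1 as above and relying on the fact that MAX() ignores NULLs, so that for a group with at most one distinct non-NULL value it returns exactly that value (the choice of the lowest id as the row to keep is an assumption):

```sql
-- Sketch: one row per foreign_key, lowest id as the representative.
-- A field keeps its single distinct value; a field with two or more
-- distinct values becomes NULL (CASE without ELSE yields NULL).
SELECT MIN(id) AS id,
       foreign_key,
       CASE WHEN COUNT(DISTINCT fieldA) < 2 THEN MAX(fieldA) END AS fieldA,
       CASE WHEN COUNT(DISTINCT fieldB) < 2 THEN MAX(fieldB) END AS fieldB
  FROM t1
  GROUP BY foreign_key;
```

On the sample data this should reproduce the "What I want to keep" table, except that foreign_key 1 is listed under id 1 (the lowest id in its group) rather than id 2.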

Query for rows that need operator intervention:

-- groups where at least one field has two or more distinct values
SELECT MIN(id) AS id, foreign_key,
    group_concat(DISTINCT fieldA) as A, count(DISTINCT fieldA) as `#A`, 
    group_concat(DISTINCT fieldB) as B, count(DISTINCT fieldB) as `#B`
  FROM t1
  GROUP BY foreign_key
  HAVING `#A` >= 2 OR `#B` >= 2
;
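
To get a message closer to the form the question asks for, the same grouping can be run once per field and the text assembled with CONCAT. This is only a sketch: the wording of the message, the column alias report, and the list of ids appended at the end are assumptions, and the table is assumed to be t1 as above.

```sql
-- Sketch: one report line per foreign_key and per field with conflicts.
SELECT CONCAT('foreign_key ', foreign_key, ' has ',
              COUNT(DISTINCT fieldA), ' distinct values of fieldA: ',
              GROUP_CONCAT(DISTINCT fieldA ORDER BY fieldA SEPARATOR ', '),
              ' (ids: ', GROUP_CONCAT(id ORDER BY id), ')') AS report
  FROM t1
  GROUP BY foreign_key
  HAVING COUNT(DISTINCT fieldA) >= 2
UNION ALL
SELECT CONCAT('foreign_key ', foreign_key, ' has ',
              COUNT(DISTINCT fieldB), ' distinct values of fieldB: ',
              GROUP_CONCAT(DISTINCT fieldB ORDER BY fieldB SEPARATOR ', '),
              ' (ids: ', GROUP_CONCAT(id ORDER BY id), ')') AS report
  FROM t1
  GROUP BY foreign_key
  HAVING COUNT(DISTINCT fieldB) >= 2;
```

On the sample data this should yield lines for foreign_key 2 and foreign_key 6, along the lines of "foreign_key 2 has 2 distinct values of fieldA: A2, A3 (ids: 4,5)". The ids listed are all rows of the group, not only those holding a value. Note that GROUP_CONCAT output is truncated at group_concat_max_len, so very large groups may need that setting raised.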

GROUP_CONCAT may not be suitable, depending on the format of data stored in the columns. In combination with #A and #B, however, you could detect when it's not suitable, so it shouldn't be a big problem. It may also have too big a performance impact, but I can't think of another aggregate function that could be used in the same way (a GROUP_COALESCE would be nice).
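
MAX() (or MIN()) can stand in for the wished-for GROUP_COALESCE whenever the distinct count is at most 1, because aggregate functions ignore NULLs. Under that assumption, the cleanup itself might look like the sketch below. The choice of the lowest id as the surviving row, the names m, keep_id, mergedA and mergedB, and the use of a transaction (which needs a transactional engine such as InnoDB) are assumptions, not part of the answer above. Conflicting fields are overwritten with NULL, so the operator-intervention query should be run and its output saved first.

```sql
START TRANSACTION;

-- Step 1: copy the merged values onto the row that will survive (lowest id).
-- The derived table is computed from t1 before any row is changed.
UPDATE t1
  JOIN (SELECT MIN(id) AS keep_id,
               CASE WHEN COUNT(DISTINCT fieldA) < 2 THEN MAX(fieldA) END AS mergedA,
               CASE WHEN COUNT(DISTINCT fieldB) < 2 THEN MAX(fieldB) END AS mergedB
          FROM t1
          GROUP BY foreign_key) AS m
    ON t1.id = m.keep_id
   SET t1.fieldA = m.mergedA,
       t1.fieldB = m.mergedB;

-- Step 2: delete every row that is not the survivor of its group.
DELETE t1
  FROM t1
  JOIN (SELECT foreign_key, MIN(id) AS keep_id
          FROM t1
          GROUP BY foreign_key) AS m
    ON t1.foreign_key = m.foreign_key
 WHERE t1.id <> m.keep_id;

-- Inspect t1 here; issue ROLLBACK instead if the result looks wrong.
COMMIT;
```

Joining against a derived table, rather than using a subquery in the WHERE clause, sidesteps MySQL's restriction on modifying a table that the same statement also selects from in a subquery.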
