Find rows with duplicate/similar column values in MySQL

Posted 2024-10-16 14:33:07

I want to select from the following table all rows whose fname value is similar to that of an earlier row, i.e. every row in a group of similar names except the first one. In other words, from this table I want to retrieve the rows with ids 2, 5 and 7 (because " anna" comes after "anna", and "michaela" and "michaal" come after "michael").

+----+------------+----------+
| id | fname      | lname    |
+----+------------+----------+
|  1 | anna       | milski   |
|  2 |  anna      | nguyen   |
|  3 | michael    | michaels |
|  4 | james      | bond     |
|  5 | michaela   | king     |
|  6 | bruce      | smart    |
|  7 | michaal    | hardy    |
+----+------------+----------+

What I have so far is this:

select *, count(fname) cnt 
from users group by soundex(fname) 
having count(soundex(fname)) > 1;

but since I'm grouping it the result is

+----+----------+----------+-----+
| id | fname    | lname    | cnt |
+----+----------+----------+-----+
|  1 | anna     | milski   |   2 |
|  3 | michael  | michaels |   3 |
+----+----------+----------+-----+

What I want retrieved is this:

+----+----------+----------+-----+
| id | fname    | lname    | cnt |
+----+----------+----------+-----+
|  2 |  anna    | nguyen   |   2 |
|  5 | michaela | king     |   3 |
|  7 | michaal  | hardy    |   3 |
+----+----------+----------+-----+

What should I change about the query? I tried removing "group by" but it changes the results (I could be wrong, haven't tested it extensively).

Comments (2)

定格我的天空 2024-10-23 14:33:07

I've re-read your initial question and I've come up with the following solution:

SELECT *
FROM   users
WHERE  id IN
       (SELECT id
       FROM    users t4
               INNER JOIN
                       (SELECT  soundex(fname) AS snd,
                                COUNT(*)       AS cnt
                       FROM     users          AS t5
                       GROUP BY snd
                       HAVING   cnt > 1
                       )
                       AS t6
               ON      soundex(t4.fname)=snd
       )
AND    id NOT IN
       (SELECT  MIN(t2.id) AS wanted
       FROM     users t2
                INNER JOIN
                         (SELECT  soundex(fname) AS snd,
                                  COUNT(*)       AS cnt
                         FROM     users          AS t1
                         GROUP BY snd
                         HAVING   cnt > 1
                         )
                         AS t3
                ON       soundex(t2.fname)=snd
       GROUP BY snd
       );

It's a bit over-complicated, but it works and delivers exactly what you asked for :)
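On MySQL 8.0 or later, window functions may give a shorter route to the same rows. The following is only a sketch, assuming the `users` table from the question and assuming that "first" means the lowest id within each SOUNDEX group; the names `ranked`, `cnt` and `rn` are illustrative and not part of the original answer.

```sql
-- Sketch only: requires MySQL 8.0+ (window functions).
-- Assumes "first in its group" means the smallest id per SOUNDEX(fname).
SELECT id, fname, lname, cnt
FROM (
    SELECT id,
           fname,
           lname,
           -- size of the group of similar-sounding first names
           COUNT(*)     OVER (PARTITION BY SOUNDEX(fname))             AS cnt,
           -- position of the row inside its group, ordered by id
           ROW_NUMBER() OVER (PARTITION BY SOUNDEX(fname) ORDER BY id) AS rn
    FROM users
) AS ranked
WHERE rn > 1   -- skip the first row of each group; singleton groups drop out too
ORDER BY id;
```

Because a row with `rn > 1` can only exist in a group of at least two, no separate `HAVING cnt > 1` filter is needed. With the sample data this should leave the rows with ids 2, 5 and 7.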

吝吻 2024-10-23 14:33:07

You seem to be getting what you asked for: SOUNDEX(fname) computes the Soundex code from the first name only, not from the whole name. A few options you can investigate:

SELECT *, COUNT(SOUNDEX(CONCAT(fname, lname))) AS cnt
FROM users
GROUP BY SOUNDEX(CONCAT(fname, lname))
HAVING cnt > 1;

or

SELECT *, COUNT(SOUNDEX(fname)) AS cnt1, COUNT(SOUNDEX(lname)) AS cnt2
FROM users
GROUP BY SOUNDEX(fname), SOUNDEX(lname)
HAVING cnt1 > 1 OR cnt2 > 1;

It depends on what you want to achieve: a count of similar first names, of similar last names, or of some combined hash of both.
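One caveat: queries that combine `SELECT *` with `GROUP BY` on an expression are rejected when the `ONLY_FULL_GROUP_BY` SQL mode is on, which is the default from MySQL 5.7.5. A hedged sketch of a variant that should also run in that mode, assuming the `users` table from the question and counting by first name only (the aliases `snd`, `cnt` and `ids` are illustrative):

```sql
-- Sketch only: selects just the grouping expression and aggregates,
-- so it does not depend on ONLY_FULL_GROUP_BY being disabled.
SELECT SOUNDEX(fname)               AS snd,  -- Soundex code shared by the group
       COUNT(*)                     AS cnt,  -- number of rows in the group
       GROUP_CONCAT(id ORDER BY id) AS ids   -- ids of the rows in the group
FROM users
GROUP BY SOUNDEX(fname)
HAVING COUNT(*) > 1;
```

To group on last names or on both names, swap the expression in the `SELECT` list and in `GROUP BY` for `SOUNDEX(lname)` or `SOUNDEX(CONCAT(fname, lname))`.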
