Find rows with duplicate/similar column values in MySQL
From the following table I want to select every row whose fname value is similar to that of an earlier row, i.e. every row in a group of similar names except the first one. In other words, from this table I want to retrieve the rows with ids 2, 5 and 7 (because the second "anna" comes after the first "anna", and "michaela" and "michaal" come after "michael").
+----+----------+----------+
| id | fname    | lname    |
+----+----------+----------+
|  1 | anna     | milski   |
|  2 | anna     | nguyen   |
|  3 | michael  | michaels |
|  4 | james    | bond     |
|  5 | michaela | king     |
|  6 | bruce    | smart    |
|  7 | michaal  | hardy    |
+----+----------+----------+
What I have so far is this:
select *, count(fname) cnt
from users group by soundex(fname)
having count(soundex(fname)) > 1;
But since I'm grouping, the result contains only the first row of each group:
+----+----------+----------+-----+
| id | fname    | lname    | cnt |
+----+----------+----------+-----+
|  1 | anna     | milski   |   2 |
|  3 | michael  | michaels |   3 |
+----+----------+----------+-----+
What I want retrieved is this:
+----+----------+----------+-----+
| id | fname    | lname    | cnt |
+----+----------+----------+-----+
|  2 | anna     | nguyen   |   2 |
|  5 | michaela | king     |   3 |
|  7 | michaal  | hardy    |   3 |
+----+----------+----------+-----+
What should I change about the query? I tried removing the "group by", but that changes the results (I could be wrong, I haven't tested it extensively).
Comments (2)
I've re-read your initial question and come up with the following solution:
It's a bit over-complicated, but it works and delivers exactly what you asked for :)
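The query from this answer is not shown here. As a minimal sketch of one way to get the requested rows (not necessarily the approach the answerer used), the grouped query from the question can be turned into a derived table and joined back to `users`. The aliases `u`, `g`, `sdx` and `first_id` are made up for illustration, and the sketch assumes that "first in their order" means the lowest `id` within each Soundex group.

```sql
-- Assumption: the "first" row of each group is the one with the smallest id.
SELECT u.id, u.fname, u.lname, g.cnt
FROM users AS u
JOIN (
    -- One row per Soundex code that occurs more than once,
    -- with the group size and the id of its first member.
    SELECT SOUNDEX(fname) AS sdx,
           MIN(id)        AS first_id,
           COUNT(*)       AS cnt
    FROM users
    GROUP BY sdx
    HAVING COUNT(*) > 1
) AS g
  ON SOUNDEX(u.fname) = g.sdx
-- Keep every member of the group except the first one.
WHERE u.id <> g.first_id
ORDER BY u.id;
```

Against the sample table this should keep the rows with ids 2, 5 and 7, since "anna"/"anna" and "michael"/"michaela"/"michaal" share a Soundex code while "james" and "bruce" are filtered out by the `HAVING` clause. Selecting only aggregated or grouped expressions in the derived table also avoids relying on MySQL picking an arbitrary row for `*`, which is what the original query does.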
You seem to be getting what you're asking for:
SOUNDEX(fname)
builds the Soundex hash from the first name only, not from the whole string. A few options you can investigate:
or
It depends on what you want to achieve: a count of similar first names, of similar last names, or of some synthetic hash of both.
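The concrete options from this answer are not shown here. Purely as an illustration of the three choices named above, and assuming the same `users` table, the grouping expression could be varied like this. The alias `sdx` and the use of `CONCAT` for the combined hash are assumptions, not the answerer's own queries.

```sql
-- Count of similar first names.
SELECT SOUNDEX(fname) AS sdx, COUNT(*) AS cnt
FROM users
GROUP BY sdx
HAVING COUNT(*) > 1;

-- Count of similar last names.
SELECT SOUNDEX(lname) AS sdx, COUNT(*) AS cnt
FROM users
GROUP BY sdx
HAVING COUNT(*) > 1;

-- Count over a combined hash of both names (one possible "synthetic hash").
SELECT SOUNDEX(CONCAT(fname, ' ', lname)) AS sdx, COUNT(*) AS cnt
FROM users
GROUP BY sdx
HAVING COUNT(*) > 1;
```

MySQL's `SOUNDEX()` ignores non-alphabetic characters and returns a code that can be longer than four characters, so the concatenated variant distinguishes full names rather than truncating after the first name.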