Merging duplicate records in MySQL
A previous DBA managed a non-relational table with 2.4M entries, all with unique IDs. However, there are duplicate records, each holding different data, for example:
+----+---------+--------------+-------+------------+-------------+
| id | Name    | Address      | Phone | Email      | LastVisited |
+----+---------+--------------+-------+------------+-------------+
| 1  | bob     | 12 Some Road | 02456 |            |             |
| 2  | bobby   |              | 02456 | bob@domain |             |
| 3  | bob     | 12 Some Rd   | 02456 |            | 2010-07-13  |
| 4  | sir bob |              | 02456 |            |             |
| 5  | bob     | 12SomeRoad   | 02456 |            |             |
| 6  | mr bob  |              | 02456 |            |             |
| 7  | robert  |              | 02456 |            |             |
+----+---------+--------------+-------+------------+-------------+
This isn't the exact table (the real table has 32 columns); this is just to illustrate.
I know how to identify the duplicates; in this case I'm using the phone number. I've extracted the duplicates into a separate table: there are 730k entries in total.
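For illustration, the kind of query involved in that extraction step might look like this (myTable and duplicates_table are placeholder names, not the real schema):

-- Copy every row whose phone number occurs more than once
-- into a separate work table.
CREATE TABLE duplicates_table AS
SELECT t.*
FROM myTable AS t
JOIN (
    SELECT Phone
    FROM myTable
    GROUP BY Phone
    HAVING COUNT(*) > 1
) AS d ON d.Phone = t.Phone;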
What would be the most efficient way of merging these records (and flagging the unneeded records for deletion)?
I've looked at using UPDATE with INNER JOINs, but several WHERE clauses are needed, because I want to update the first record with data from subsequent records, where a subsequent record has additional data that the former record does not.
I've looked at third-party software such as Fuzzy Dups, but I'd like a pure MySQL option if possible.
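For illustration, one way the UPDATE-with-JOIN idea might collapse into a single pass (a sketch only: myTable is a placeholder, MAX(NULLIF(col, '')) is just one way to pick a non-empty value per column, and the real 32 columns would each need their own expression):

-- Merge each phone number's duplicates onto the row with the lowest id.
UPDATE myTable AS m
JOIN (
    SELECT
        MIN(id)                  AS keep_id,
        MAX(NULLIF(Name, ''))    AS Name,
        MAX(NULLIF(Address, '')) AS Address,
        MAX(NULLIF(Email, ''))   AS Email,
        MAX(LastVisited)         AS LastVisited
    FROM myTable
    GROUP BY Phone
) AS s ON s.keep_id = m.id
SET m.Name        = COALESCE(NULLIF(m.Name, ''), s.Name),
    m.Address     = COALESCE(NULLIF(m.Address, ''), s.Address),
    m.Email       = COALESCE(NULLIF(m.Email, ''), s.Email),
    m.LastVisited = COALESCE(m.LastVisited, s.LastVisited);

-- The rows that were merged away can then be flagged or deleted, e.g.:
DELETE m FROM myTable AS m
JOIN (SELECT Phone, MIN(id) AS keep_id FROM myTable GROUP BY Phone) AS s
  ON s.Phone = m.Phone
WHERE m.id <> s.keep_id;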
The end goal is that I'd be left with something like:
+----+---------+--------------+-------+------------+-------------+
| id | Name    | Address      | Phone | Email      | LastVisited |
+----+---------+--------------+-------+------------+-------------+
| 1  | bob     | 12 Some Road | 02456 | bob@domain | 2010-07-13  |
+----+---------+--------------+-------+------------+-------------+
Should I be looking at looping in a stored procedure / function, or is there some really easy thing I've missed?
Comments (1)
You have to create a PROCEDURE, but before that, create your own temp_table, for example as sketched below.
You have to create it as a physical table so that you can run a cursor on it.
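The answer's table definition didn't survive as posted; something along these lines would do, assuming one row per duplicated phone number (the column type is a guess):

-- Physical work table: one row per duplicated phone number,
-- used only to drive the cursor.
CREATE TABLE temp_table (
    Phone VARCHAR(20) NOT NULL,
    PRIMARY KEY (Phone)
);

INSERT INTO temp_table (Phone)
SELECT Phone FROM myTable GROUP BY Phone HAVING COUNT(*) > 1;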
create PROCEDURE myPROC()
BEGIN
    ...
END
And inside the procedure you also need to create a new similar_tempTable, which will contain the duplicate rows for the current cursor value, along the lines of the sketch below.
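The example values are missing from the answer as posted; per cursor iteration the idea is presumably something like this (L_phone and L_id being the procedure's local variables):

-- similar_tempTable has the same structure as myTable, e.g.:
-- CREATE TABLE similar_tempTable LIKE myTable;

-- Collect all duplicate rows for the phone number the cursor is on.
INSERT INTO similar_tempTable
SELECT * FROM myTable WHERE Phone = L_phone;

-- Keep the lowest id as the surviving record.
SELECT MIN(id) INTO L_id FROM similar_tempTable;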
The next step is to extract the values of each column you want from similar_tempTable, update them into the row of myTable where id = L_id, and delete the remaining duplicate rows from myTable.
And one more thing: truncate similar_tempTable after every iteration of the cursor. A complete sketch of the procedure follows.
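Putting the steps together, a minimal end-to-end sketch of the procedure described above. The table names come from the answer; the variable names, column expressions, and merge rule (first non-empty value wins, via MAX over the duplicates) are assumptions, and the real 32-column table would need one expression per column:

DELIMITER $$

CREATE PROCEDURE myPROC()
BEGIN
    DECLARE done INT DEFAULT 0;
    DECLARE L_phone VARCHAR(20);
    DECLARE L_id INT;
    -- One duplicated phone number per iteration.
    DECLARE cur CURSOR FOR SELECT Phone FROM temp_table;
    DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;

    OPEN cur;
    read_loop: LOOP
        FETCH cur INTO L_phone;
        IF done THEN
            LEAVE read_loop;
        END IF;

        -- Gather this phone number's duplicate rows.
        INSERT INTO similar_tempTable
        SELECT * FROM myTable WHERE Phone = L_phone;

        -- The lowest id survives the merge.
        SELECT MIN(id) INTO L_id FROM similar_tempTable;

        -- Merge: keep an existing value, otherwise take a non-empty
        -- value found among the duplicates (repeat per real column).
        UPDATE myTable AS m
        JOIN (
            SELECT
                MAX(NULLIF(Name, ''))    AS Name,
                MAX(NULLIF(Address, '')) AS Address,
                MAX(NULLIF(Email, ''))   AS Email,
                MAX(LastVisited)         AS LastVisited
            FROM similar_tempTable
        ) AS s
        SET m.Name        = COALESCE(NULLIF(m.Name, ''), s.Name),
            m.Address     = COALESCE(NULLIF(m.Address, ''), s.Address),
            m.Email       = COALESCE(NULLIF(m.Email, ''), s.Email),
            m.LastVisited = COALESCE(m.LastVisited, s.LastVisited)
        WHERE m.id = L_id;

        -- Remove the merged-away duplicates.
        DELETE FROM myTable WHERE Phone = L_phone AND id <> L_id;

        -- Reset the scratch table for the next iteration (DELETE FROM
        -- works too, if TRUNCATE's implicit commit is a concern here).
        TRUNCATE TABLE similar_tempTable;
    END LOOP;
    CLOSE cur;
END $$

DELIMITER ;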
Hope this will help you...