Delete duplicate rows, leaving only the oldest row?

Posted on 2024-09-18 20:18:49

I have a table of data and there are many duplicate entries from user submissions.

I want to delete all duplicate rows based on the field subscriberEmail, leaving only the original submission.

In other words, I want to search for all duplicate emails, and delete those rows, leaving only the original.

How can I do this without swapping tables?
My table contains unique IDs for each row.

Comments (3)

橘和柠 2024-09-25 20:18:49

Since you're using the id column as an indicator of which record is 'original':

delete x 
from myTable x
 join myTable z on x.subscriberEmail = z.subscriberEmail
where x.id > z.id

This will leave one record per email address.

edit to add:

To explain the query above...

The idea here is to join the table against itself. Pretend that you have two copies of the table, each named something different. Then you could compare them to each other and find the lowest id for each email address. You'd then see the duplicate records that were created later on and could delete them. (I was visualizing Excel when thinking about this.)

To compare a table with itself and still be able to identify each side, you use table aliases. x is a table alias. It is assigned in the from clause like so: from <table> <alias>. x can now be used elsewhere in the same query as a shortcut for that table.

delete x starts the query off with our action and target. We're going to perform a query to select records from multiple tables, and we want to delete records that appear in x.

Aliases are used to refer to both 'instances' of the table. from myTable x join myTable z on x.subscriberEmail = z.subscriberEmail bumps the table up against itself where the emails match. Without the where clause that follows, every record would be selected, because each record matches itself.

The where clause limits the records that are selected. where x.id > z.id allows the 'instance' aliased x to contain only the records that match emails but have a higher id value. The data that you really want in the table, unique email addresses (with the lowest id) will not be part of x and will not be deleted. The only records in x will be duplicate records (email addresses) that have a higher id than the original record for that email address.
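Before deleting anything, the same join can be run as a select to preview which rows would be removed. This is a minimal sketch that reuses the table and column names from the query above; nothing is modified.

```sql
-- Preview only: every row returned here is a duplicate with a higher id
-- than some other row that has the same subscriberEmail.
-- DISTINCT is used because a row with several older duplicates
-- would otherwise be listed once per match.
SELECT DISTINCT x.id, x.subscriberEmail
FROM myTable x
JOIN myTable z
  ON x.subscriberEmail = z.subscriberEmail
WHERE x.id > z.id
ORDER BY x.subscriberEmail, x.id;
```

If the result contains only rows you expect to lose, swapping the select list for delete x turns the preview back into the delete.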

The join and where clauses could be combined in this case:

delete x 
  from myTable x 
  join myTable z
    on x.subscriberEmail = z.subscriberEmail
      and x.id > z.id

For preventing duplicates, consider making the subscriberEmail column a UNIQUE indexed column.
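As a rough sketch of that suggestion, assuming subscriberEmail is a VARCHAR column short enough to be indexed, and using a hypothetical index name (uq_subscriberEmail):

```sql
-- Run this only after the existing duplicates have been deleted;
-- MySQL refuses to build a unique index while duplicate values remain.
-- The index name uq_subscriberEmail is an arbitrary choice for illustration.
ALTER TABLE myTable
  ADD UNIQUE INDEX uq_subscriberEmail (subscriberEmail);
```

Once the index exists, an insert that repeats an existing email fails with a duplicate-key error instead of silently adding another row.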

请恋爱 2024-09-25 20:18:49

If you have a unique id for each row, you can try something like this. The second select looks redundant, but MySQL won't execute the statement without it: a subquery cannot read directly from the table being deleted from, and wrapping it in a derived table works around that. Also, group by whichever columns define a duplicate.

delete from my_table where id not in (
  select id from (
    -- keep the lowest (oldest) id for each email
    select min(id) as id from my_table group by subscriberEmail
  ) b
);
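
To see which row counts as the original for each duplicated address, a grouped query can list the lowest id next to the number of copies. This is a sketch using the same my_table and subscriberEmail names; the aliases oldest_id and copies are made up for readability.

```sql
-- One line per email address that occurs more than once.
-- oldest_id is the row that should survive the cleanup;
-- copies is how many rows currently share that address.
SELECT subscriberEmail,
       MIN(id)  AS oldest_id,
       COUNT(*) AS copies
FROM my_table
GROUP BY subscriberEmail
HAVING COUNT(*) > 1;
```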
怪我闹别瞎闹 2024-09-25 20:18:49

How about this? With a self join, you don't have to create any temporary tables:

DELETE u1 FROM users u1, users u2 WHERE u1.id > u2.id AND u1.email = u2.email

To check whether there are any duplicate records in the table:

SELECT count(*) as Count, email FROM users u group by email having Count > 1
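
The delete and the check can also be combined into a dry run. The sketch below assumes the users table uses a transactional engine such as InnoDB (with a non-transactional engine the rollback has no effect), and it keeps the lowest id per email, which is what the question asks for.

```sql
START TRANSACTION;

-- Remove every row that has an older row (lower id) with the same email.
DELETE u1
FROM users u1
JOIN users u2
  ON u1.email = u2.email
 AND u1.id > u2.id;

-- Should return no rows if every duplicate was removed.
SELECT email, COUNT(*) AS copies
FROM users
GROUP BY email
HAVING COUNT(*) > 1;

-- Dry run: undo the delete. Use COMMIT instead to keep the change.
ROLLBACK;
```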