T-SQL:删除所有重复行但保留一个

发布于 2024-11-08 11:10:40 字数 194 浏览 0 评论 0原文

我有一个包含大量行的表。不允许重复,但由于行的创建方式存在问题,我知道该表中有一些重复。 我需要从关键列的角度消除多余的行。其他一些列的数据可能略有不同,但我不关心这一点。不过,我仍然需要保留其中一行。 SELECT DISTINCT 不起作用,因为它对所有列进行操作,并且我需要根据键列抑制重复项。

如何删除多余的行但仍然有效地保留一行?

I have a table with a very large amount of rows. Duplicates are not allowed but due to a problem with how the rows were created I know there are some duplicates in this table.
I need to eliminate the extra rows from the perspective of the key columns. Some other columns may have slightly different data but I do not care about that. I still need to keep one of these rows however. SELECT DISTINCT won't work because it operates on all columns and I need to suppress duplicates based on the key columns.

How can I delete the extra rows but still keep one efficiently?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

醉南桥 2024-11-15 11:10:40

您没有说明您使用的是什么版本,但在 SQL 2005 及更高版本中,您可以使用带有 OVER 子句。它有点像这样:

WITH cte AS (
  SELECT[foo], [bar], 
     row_number() OVER(PARTITION BY foo, bar ORDER BY baz) AS [rn]
  FROM TABLE
)
DELETE cte WHERE [rn] > 1

尝试一下,看看你会得到什么。

(编辑:为了提供帮助,有人编辑了 CTE 中的 ORDER BY 子句。需要明确的是,您可以在此处按您想要的任何内容进行排序,它不必是返回的列之一事实上,这里的一个常见用例是“foo,bar”是组标识符,“baz”是某种时间戳,为了保持最新,您需要ORDER。 BY baz 描述)

You didn't say what version you were using, but in SQL 2005 and above, you can use a common table expression with the OVER Clause. It goes a little something like this:

WITH cte AS (
  SELECT[foo], [bar], 
     row_number() OVER(PARTITION BY foo, bar ORDER BY baz) AS [rn]
  FROM TABLE
)
DELETE cte WHERE [rn] > 1

Play around with it and see what you get.

(Edit: In an attempt to be helpful, someone edited the ORDER BY clause within the CTE. To be clear, you can order by anything you want here, it needn't be one of the columns returned by the cte. In fact, a common use-case here is that "foo, bar" are the group identifier and "baz" is some sort of time stamp. In order to keep the latest, you'd do ORDER BY baz desc)

陌若浮生 2024-11-15 11:10:40

查询示例:

DELETE FROM Table
WHERE ID NOT IN
(
SELECT MIN(ID)
FROM Table
GROUP BY Field1, Field2, Field3, ...
)

这里的fields 是您想要对重复行进行分组的列。

Example query:

DELETE FROM Table
WHERE ID NOT IN
(
SELECT MIN(ID)
FROM Table
GROUP BY Field1, Field2, Field3, ...
)

Here fields are column on which you want to group the duplicate rows.

慈悲佛祖 2024-11-15 11:10:40

这是我对此的看法,并提供了一个可运行的示例。 注意这仅适用于Id唯一且其他列中有重复值的情况。

DECLARE @SampleData AS TABLE (Id int, Duplicate varchar(20))

INSERT INTO @SampleData
SELECT 1, 'ABC' UNION ALL
SELECT 2, 'ABC' UNION ALL
SELECT 3, 'LMN' UNION ALL
SELECT 4, 'XYZ' UNION ALL
SELECT 5, 'XYZ'

DELETE FROM @SampleData WHERE Id IN (
    SELECT Id FROM (
        SELECT 
            Id
            ,ROW_NUMBER() OVER (PARTITION BY [Duplicate] ORDER BY Id) AS [ItemNumber]
            -- Change the partition columns to include the ones that make the row distinct
        FROM 
            @SampleData
    ) a WHERE ItemNumber > 1 -- Keep only the first unique item
)

SELECT * FROM @SampleData

结果:

Id          Duplicate
----------- ---------
1           ABC
3           LMN
4           XYZ

不知道为什么这就是我首先想到的......绝对不是最简单的方法,但它有效。

Here's my twist on it, with a runnable example. Note this will only work in the situation where Id is unique, and you have duplicate values in other columns.

DECLARE @SampleData AS TABLE (Id int, Duplicate varchar(20))

INSERT INTO @SampleData
SELECT 1, 'ABC' UNION ALL
SELECT 2, 'ABC' UNION ALL
SELECT 3, 'LMN' UNION ALL
SELECT 4, 'XYZ' UNION ALL
SELECT 5, 'XYZ'

DELETE FROM @SampleData WHERE Id IN (
    SELECT Id FROM (
        SELECT 
            Id
            ,ROW_NUMBER() OVER (PARTITION BY [Duplicate] ORDER BY Id) AS [ItemNumber]
            -- Change the partition columns to include the ones that make the row distinct
        FROM 
            @SampleData
    ) a WHERE ItemNumber > 1 -- Keep only the first unique item
)

SELECT * FROM @SampleData

And the results:

Id          Duplicate
----------- ---------
1           ABC
3           LMN
4           XYZ

Not sure why that's what I thought of first... definitely not the simplest way to go but it works.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文