如何让 T-SQL 代码查找重复项?

发布于 2024-07-29 08:53:37 字数 189 浏览 2 评论 0原文

MS Access 有一个按钮可以生成用于查找重复行的 SQL 代码。 我不知道SQL Server 2005/2008 Managment Studio是否有这个。

  1. 如果有,请指出在哪里

  2. 如果没有,请告诉我怎样才能有一个 T-SQL 助手来创建这样的代码。

    >

MS Access has a button to generate sql code for finding duplicated rows. I don't know if SQL Server 2005/2008 Managment Studio has this.

  1. If it has, please point where

  2. If it has not, please tell me how can I have a T-SQL helper for creating code like this.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(5

南七夏 2024-08-05 08:53:37

好吧,如果表中的整行都是重复项,则至少没有为该表设置主键,否则至少主键值会有所不同。

但是,以下是如何构建 SQL 来获取一组列上的重复项:

SELECT col1, col2, col3, col4
FROM table
GROUP BY col1, col2, col3, col4
HAVING COUNT(*) > 1

这将查找对于列 col1-col4 多次具有相同值组合的行。

例如,在下表中,行 2+3 将是重复的:

PK    col1    col2    col3    col4    col5
1       1       2       3       4      6
2       1       3       4       7      7
3       1       3       4       7      10
4       2       3       1       4      5

这两行在列 col1-col4 中共享公共值,因此,通过该 SQL,被视为重复。 展开列列表以包含您想要分析的所有列。

Well, if you have entire rows as duplicates in your table, you've at least not got a primary key set up for that table, otherwise at least the primary key value would be different.

However, here's how to build a SQL to get duplicates over a set of columns:

SELECT col1, col2, col3, col4
FROM table
GROUP BY col1, col2, col3, col4
HAVING COUNT(*) > 1

This will find rows which, for columns col1-col4, has the same combination of values, more than once.

For instance, in the following table, rows 2+3 would be duplicates:

PK    col1    col2    col3    col4    col5
1       1       2       3       4      6
2       1       3       4       7      7
3       1       3       4       7      10
4       2       3       1       4      5

The two rows share common values in columns col1-col4, and thus, by that SQL, is considered duplicates. Expand the list of columns to contain all the columns you wish to analyze this for.

鼻尖触碰 2024-08-05 08:53:37

如果您使用的是 SQL Server 2005+,您可以使用以下代码来查看所有行以及其他列:

SELECT *, ROW_NUMBER() OVER (PARTITION BY col1, col2, col3, col4 ORDER BY (SELECT 0)) AS DuplicateRowNumber
FROM table

您还可以使用此技术删除(或以其他方式处理)重复项:

WITH cte AS
(SELECT *, ROW_NUMBER() OVER (PARTITION BY col1, col2, col3, col4 ORDER BY (SELECT 0)) AS DuplicateRowNumber
    FROM table
)
DELETE FROM cte WHERE DuplicateRowNumber > 1

ROW_NUMBER 非常强大 - 您可以做很多事情可以用它来做 - 请参阅关于它的 BOL 文章 http://msdn.microsoft .com/en-us/library/ms186734.aspx

If you're using SQL Server 2005+, you can use the following code to see all the rows along with other columns:

SELECT *, ROW_NUMBER() OVER (PARTITION BY col1, col2, col3, col4 ORDER BY (SELECT 0)) AS DuplicateRowNumber
FROM table

Youd can also delete (or otherwise work with) duplicates using this technique:

WITH cte AS
(SELECT *, ROW_NUMBER() OVER (PARTITION BY col1, col2, col3, col4 ORDER BY (SELECT 0)) AS DuplicateRowNumber
    FROM table
)
DELETE FROM cte WHERE DuplicateRowNumber > 1

ROW_NUMBER is extremely powerful - there is much you can do with it - see the BOL article on it at http://msdn.microsoft.com/en-us/library/ms186734.aspx

忆梦 2024-08-05 08:53:37

当我需要转储具有一个或多个重复字段的整行但我不想在表中键入每个字段名称时,我找到了此解决方案:

SELECT * FROM db WHERE col IN
    (SELECT col FROM db GROUP BY col HAVING COUNT(*) > 1)
    ORDER BY col

I found this solution when I need to dump entire rows with one or more duplicate fields but I don't want to type every field name in the table:

SELECT * FROM db WHERE col IN
    (SELECT col FROM db GROUP BY col HAVING COUNT(*) > 1)
    ORDER BY col
酒儿 2024-08-05 08:53:37

AFAIK,事实并非如此。 只需创建一个按表的所有字段进行分组的 select 语句,并使用计数大于 1 的having 子句进行过滤。

如果您的行除键之外都重复,则不要在选择字段中包含该键。

AFAIK, it doesn't. Just make a select statement grouping by all the fields of a table, and filtering using a having clause where the count is greater than 1.

If your rows are duplicated except by the key, then don't include the key in the select fields.

魄砕の薆 2024-08-05 08:53:37

另一种方法是连接一个表本身。

SELECT *
FROM dbo.TableA aBase
JOIN dbo.TableA aDupes ON aDupes.ColA = aBase.ColA AND
                          aDupes.ColB = aBase.ColB
WHERE aBase.Pkey < aDupes.Pkey

注意: aBase.Pkey < aDupes.Pkey 之所以存在,是因为将表与自身连接起来将为每个匹配创建两行,因为条件始终为真两次。

换句话说:
如果表 aBase 的行等于 aDupes 中的行(基于 ColA 和 ColB),则该匹配的反映也将为 true - aDupes 的行等于基于 ColA 和 ColB 的 aBase 行。 因此,这两个匹配项都将在结果集中返回。

通过任意挑选其中一个表具有较低键的所有结果来缩小范围/消除这种反射。

< 或> 没关系,只要键不同即可。

这也可以过滤掉与其自身相同的行的匹配项,因为aBase.Pkey < aDupes.Pkey 强制主键不同。

Another way one can do this is by joining a table on itself.

SELECT *
FROM dbo.TableA aBase
JOIN dbo.TableA aDupes ON aDupes.ColA = aBase.ColA AND
                          aDupes.ColB = aBase.ColB
WHERE aBase.Pkey < aDupes.Pkey

Note: The aBase.Pkey < aDupes.Pkey is there because joining a table against itself will create two rows per match since the condition will always be true twice.

In other words:
If table aBase has a row equal to a row from aDupes (based on ColA and ColB), the reflection of that match will also be true - that aDupes has a row equal to a row aBase based on ColA and ColB. Therefore both of those matches will be returned in the result set.

Narrow this down/eliminate this reflection by arbitrarily picking all results where one of the tables has a lower key.

< or > doesn't matter, as long as the keys are different.

This also takes care of filtering out matches with a row upon itself because aBase.Pkey < aDupes.Pkey forces the primary keys to be different.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文