How can I get T-SQL code to find duplicates?
MS Access has a button that generates SQL code for finding duplicate rows. I don't know whether SQL Server 2005/2008 Management Studio has this.
If it does, please point out where.
If it doesn't, please tell me how I can get a T-SQL helper that generates code like this.
Well, if entire rows in your table are duplicates, then at the very least you haven't set up a primary key on that table; otherwise the primary key values would differ.
However, here's how to build a SQL query that finds duplicates over a set of columns:
This finds rows that have the same combination of values in columns col1-col4 more than once.
For instance, if rows 2 and 3 of a table hold the same values in columns col1-col4, that SQL treats them as duplicates, whatever the other columns contain.
Expand the column list to include every column you want to analyze.
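A minimal sketch of that kind of query, run from Python against an in-memory SQLite database only so the example is self-contained. The table name `my_table`, the extra `note` column and the sample rows are hypothetical; the SQL string itself is a plain GROUP BY/HAVING and should run unchanged in a Management Studio query window.

```python
import sqlite3

# Hypothetical sample data: rows 2 and 3 hold identical values in col1-col4
# and differ only in the extra "note" column.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table ("
    "id INTEGER PRIMARY KEY, col1 INT, col2 INT, col3 TEXT, col4 TEXT, note TEXT)"
)
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 1, 10, "a", "x", "first"),
        (2, 2, 20, "b", "y", "second"),
        (3, 2, 20, "b", "y", "third"),
        (4, 3, 30, "c", "z", "fourth"),
    ],
)

# Group on the compared columns and keep only groups containing more than one row.
DUPLICATES_SQL = """
SELECT col1, col2, col3, col4, COUNT(*) AS dup_count
FROM my_table
GROUP BY col1, col2, col3, col4
HAVING COUNT(*) > 1
"""

for row in conn.execute(DUPLICATES_SQL):
    print(row)  # one line per duplicated combination of col1-col4
```

Note that the query returns each duplicated combination once, with its count, rather than the individual rows.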
If you're using SQL Server 2005+, you can use the following code to see all the rows along with other columns:
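A sketch of one way to do this with window functions, assuming a hypothetical `my_table` with key column `id` and compared columns col1-col4. `ROW_NUMBER()` numbers the rows inside each group of identical values and `COUNT(*) OVER` gives the group size, so every column of each duplicate row stays visible. Running it through Python's `sqlite3` needs SQLite 3.25 or later for window functions; on SQL Server 2005+ the SQL string is the part you would use.

```python
import sqlite3

# Hypothetical table and rows: rows 2 and 3 share col1-col4.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table ("
    "id INTEGER PRIMARY KEY, col1 INT, col2 INT, col3 TEXT, col4 TEXT, note TEXT)"
)
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 1, 10, "a", "x", "first"),
        (2, 2, 20, "b", "y", "second"),
        (3, 2, 20, "b", "y", "third"),
        (4, 3, 30, "c", "z", "fourth"),
    ],
)

# rn: position of the row within its group (ordered by id).
# dup_count: number of rows in the group; groups of size 1 are filtered out.
ALL_DUPLICATE_ROWS_SQL = """
SELECT *
FROM (
    SELECT my_table.*,
           ROW_NUMBER() OVER (PARTITION BY col1, col2, col3, col4 ORDER BY id) AS rn,
           COUNT(*)     OVER (PARTITION BY col1, col2, col3, col4)             AS dup_count
    FROM my_table
) AS numbered
WHERE dup_count > 1
ORDER BY id
"""

for row in conn.execute(ALL_DUPLICATE_ROWS_SQL):
    print(row)  # every column of the row, followed by rn and dup_count
```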
You can also delete (or otherwise process) duplicates using this technique:
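A sketch of deleting duplicates while keeping the row with the lowest `id` in each group, again with a hypothetical `my_table`. SQL Server also lets you DELETE directly from a CTE that computes `ROW_NUMBER()`; the `IN` subquery form is used here because it also runs in SQLite, which executes the example.

```python
import sqlite3

# Hypothetical table and rows: rows 2 and 3 share col1-col4.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table ("
    "id INTEGER PRIMARY KEY, col1 INT, col2 INT, col3 TEXT, col4 TEXT, note TEXT)"
)
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 1, 10, "a", "x", "first"),
        (2, 2, 20, "b", "y", "second"),
        (3, 2, 20, "b", "y", "third"),
        (4, 3, 30, "c", "z", "fourth"),
    ],
)

# Number the rows in each group of identical col1-col4 values and delete
# every row after the first (rn > 1), i.e. keep the lowest id per group.
DELETE_DUPLICATES_SQL = """
DELETE FROM my_table
WHERE id IN (
    SELECT id
    FROM (
        SELECT id,
               ROW_NUMBER() OVER (PARTITION BY col1, col2, col3, col4 ORDER BY id) AS rn
        FROM my_table
    ) AS numbered
    WHERE rn > 1
)
"""

conn.execute(DELETE_DUPLICATES_SQL)
conn.commit()
print(conn.execute("SELECT id FROM my_table ORDER BY id").fetchall())
```

Changing the `ORDER BY` inside `OVER (...)` changes which row of each group survives.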
ROW_NUMBER is extremely powerful; there is a lot you can do with it. See the Books Online (BOL) article on it at http://msdn.microsoft.com/en-us/library/ms186734.aspx
I found this solution when I needed to dump entire rows that have one or more duplicate fields, without typing every field name in the table:
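One common way to do this, not necessarily the poster's original solution: find the duplicated values with a GROUP BY subquery, join it back to the table and select `t.*`, so you only type the names of the columns you compare. The table, its data and the choice of col1/col2 as the compared fields are hypothetical.

```python
import sqlite3

# Hypothetical table and rows: rows 2 and 3 share col1 and col2.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table ("
    "id INTEGER PRIMARY KEY, col1 INT, col2 INT, col3 TEXT, col4 TEXT, note TEXT)"
)
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 1, 10, "a", "x", "first"),
        (2, 2, 20, "b", "y", "second"),
        (3, 2, 20, "b", "y", "third"),
        (4, 3, 30, "c", "z", "fourth"),
    ],
)

# t.* returns every field; only the compared columns are spelled out.
# Caveat: the equality join never matches NULLs, so groups with NULL in
# col1 or col2 are not returned.
FULL_DUPLICATE_ROWS_SQL = """
SELECT t.*
FROM my_table AS t
JOIN (
    SELECT col1, col2
    FROM my_table
    GROUP BY col1, col2
    HAVING COUNT(*) > 1
) AS dups
  ON dups.col1 = t.col1
 AND dups.col2 = t.col2
ORDER BY t.id
"""

for row in conn.execute(FULL_DUPLICATE_ROWS_SQL):
    print(row)
```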
AFAIK, it doesn't. Just write a SELECT statement that groups by all the fields of the table and filters with a HAVING clause where the count is greater than 1.
If your rows are duplicates except for the key, don't include the key in the selected and grouped fields.
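Since the question asks for a helper that writes this code, here is a minimal sketch of one in Python. The function names `build_duplicate_query` and `quote_ident` are hypothetical; the helper reads the column list with SQLite's `PRAGMA table_info`, skips primary-key columns, and generates the GROUP BY/HAVING query. On SQL Server you would read the column names from `INFORMATION_SCHEMA.COLUMNS` instead and exclude the key columns yourself; that port is an assumption and is not tested here.

```python
import sqlite3


def quote_ident(name: str) -> str:
    """Double-quote an identifier (valid in SQLite, and in SQL Server with QUOTED_IDENTIFIER ON)."""
    return '"' + name.replace('"', '""') + '"'


def build_duplicate_query(conn: sqlite3.Connection, table: str) -> str:
    """Build a GROUP BY/HAVING duplicate query over every non-key column of `table`."""
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk);
    # pk > 0 marks primary-key columns, which are left out.
    info = conn.execute(f"PRAGMA table_info({quote_ident(table)})").fetchall()
    columns = [row[1] for row in info if row[5] == 0]
    col_list = ", ".join(quote_ident(c) for c in columns)
    return (
        f"SELECT {col_list}, COUNT(*) AS dup_count\n"
        f"FROM {quote_ident(table)}\n"
        f"GROUP BY {col_list}\n"
        f"HAVING COUNT(*) > 1"
    )


# Hypothetical table whose rows 2 and 3 are identical apart from the key.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER PRIMARY KEY, col1 INT, col2 TEXT, note TEXT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?, ?)",
    [(1, 1, "a", "x"), (2, 2, "b", "y"), (3, 2, "b", "y"), (4, 3, "c", "z")],
)

sql = build_duplicate_query(conn, "my_table")
print(sql)
print(conn.execute(sql).fetchall())
```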
Another way to do this is to join the table to itself.
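A minimal sketch of such a self-join, using the aliases `aBase`/`aDupes` and the columns `Pkey`, `ColA` and `ColB` that this answer refers to; the table name `MyTable` and its rows are hypothetical.

```python
import sqlite3

# Hypothetical table: Pkey 2, 3 and 5 share the same ColA/ColB values.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (Pkey INTEGER PRIMARY KEY, ColA TEXT, ColB INT)")
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?, ?)",
    [(1, "a", 1), (2, "b", 2), (3, "b", 2), (4, "c", 3), (5, "b", 2)],
)

# Each result row is one pair of duplicates; aBase.Pkey < aDupes.Pkey keeps
# only one direction of each pair and excludes a row matching itself.
SELF_JOIN_SQL = """
SELECT aBase.Pkey AS base_key, aDupes.Pkey AS dupe_key, aBase.ColA, aBase.ColB
FROM MyTable AS aBase
JOIN MyTable AS aDupes
  ON aDupes.ColA = aBase.ColA
 AND aDupes.ColB = aBase.ColB
 AND aBase.Pkey < aDupes.Pkey
ORDER BY aBase.Pkey, aDupes.Pkey
"""

for row in conn.execute(SELF_JOIN_SQL):
    print(row)
```

A group of n identical rows yields n(n-1)/2 pairs, so the three "b"/2 rows produce three result rows.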
Note: the aBase.Pkey < aDupes.Pkey condition is there because joining a table to itself returns two rows per match, since the join condition is always true twice.
In other words:
If aBase has a row equal to a row in aDupes (based on ColA and ColB), the mirror image of that match is also true: aDupes has a row equal to a row in aBase based on ColA and ColB. Therefore both matches are returned in the result set.
Narrow this down and eliminate the reflection by arbitrarily keeping only the results where one of the tables has the lower key.
Whether you use < or > doesn't matter, as long as you pick one direction; <> alone is not enough, because it still keeps both mirrored rows.
This also filters out each row's match with itself, because aBase.Pkey < aDupes.Pkey forces the primary keys to differ.
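To make the reflection effect concrete, this sketch counts the rows the self-join returns under different key conditions, using a hypothetical `MyTable` with one duplicated pair among four rows. With no key condition every row also matches itself and the pair appears in both directions (6 rows); `<>` removes the self-matches but keeps both directions (2 rows); `<` or `>` keeps exactly one row per pair.

```python
import sqlite3

# Hypothetical table: only Pkey 2 and 3 share ColA/ColB.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (Pkey INTEGER PRIMARY KEY, ColA TEXT, ColB INT)")
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?, ?)",
    [(1, "a", 1), (2, "b", 2), (3, "b", 2), (4, "c", 3)],
)


def count_matches(key_condition: str) -> int:
    """Count self-join rows on ColA/ColB, optionally restricted by an extra key condition."""
    sql = (
        "SELECT COUNT(*) FROM MyTable AS aBase "
        "JOIN MyTable AS aDupes "
        "ON aDupes.ColA = aBase.ColA AND aDupes.ColB = aBase.ColB " + key_condition
    )
    return conn.execute(sql).fetchone()[0]


for condition in [
    "",
    "AND aBase.Pkey <> aDupes.Pkey",
    "AND aBase.Pkey < aDupes.Pkey",
    "AND aBase.Pkey > aDupes.Pkey",
]:
    print(f"{condition or '(no key condition)'}: {count_matches(condition)}")
```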