SQL Server 2008 重复数据删除
长话短说,我接手了一个项目,数据库中的一个表非常需要重复数据删除。该表如下所示:
supply_req_id | int | [primary key]
supply_req_dt | datetime |
request_id | int | [foreign key]
supply_id | int | [foreign key]
is_disabled | bit |
存在具有相同 request_id 和 Supply_id 的记录的重复项。我想找到一种最佳实践方法来消除该表的重复数据。
[编辑]
@Kirk_Broadhurst,谢谢你的提问。由于 Supply_req_id 没有在其他地方引用,我会回答说保留第一个,删除任何后续出现的情况。
节日快乐
Long story short, I took over a project and a table in the database is in serious need of de-duping. The table looks like this:
supply_req_id | int | [primary key]
supply_req_dt | datetime |
request_id | int | [foreign key]
supply_id | int | [foreign key]
is_disabled | bit |
The duplication is exists with records having the same request_id and supply_id. I'd like to find a best practice way to de-dupe this table.
[EDIT]
@Kirk_Broadhurst, thanks for the question. Since supply_req_id is not referenced anywhere else, I would answer by saying keep the first, delete any subsequent occurances.
Happy Holidays
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
这会为 (supply_req_dt, request_id) 分组中的每一行创建一个排名,从 1 = 最低的 Supply_req_id 开始。任何骗局都有一个价值> 1
然后添加唯一约束或INDEX
This creates a rank for each row in the (supply_req_dt, request_id) grouping, starting with 1 = lowest supply_req_id. Any dupe has a value > 1
Then add a unique constraint or INDEX
似乎应该有一个命令,但也许那是因为我习惯了不同的数据库服务器。这是相关的支持文档:
如何从 SQL Server 中的表中删除重复的行
http://support.microsoft.com/kb/139444
Seems like there should be a command for this, but maybe that's because I'm used to a different database server. Here's the relevant support doc:
How to remove duplicate rows from a table in SQL Server
http://support.microsoft.com/kb/139444
您需要阐明确定在“匹配”情况下保留哪条记录的规则 - 最新的、最早的、
is_disabled
true 或 false 的记录?一旦确定了该规则,剩下的事情就相当简单了:
不同
记录假设您想要保留任何“重复”对的最新记录。您的查询将如下所示:
问题是,如果
supply_req_dt
也重复,那么您将保留两个重复项。修复方法是执行另一个group by
并选择顶部的id
作为临时步骤。但如果您不需要这样做,请不要为此烦恼。
You need to clarify your rule for determining which record to keep in the case of a 'match' - the most recent, the earliest, the one that has
is_disabled
true, or false?Once you've identified that rule, the rest is fairly simple:
distinct
recordsSo let's say you want to keep the most recent records of any 'duplicate' pair. Your query would look like this:
The catch is that if the
supply_req_dt
is also duplicated, then you'll be keeping both of the duplicates. The fix is to do anothergroup by
and select the topid
as an interim step. But if you don't need to do that, don't bother with it.