经典的SQL Server操作
我假设这是每个使用 SQL Server 的人最常见的情况。
场景:
我有这些表 tabSRC_A(id,date,data1)
、tabSRC_B(id,Date,data2)
和 tabDEST
现在我的任务是为了从 tableSRC_A
获取数据,tableSRC_B
对它们应用一些过滤和清理,并将它们插入到 tabDEST
中。
我正在使用以下代码执行此操作
insert into tabDest(id, Date, Data1, Data2)
Select id, date, Data1, Data2
from tabSRC_A A
inner join tabSRC_B B on A.id = B.id and A.date = B.date
where not exists
(select * from tabDest Dest
where Dest.id = B.id and Dest.date = B.date)
,如果已经存在,我将进行更新
这是此操作的最佳解决方案吗?
每个表的大小为 1000 万行
我还在考虑使用代理键创建一个视图并根据 id 执行检查而不是使用上述方法检查每一行
像这样的东西
insert into tabDest(id, Date, Data1, Data2)
Select id, date, Data1, Data2
from view_Created_From_TabA_TabB_adding_a_SurrogateKey_Kid SV
where SV.Kid > select (max(id) from tabDest)
我假设这会快得多。
如果您有任何建议,请指导我。
(我使用的是 SQL Server 2000,我知道它很旧)
I am assuming this is the most common scenario with everyone who is working with SQL Server.
Scenario:
I have these tables tabSRC_A(id,date,data1)
, tabSRC_B(id,Date,data2)
and tabDEST
Now my task is to get the data from tableSRC_A
, tableSRC_B
apply some filtering and cleanup on them and insert them into tabDEST
.
I am doing this using the following code
insert into tabDest(id, Date, Data1, Data2)
Select id, date, Data1, Data2
from tabSRC_A A
inner join tabSRC_B B on A.id = B.id and A.date = B.date
where not exists
(select * from tabDest Dest
where Dest.id = B.id and Dest.date = B.date)
and I am updating if already exists
Is this the best solution for this operation?
The size of the tables are 10 million rows each
I was also thinking about creating a view with surrogate key and perform a check based on the id instead of checking every row using the above method
Something like this
insert into tabDest(id, Date, Data1, Data2)
Select id, date, Data1, Data2
from view_Created_From_TabA_TabB_adding_a_SurrogateKey_Kid SV
where SV.Kid > select (max(id) from tabDest)
I am assuming this would be much faster.
Please guide me with any suggestions you have.
(I'm using SQL Server 2000, I know its very old)
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
您是否尝试过 LEFT join 来检测不存在?
Have you tried a LEFT join to detect not exists?
您提供的解决方案不错,您可以尝试以下操作:
不同之处在于,子查询将为每一行执行一次,在这种情况下,左连接将执行一次,但查看表的大小,这将也很慢并且差异可能很小。
视图接缝的创建是多余的,因为通常它们并不比查询快,使用 id 的想法很好,但在您的情况下,日期在识别中也有一些作用。这使我得出结论,您不能使用该比较,并且您可以拥有两个具有不同时间的相同 ID。
如果 id 是唯一的,您可以使用此语句
您还能做什么?
如果您可以向表 A 添加一次默认值为 0 的列,那么您可以使用存储过程进行迁移,您选择仅插入那些值为 0 的列(不使用 null),然后将它们设置为 1。使用此解决方案,您将不会扫描整个表。
That solution you have provide not bad, you could try this:
The difference is that the sub-query will be executer once for each row, and in this case the left join will be done once but looking on the size of the table this will also be slow and the difference might be slight.
The creation of the view seams to be redundant, because generally they are not faster than query, the idea with id is good but in your case also date have some role in the identification. This leads me to conclude that you can not use that comparison and you can have two same ids with different time.
In case when id is unique you can use this statement
What else you can do?
If You have possibility to add once column to table A with default value 0, then you could use a stored procedure for migration, you selection to insert only those columns that have value 0 (not use null) and then you set to them 1. Using this solution you will no scan full table.