经典的SQL Server操作

发布于 2024-10-28 06:40:54 字数 1047 浏览 2 评论 0原文

我假设这是每个使用 SQL Server 的人最常见的情况。

场景：

我有这些表 tabSRC_A(id,date,data1)、tabSRC_B(id,Date,data2) 和 tabDEST

现在我的任务是为了从 tableSRC_A 获取数据，tableSRC_B 对它们应用一些过滤和清理，并将它们插入到 tabDEST 中。

我正在使用以下代码执行此操作

insert into tabDest(id, Date, Data1, Data2)
   Select id, date, Data1, Data2 
   from tabSRC_A A 
   inner join tabSRC_B B on A.id = B.id and A.date = B.date
   where not exists 
             (select * from tabDest Dest 
              where Dest.id = B.id and Dest.date = B.date)

，如果已经存在，我将进行更新

这是此操作的最佳解决方案吗？

每个表的大小为 1000 万行

我还在考虑使用代理键创建一个视图并根据 id 执行检查而不是使用上述方法检查每一行

像这样的东西

 insert into tabDest(id, Date, Data1, Data2)
    Select id, date, Data1, Data2 
    from view_Created_From_TabA_TabB_adding_a_SurrogateKey_Kid SV
    where SV.Kid > select (max(id) from tabDest)

我假设这会快得多。

如果您有任何建议，请指导我。

（我使用的是 SQL Server 2000，我知道它很旧）

原文

I am assuming this is the most common scenario with everyone who is working with SQL Server.

Scenario:

I have these tables tabSRC_A(id,date,data1), tabSRC_B(id,Date,data2) and tabDEST

Now my task is to get the data from tableSRC_A, tableSRC_B apply some filtering and cleanup on them and insert them into tabDEST.

I am doing this using the following code

insert into tabDest(id, Date, Data1, Data2)
   Select id, date, Data1, Data2 
   from tabSRC_A A 
   inner join tabSRC_B B on A.id = B.id and A.date = B.date
   where not exists 
             (select * from tabDest Dest 
              where Dest.id = B.id and Dest.date = B.date)

and I am updating if already exists

Is this the best solution for this operation?

The size of the tables are 10 million rows each

I was also thinking about creating a view with surrogate key and perform a check based on the id instead of checking every row using the above method

Something like this

 insert into tabDest(id, Date, Data1, Data2)
    Select id, date, Data1, Data2 
    from view_Created_From_TabA_TabB_adding_a_SurrogateKey_Kid SV
    where SV.Kid > select (max(id) from tabDest)

I am assuming this would be much faster.

Please guide me with any suggestions you have.

(I'm using SQL Server 2000, I know its very old)

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

李白 2024-11-04 06:40:54

您是否尝试过 LEFT join 来检测不存在？

insert into tabDest(id, Date, Data1, Data2)
   Select id, date, Data1, Data2 
   from tabSRC_A A 
   inner join tabSRC_B B on A.id = B.id and A.date = B.date
   LEFT JOIN tabDest Dest 
   ON Dest.id = B.id and Dest.date = B.date
WHERE
   Dest.id is null

Have you tried a LEFT join to detect not exists?

insert into tabDest(id, Date, Data1, Data2)
   Select id, date, Data1, Data2 
   from tabSRC_A A 
   inner join tabSRC_B B on A.id = B.id and A.date = B.date
   LEFT JOIN tabDest Dest 
   ON Dest.id = B.id and Dest.date = B.date
WHERE
   Dest.id is null

回复收藏 0 原文

海螺姑娘 2024-11-04 06:40:54

您提供的解决方案不错，您可以尝试以下操作：

insert into tabDest(id, Date, Data1, Data2)
   Select A.id, A.date, Data1, Data2 
   from tabSRC_A A 
   inner join tabSRC_B B on A.id = B.id and A.date = B.date
   left join tabDest D.id = B.id and d.date = b.date
   where d is null;

不同之处在于，子查询将为每一行执行一次，在这种情况下，左连接将执行一次，但查看表的大小，这将也很慢并且差异可能很小。

视图接缝的创建是多余的，因为通常它们并不比查询快，使用 id 的想法很好，但在您的情况下，日期在识别中也有一些作用。这使我得出结论，您不能使用该比较，并且您可以拥有两个具有不同时间的相同 ID。

如果 id 是唯一的，您可以使用此语句

insert into tabDest(id, Date, Data1, Data2)
   Select A.id, A.date, Data1, Data2 
   from tabSRC_A A 
   inner join tabSRC_B B on A.id = B.id and A.date = B.date
   where A.id > (SELECT max(d.id) FROM Dest d);

您还能做什么？

如果您可以向表 A 添加一次默认值为 0 的列，那么您可以使用存储过程进行迁移，您选择仅插入那些值为 0 的列（不使用 null），然后将它们设置为 1。使用此解决方案，您将不会扫描整个表。

That solution you have provide not bad, you could try this:

insert into tabDest(id, Date, Data1, Data2)
   Select A.id, A.date, Data1, Data2 
   from tabSRC_A A 
   inner join tabSRC_B B on A.id = B.id and A.date = B.date
   left join tabDest D.id = B.id and d.date = b.date
   where d is null;

The difference is that the sub-query will be executer once for each row, and in this case the left join will be done once but looking on the size of the table this will also be slow and the difference might be slight.

The creation of the view seams to be redundant, because generally they are not faster than query, the idea with id is good but in your case also date have some role in the identification. This leads me to conclude that you can not use that comparison and you can have two same ids with different time.

In case when id is unique you can use this statement

insert into tabDest(id, Date, Data1, Data2)
   Select A.id, A.date, Data1, Data2 
   from tabSRC_A A 
   inner join tabSRC_B B on A.id = B.id and A.date = B.date
   where A.id > (SELECT max(d.id) FROM Dest d);

What else you can do?

If You have possibility to add once column to table A with default value 0, then you could use a stored procedure for migration, you selection to insert only those columns that have value 0 (not use null) and then you set to them 1. Using this solution you will no scan full table.

回复收藏 0 原文

~没有更多了~