使用串联键在表格中插入不在tablee中的行
我有2张桌子。 TABLEA由CSV导入填充,通常包含10K和15K行之间。 TableB具有相同的结构,现在已经长到95k行。为了确定TableB中不在tableB的排行行,我需要比较tablea中的4个字段的串联与表格中的相同串联。
下面的代码一直在运行,随着TableB的增长,但是花了很长时间,以至于需要取消并且无法完成。
我坚信使用串联字段作为比较,导致执行时间超出可用性。
有更好的方法吗?
DELETE FROM billing..whse_Temp
BULK INSERT billing..whse_Temp
FROM '/mnt/ABC/ABC.csv'
WITH
(
FORMAT='csv',
FIRSTROW=2,
FIELDTERMINATOR=',',
ROWTERMINATOR='\r\n'
)
INSERT INTO billing..whse
SELECT * FROM billing..whse_Temp S
WHERE CONCAT(S.RunTimeStamp, S.CS_Datacenter,S.Customer, S.ServerName) NOT IN
(
SELECT CONCAT(RunTimeStamp, CS_Datacenter, Customer, ServerName)
FROM billing..whse
)
I have 2 tables. TableA gets populated by a csv import and typically contains between 10k and 15k rows. TableB has the same structure, and has now grown to about 95k rows. In order to determine rows in TableA that are not in TableB, I need to compare a concatenation of 4 fields in TableA with the same concatenation in TableB.
The code below has been working as TableB has been growing, but is just taking so long that it needs to be cancelled and does not finish.
I strongly believe that the use of concatenated fields as a comparison is causing execution times to grow beyond usability.
Is there a better approach to the problem?
DELETE FROM billing..whse_Temp
BULK INSERT billing..whse_Temp
FROM '/mnt/ABC/ABC.csv'
WITH
(
FORMAT='csv',
FIRSTROW=2,
FIELDTERMINATOR=',',
ROWTERMINATOR='\r\n'
)
INSERT INTO billing..whse
SELECT * FROM billing..whse_Temp S
WHERE CONCAT(S.RunTimeStamp, S.CS_Datacenter,S.Customer, S.ServerName) NOT IN
(
SELECT CONCAT(RunTimeStamp, CS_Datacenter, Customer, ServerName)
FROM billing..whse
)
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
只需使用
不存在
:适当的索引:
Simply use
NOT EXISTS
:The appropriate index for this:
我敢肯定,有一个方法可以使用Merge命令来完成操作,但是我从来没有真正使用过这些命令。我敢肯定,在多个列之间存在一些方法,但是我个人发现拥有完整的联接条件更清楚,然后仅测试联接失败的位置。 (即:右侧没有行):
I'm sure there are ways to do it with a MERGE command, but I've never really used those. I'm sure there's ways with EXISTS across multiple columns, but I personally find it clearer to have the full join condition and then just test for where the join failed. (ie: no row on the right side):