使用串联键在表格中插入不在tablee中的行

发布于 2025-01-25 15:12:44 字数 666 浏览 6 评论 0原文

我有2张桌子。 TABLEA由CSV导入填充，通常包含10K和15K行之间。 TableB具有相同的结构，现在已经长到95k行。为了确定TableB中不在tableB的排行行，我需要比较tablea中的4个字段的串联与表格中的相同串联。

下面的代码一直在运行，随着TableB的增长，但是花了很长时间，以至于需要取消并且无法完成。

我坚信使用串联字段作为比较，导致执行时间超出可用性。

有更好的方法吗？

DELETE FROM billing..whse_Temp
BULK INSERT billing..whse_Temp
FROM '/mnt/ABC/ABC.csv'
WITH
(
  FORMAT='csv',
  FIRSTROW=2,
  FIELDTERMINATOR=',',
  ROWTERMINATOR='\r\n'
)

INSERT INTO billing..whse
SELECT * FROM billing..whse_Temp S
WHERE CONCAT(S.RunTimeStamp, S.CS_Datacenter,S.Customer, S.ServerName) NOT IN
  (
    SELECT CONCAT(RunTimeStamp, CS_Datacenter, Customer, ServerName) 
    FROM billing..whse
  )

原文

I have 2 tables. TableA gets populated by a csv import and typically contains between 10k and 15k rows. TableB has the same structure, and has now grown to about 95k rows. In order to determine rows in TableA that are not in TableB, I need to compare a concatenation of 4 fields in TableA with the same concatenation in TableB.

The code below has been working as TableB has been growing, but is just taking so long that it needs to be cancelled and does not finish.

I strongly believe that the use of concatenated fields as a comparison is causing execution times to grow beyond usability.

Is there a better approach to the problem?

DELETE FROM billing..whse_Temp
BULK INSERT billing..whse_Temp
FROM '/mnt/ABC/ABC.csv'
WITH
(
  FORMAT='csv',
  FIRSTROW=2,
  FIELDTERMINATOR=',',
  ROWTERMINATOR='\r\n'
)

INSERT INTO billing..whse
SELECT * FROM billing..whse_Temp S
WHERE CONCAT(S.RunTimeStamp, S.CS_Datacenter,S.Customer, S.ServerName) NOT IN
  (
    SELECT CONCAT(RunTimeStamp, CS_Datacenter, Customer, ServerName) 
    FROM billing..whse
  )

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

空心↖ 2025-02-01 15:12:44

只需使用不存在：

INSERT INTO billing..whse
SELECT * FROM billing..whse_temp S
WHERE NOT EXISTS
(
  SELECT NULL
  FROM billing..whse w
  WHERE w.runtimestamp = s.runtimestamp
    AND w.cs_datacenter = s.cs_datacenter
    AND w.customer = s.customer
    AND w.servername = s.servername
);

适当的索引：

CREATE INDEX idx ON billing..whse (runtimestamp, cs_datacenter, customer, servername);

Simply use NOT EXISTS:

INSERT INTO billing..whse
SELECT * FROM billing..whse_temp S
WHERE NOT EXISTS
(
  SELECT NULL
  FROM billing..whse w
  WHERE w.runtimestamp = s.runtimestamp
    AND w.cs_datacenter = s.cs_datacenter
    AND w.customer = s.customer
    AND w.servername = s.servername
);

The appropriate index for this:

CREATE INDEX idx ON billing..whse (runtimestamp, cs_datacenter, customer, servername);

回复收藏 0 原文

二智少女猫性小仙女 2025-02-01 15:12:44

我敢肯定，有一个方法可以使用Merge命令来完成操作，但是我从来没有真正使用过这些命令。我敢肯定，在多个列之间存在一些方法，但是我个人发现拥有完整的联接条件更清楚，然后仅测试联接失败的位置。（即：右侧没有行）：

INSERT INTO billing..whse
SELECT S.* 
  FROM billing..whse_Temp S
       LEFT OUTER JOIN billing..whse W
         ON S.RunTimeStamp = W.RunTimeStamp
         AND S.CS_Datacenter = W.CS_Datacenter
         AND S.Customer = W.Customer
         AND S.ServerName = W.ServerName
WHERE W.RunTimeStamp IS NULL

I'm sure there are ways to do it with a MERGE command, but I've never really used those. I'm sure there's ways with EXISTS across multiple columns, but I personally find it clearer to have the full join condition and then just test for where the join failed. (ie: no row on the right side):

INSERT INTO billing..whse
SELECT S.* 
  FROM billing..whse_Temp S
       LEFT OUTER JOIN billing..whse W
         ON S.RunTimeStamp = W.RunTimeStamp
         AND S.CS_Datacenter = W.CS_Datacenter
         AND S.Customer = W.Customer
         AND S.ServerName = W.ServerName
WHERE W.RunTimeStamp IS NULL

回复收藏 0 原文

~没有更多了~