SQL Server DELETE and MERGE performance


I have a table that contains some buy/sell data, with around 8M records in it:

CREATE TABLE [dbo].[Transactions](
[id] [int] IDENTITY(1,1) NOT NULL,
[itemId] [bigint] NOT NULL,
[dt] [datetime] NOT NULL,
[count] [int] NOT NULL,
[price] [float] NOT NULL,
[platform] [char](1) NOT NULL
) ON [PRIMARY]

Every X minutes my program fetches the new transactions for each itemId, and I need to update the table. My first solution was a two-step DELETE+INSERT:

delete from Transactions where platform=@platform and itemid=@itemid
insert into Transactions (platform,itemid,dt,count,price) values (@platform,@itemid,@dt,@count,@price)
[...]
insert into Transactions (platform,itemid,dt,count,price) values (@platform,@itemid,@dt,@count,@price)

The problem is that this DELETE statement takes 5 seconds on average. That is much too long.

The second solution I found is to use MERGE. I created the following stored procedure, which takes a table-valued parameter:

CREATE PROCEDURE [dbo].[sp_updateTransactions]
@Table dbo.tp_Transactions readonly,
@itemId bigint,
@platform char(1)
AS
BEGIN
MERGE Transactions AS TARGET
USING @Table AS SOURCE  
ON (    
TARGET.[itemId] = SOURCE.[itemId] AND
TARGET.[platform] = SOURCE.[platform] AND 
TARGET.[dt] = SOURCE.[dt] AND 
TARGET.[count] = SOURCE.[count] AND
TARGET.[price] = SOURCE.[price] ) 


WHEN NOT MATCHED BY TARGET THEN 
INSERT VALUES (SOURCE.[itemId], 
                SOURCE.[dt],
                SOURCE.[count],
                SOURCE.[price],
                SOURCE.[platform])

WHEN NOT MATCHED BY SOURCE AND TARGET.[itemId] = @itemId AND TARGET.[platform] = @platform THEN 
DELETE;

END
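
For completeness, the question doesn't show the definition of dbo.tp_Transactions or how the procedure is invoked. A minimal sketch, assuming the type simply mirrors the Transactions columns (minus the IDENTITY [id]) and using illustrative values:

CREATE TYPE dbo.tp_Transactions AS TABLE (
    [itemId]   bigint   NOT NULL,
    [dt]       datetime NOT NULL,
    [count]    int      NOT NULL,
    [price]    float    NOT NULL,
    [platform] char(1)  NOT NULL
);

-- Example call: load the current batch of transactions into a TVP and pass it in.
DECLARE @batch dbo.tp_Transactions;
INSERT INTO @batch ([itemId], [dt], [count], [price], [platform])
VALUES (42, '2011-10-25T12:00:00', 3, 19.99, 'A');

EXEC dbo.sp_updateTransactions @Table = @batch, @itemId = 42, @platform = 'A';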

This procedure takes around 7 seconds on a table with 70k records, so with 8M it would probably take a few minutes. The bottleneck is the WHEN NOT MATCHED BY SOURCE ... DELETE branch: when I comment that line out, the procedure runs in 0.01 seconds on average.

So the question is: how can I improve the performance of the delete?

The delete is needed to make sure the table doesn't contain transactions that have been removed in the application. In real scenarios this happens very rarely: fewer than 1 in 10,000 transaction updates actually needs to delete records.

My theoretical workaround is to add a column like "transactionDeleted bit" and use UPDATE instead of DELETE, then clean the table up with a batch job every X minutes or hours that executes

delete from transactions where transactionDeleted=1

That should be faster, but I would need to update all SELECT statements in other parts of the application to use only transactionDeleted=0 records, so it may also affect application performance.
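
If I went that route, this is roughly what I have in mind; a sketch, assuming SQL Server 2008+ so a filtered index can keep transactionDeleted=0 lookups cheap (constraint and index names are illustrative):

-- Add the flag with a default so existing rows and current INSERTs are unaffected.
ALTER TABLE dbo.Transactions
    ADD transactionDeleted bit NOT NULL
        CONSTRAINT DF_Transactions_Deleted DEFAULT (0);

-- "Deleting" becomes a narrow update instead of a physical delete:
UPDATE dbo.Transactions
SET    transactionDeleted = 1
WHERE  platform = @platform AND itemId = @itemId;

-- A filtered index so SELECTs that ask only for live rows stay fast:
CREATE NONCLUSTERED INDEX IX_Transactions_Live
    ON dbo.Transactions (platform, itemId)
    WHERE transactionDeleted = 0;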

Do you know any better solution?

UPDATE: Current indexes:

CREATE NONCLUSTERED INDEX [IX1] ON [dbo].[Transactions] 
(
[platform] ASC,
[ItemId] ASC
) WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, SORT_IN_TEMPDB = OFF,   IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON, FILLFACTOR = 50) ON [PRIMARY]


ALTER TABLE [dbo].[Transactions] ADD CONSTRAINT [IX2] UNIQUE NONCLUSTERED 
(
[ItemId] DESC,
[count] ASC,
[dt] DESC,
[platform] ASC,
[price] ASC
) WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]


梦幻的味道 2024-12-14 12:31:43

OK, here is another approach. For a similar problem (a large scan in WHEN NOT MATCHED BY SOURCE followed by DELETE) I reduced the MERGE execution time from 806 ms to 6 ms!

One issue with the problem above is that the "WHEN NOT MATCHED BY SOURCE" clause is scanning the whole TARGET table.

It is not that obvious, but Microsoft allows the TARGET table to be filtered (by using a CTE) before doing the merge. In my case this reduced the TARGET rows from 250K to fewer than 10. A big difference.

Assuming that the problem above works with the TARGET filtered by @itemId and @platform, the MERGE code would look like this. The index changes suggested in the other answer would help this logic too.

WITH Transactions_CTE (itemId
                        ,dt
                        ,count
                        ,price
                        ,platform
                        )
AS
-- Define the CTE query that will reduce the size of the TARGET table.  
(  
    SELECT itemId
        ,dt
        ,count
        ,price
        ,platform
    FROM Transactions  
    WHERE itemId = @itemId
      AND platform = @platform  
)  
MERGE Transactions_CTE AS TARGET
USING @Table AS SOURCE
    ON (
        TARGET.[itemId] = SOURCE.[itemId]
        AND TARGET.[platform] = SOURCE.[platform]
        AND TARGET.[dt] = SOURCE.[dt]
        AND TARGET.[count] = SOURCE.[count]
        AND TARGET.[price] = SOURCE.[price]
        )
WHEN NOT MATCHED BY TARGET  THEN
        INSERT
        VALUES (
            SOURCE.[itemId]
            ,SOURCE.[dt]
            ,SOURCE.[count]
            ,SOURCE.[price]
            ,SOURCE.[platform]
            )
WHEN NOT MATCHED BY SOURCE THEN
        DELETE;
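
For completeness, dropped into the stored procedure from the question it might look like the sketch below. The logic is unchanged; note that the extra TARGET.[itemId] = @itemId AND TARGET.[platform] = @platform predicate on the DELETE branch is no longer needed, because the CTE already restricts the TARGET to that slice:

ALTER PROCEDURE [dbo].[sp_updateTransactions]
    @Table dbo.tp_Transactions READONLY,
    @itemId bigint,
    @platform char(1)
AS
BEGIN
    WITH Transactions_CTE AS
    (
        -- Reduce the TARGET to the one slice this call maintains.
        SELECT itemId, dt, [count], price, platform
        FROM Transactions
        WHERE itemId = @itemId
          AND platform = @platform
    )
    MERGE Transactions_CTE AS TARGET
    USING @Table AS SOURCE
        ON (TARGET.itemId = SOURCE.itemId
            AND TARGET.platform = SOURCE.platform
            AND TARGET.dt = SOURCE.dt
            AND TARGET.[count] = SOURCE.[count]
            AND TARGET.price = SOURCE.price)
    WHEN NOT MATCHED BY TARGET THEN
        INSERT (itemId, dt, [count], price, platform)
        VALUES (SOURCE.itemId, SOURCE.dt, SOURCE.[count], SOURCE.price, SOURCE.platform)
    WHEN NOT MATCHED BY SOURCE THEN
        DELETE;
END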
删除→记忆 2024-12-14 12:31:43

Using a BIT field for IsDeleted (or IsActive as many people do) is valid but it does require modifying all code plus creating a separate SQL Job to periodically come through and remove the "deleted" records. This might be the way to go but there is something less intrusive to try first.

I noticed in your set of 2 indexes that neither is CLUSTERED. Can I assume that the IDENTITY field is? You might consider making the [IX2] UNIQUE index the CLUSTERED one and changing the PK (again, I assume the IDENTITY field is a CLUSTERED PK) to be NONCLUSTERED. I would also reorder the IX2 fields to put [Platform] and [ItemID] first. Since your main operation is looking for [Platform] and [ItemID] as a set, physically ordering them this way might help. And since this index is unique, that is a good candidate for being CLUSTERED. It is certainly worth testing as this will impact all queries against the table.

Also, even if changing the indexes as I have suggested helps, it might still be worth trying both ideas, i.e. adding the IsDeleted field as well, to see whether that improves performance even more.

EDIT:
I forgot to mention, by making the IX2 index CLUSTERED and moving the [Platform] field to the top, you should get rid of the IX1 index.

EDIT2:

Just to be very clear, I am suggesting something like:

CREATE UNIQUE CLUSTERED INDEX [IX2] ON [dbo].[Transactions]
(
[ItemId] DESC,
[platform] ASC,
[count] ASC,
[dt] DESC,
[price] ASC
)

And to be fair, changing which index is CLUSTERED could also negatively impact queries where JOINs are done on the [id] field, which is why you need to test thoroughly. In the end you need to tune the system for your most frequent and/or expensive queries, and you might have to accept that some queries will be slower as a result; that may be worth it if this operation becomes much faster.
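
In DDL terms the full swap would look something like this sketch. It assumes the current PK is a clustered primary key named PK_Transactions on [id] (the question doesn't show it), and that IX2 exists as a unique constraint, so it has to be dropped and re-created as an index:

-- Drop the assumed clustered PK and re-add it as nonclustered.
ALTER TABLE dbo.Transactions DROP CONSTRAINT PK_Transactions;
ALTER TABLE dbo.Transactions
    ADD CONSTRAINT PK_Transactions PRIMARY KEY NONCLUSTERED ([id]);

-- Replace the IX2 unique constraint with a unique CLUSTERED index,
-- reordered so the (ItemId, platform) seek columns lead.
ALTER TABLE dbo.Transactions DROP CONSTRAINT IX2;
CREATE UNIQUE CLUSTERED INDEX IX2 ON dbo.Transactions
(
    [ItemId] DESC,
    [platform] ASC,
    [count] ASC,
    [dt] DESC,
    [price] ASC
);

-- IX1 becomes redundant once the clustered key leads with these columns.
DROP INDEX IX1 ON dbo.Transactions;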

帅气尐潴 2024-12-14 12:31:43


See this question: https://stackoverflow.com/questions/3685141/how-to-....

would the update be the same cost as a delete? No. The update would be
a much lighter operation, especially if you had an index on the PK
(errrr, that's a guid, not an int). The point being that an update to
a bit field is much less expensive. A (mass) delete would force a
reshuffle of the data.

In light of this information, your idea to use a bit field is very valid.
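
And for the periodic cleanup job described in the question, deleting in small batches keeps locking and log growth bounded. A minimal sketch (the batch size of 5000 is arbitrary):

-- Remove soft-deleted rows in chunks until none are left.
DECLARE @rows int = 1;
WHILE @rows > 0
BEGIN
    DELETE TOP (5000) FROM dbo.Transactions
    WHERE transactionDeleted = 1;

    SET @rows = @@ROWCOUNT;
END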
