How to reduce transaction log growth for batched nvarchar(max) updates

Posted 2024-08-17 04:05:31

Our app needs to add large amounts of text to a SQL Server 2005 database (up to 1 GB for a single record). For performance reasons, this is done in chunks, by making a stored procedure call for each chunk (say, usp_AddChunk). usp_AddChunk does not have any explicit transactions.

What I'm seeing is that reducing the chunk size from 100MB to 10MB results in massively larger transaction logs. I've been told this is because each time usp_AddChunk is called, an "implicit" (my term) transaction will log all of the existing text. So, for a 150MB record:

100 MB chunk size: 100 MB chunk (0 MB of existing text logged) + 50 MB chunk (100 MB logged) = 100 MB logged

will be smaller than

10 MB chunk size: 10 MB chunk (0 MB logged) + 10 MB chunk (10 MB logged) + 10 MB chunk (20 MB logged) + ... + 10 MB chunk (140 MB logged) = 1050 MB logged

I thought that by opening a transaction in my C# code (beginning it before I add the first chunk, and committing it after the last chunk), this "implicit" transaction would not happen, and I could avoid the huge log files. But my tests show the transaction log growing 5x bigger when using the ADO.NET transaction.

I won't post the full code, but here are a few details (a sketch of the pattern follows the list):

  1. I call SqlConnection.BeginTransaction()
  2. I use a different SqlCommand for each chunk
  3. I assign the SqlTransaction from (1) to each SqlCommand
  4. I usually close the connection after each SqlCommand execution, but I've also tried not closing the connection with the same results
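
For concreteness, here is a minimal sketch of the pattern those four details describe. The parameter names of usp_AddChunk and the chunk source are assumptions, not taken from the original code.

using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;

// Sketch of the chunked-upload pattern described above.
static void AddDocument(string connectionString, int documentId,
                        IEnumerable<string> chunks)
{
    using (var conn = new SqlConnection(connectionString))
    {
        conn.Open();
        SqlTransaction tx = conn.BeginTransaction();          // detail 1

        foreach (string chunk in chunks)
        {
            // detail 2: a fresh SqlCommand for each chunk
            using (var cmd = new SqlCommand("dbo.usp_AddChunk", conn))
            {
                cmd.CommandType = CommandType.StoredProcedure;
                cmd.Transaction = tx;                         // detail 3
                cmd.Parameters.AddWithValue("@id", documentId);   // assumed parameter
                cmd.Parameters.AddWithValue("@chunk", chunk);     // assumed parameter
                cmd.ExecuteNonQuery();
            }
            // detail 4: the original usually closed the connection here;
            // this sketch keeps it open for the lifetime of the transaction
        }

        tx.Commit();
    }
}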

What's the flaw in this scheme? Let me know if you need more info. Thanks!

Note: using a simple or bulk-logged recovery model is not an option

Comments (2)

放血 2024-08-24 04:05:31

If by 'chunks' you mean something like:

UPDATE table
SET blob = blob + @chunk
WHERE key = @key;

Then you are right that the operation is fully logged. You should follow the BLOB usage guidelines and use the .WRITE clause for chunked updates:

UPDATE table
SET blob.Write(@chunk, NULL, NULL)
WHERE key = @key;

This will minimally log the update (if possible, see Operations That Can Be Minimally Logged):

The UPDATE statement is fully logged; however, partial updates to large value data types using the .WRITE clause are minimally logged.

Not only is this minimally logged; because the update is an explicit write at the end of the BLOB, the engine knows that you updated only a portion of the BLOB and will log only that portion. When you update with SET blob = blob + @chunk, the engine sees the entire BLOB receiving a new value, does not detect that you really only appended new data, and therefore logs the entire BLOB (several times over, as you already found out).

BTW, you should use chunks whose size is a multiple of 8040 bytes:

For best performance, we recommend that data be inserted or updated in chunk sizes that are multiples of 8040 bytes.
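
From the ADO.NET side, a .WRITE-based append might be issued as sketched below. The table, column, and parameter names are hypothetical, and a NULL @Offset appends at the end of the value; note that .WRITE cannot be applied to a value that is still NULL, so the column is assumed to be initialized (for example to N'') when the row is created.

using System.Data;
using System.Data.SqlClient;

// Appends one chunk to an nvarchar(max) column via the .WRITE clause.
static void AppendChunk(SqlConnection conn, SqlTransaction tx,
                        int id, string chunk)
{
    const string sql =
        "UPDATE dbo.Documents " +
        "SET Body.WRITE(@chunk, NULL, NULL) " +   // NULL offset = append at end
        "WHERE Id = @id;";

    using (var cmd = new SqlCommand(sql, conn, tx))
    {
        cmd.Parameters.Add("@id", SqlDbType.Int).Value = id;
        cmd.Parameters.Add("@chunk", SqlDbType.NVarChar, -1).Value = chunk;  // -1 = nvarchar(max)
        cmd.ExecuteNonQuery();
    }
}

Since nvarchar stores two bytes per character, the 8040-byte guidance quoted above would correspond to chunk lengths that are multiples of 4020 characters.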

·深蓝 2024-08-24 04:05:31

What you may have to do is surround each "chunk" or group of chunks with its own transaction and commit after each group. Surrounding the entire upload with your own ADO.NET transaction essentially does the same thing the implicit transaction does, so that won't help. You have to commit in smaller batches to keep the log smaller.
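
A sketch of that grouping, reusing the hypothetical AppendChunk helper sketched in the previous answer; the group size of 10 is an illustrative assumption:

using System.Collections.Generic;
using System.Data.SqlClient;

// Commits after every group of chunks so that no single transaction
// spans the whole document.
static void AddDocumentInGroups(SqlConnection conn, int documentId,
                                IEnumerable<string> chunks)
{
    const int chunksPerGroup = 10;   // illustrative
    int count = 0;

    SqlTransaction tx = conn.BeginTransaction();
    try
    {
        foreach (string chunk in chunks)
        {
            AppendChunk(conn, tx, documentId, chunk);
            if (++count % chunksPerGroup == 0)
            {
                tx.Commit();                    // end this group
                tx = conn.BeginTransaction();   // start the next one
            }
        }
        tx.Commit();   // commit the trailing partial group, if any
    }
    catch
    {
        tx.Rollback();
        throw;
    }
}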
