How to reduce transaction log growth for batched nvarchar(max) updates
Our app needs to add large amounts of text to a SQL Server 2005 database (up to 1 GB for a single record). For performance reasons, this is done in chunks, by making a stored procedure call for each chunk (say, usp_AddChunk). usp_AddChunk does not have any explicit transactions.
What I'm seeing is that reducing the chunk size from 100MB to 10MB results in massively larger transaction logs. I've been told this is because each time usp_AddChunk is called, an "implicit" (my term) transaction will log all of the existing text. So, for a 150MB record:
100 MB chunk size: write 100 MB (0 bytes of existing text logged) + write 50 MB (100 MB logged) = 100 MB logged
will be smaller than
10 MB chunk size: write 10 MB (0 bytes logged) + write 10 MB (10 MB logged) + write 10 MB (20 MB logged) ... + write 10 MB (140 MB logged) = 1050 MB logged
I thought that by opening a transaction in my C# code (before I add the first chunk, and commit after the last chunk), this "implicit" transaction would not happen, and I could avoid the huge log files. But my tests show the transaction log growing 5x bigger using the ADO.NET transaction.
I won't post the code, but here are a few details:
1. I call SqlConnection.BeginTransaction()
2. I use a different SqlCommand for each chunk
3. I assign the SqlTransaction from (1) to each SqlCommand
4. I usually close the connection after each SqlCommand execution, but I've also tried not closing the connection, with the same results
What's the flaw in this scheme? Let me know if you need more info. Thanks!
Note: using a simple or bulk-logged recovery model is not an option
2 Answers
If by 'chunks' you mean something like:
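an append-style concatenation such as this (the table, column, and key names here are illustrative, not the OP's actual schema):

UPDATE MyTable
SET blob = blob + @chunk   -- concatenation rewrites the entire column value
WHERE KeyCol = @key;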
Then you are right that the operation is fully logged. You should follow the BLOB usage guidelines and use the .WRITE clause for chunked updates:
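For example, using the same illustrative names (passing NULL as the offset appends @chunk at the end of the existing value):

UPDATE MyTable
SET blob.WRITE(@chunk, NULL, NULL)   -- NULL offset = append at the end
WHERE KeyCol = @key;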
This will minimally log the update (if possible, see Operations That Can Be Minimally Logged):
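(That topic notes that the UPDATE statement is fully logged, but partial updates to large value data types using the .WRITE clause are minimally logged.)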
Not only is this minimally logged, but because the update is an explicit write at the end of the BLOB, the engine knows that you only updated a portion of the BLOB and will log only that portion. When you update with
SET blob=blob+@chunk
the engine sees that the entire BLOB has received a new value. It won't detect that you really only changed the BLOB by appending new data, so it will log the entire BLOB (several times over, as you already found out).

BTW, you should use chunks whose size is a multiple of 8040 bytes:
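The BLOB usage guidelines recommend inserting or updating data in chunk sizes that are multiples of 8040 bytes.

Putting it together, a minimal sketch of what a .WRITE-based usp_AddChunk could look like (again, the table, column, and parameter names are assumptions):

CREATE PROCEDURE usp_AddChunk
    @key int,
    @chunk nvarchar(max)
AS
BEGIN
    -- A NULL offset appends @chunk at the end of the existing value,
    -- the partial-update form that can be minimally logged.
    -- Note: .WRITE cannot be applied to a NULL value, so the row must
    -- be initialized (e.g. to an empty string) before the first chunk.
    UPDATE MyTable
    SET blob.WRITE(@chunk, NULL, NULL)
    WHERE KeyCol = @key;
END;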
What you may have to do is surround each "chunk" or group of chunks with its own transaction and commit after each group. Surrounding the entire thing with your own ADO transaction is essentially doing the same thing as the implicit transaction does, so that won't help. You have to commit in smaller chunks to keep the log smaller.