Table insert rate slows down as the table grows

Parsing some data and inserting it into 3 tables from .NET. Using Table-Valued Parameters (TVPs) to pass the data, as some inserts are 600,000 rows. Passing objects (not DataTables), and they are effectively passed by reference (the nature of a TVP). Got a 100:1 gain over a straight VALUES insert, since an INSERT ... VALUES statement is limited to 1000 rows at a time. In the stored procedure, the insert from the TVP into the actual table is sorted by the clustered index. These tables have no index other than the clustered index. The SP takes a TABLOCK, as these are write-once tables with a single data loader. Fill factor is 100. There is no growth of the data or transaction log files - they are sized for the total data load. Finally, to the question: in the last 4 hours I have inserted 200 million rows, and the insert speed has dropped by half. If the fill factor is 100 and I am inserting in clustered index order, why the slowdown? What can I do to fix this?
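For reference, here is a minimal T-SQL sketch of the server side of this pattern; the type, procedure, table, and column names are hypothetical. (On the .NET side the rows would be passed as a Structured parameter, e.g. an IEnumerable<SqlDataRecord>, which is what lets a 600,000-row batch stream to the server without building a DataTable.)

```sql
-- Hypothetical table type describing the rows being loaded.
CREATE TYPE dbo.LoadRowTvp AS TABLE
(
    KeyId     bigint       NOT NULL,
    EventTime datetime2    NOT NULL,
    Payload   varchar(200) NULL
);
GO

CREATE PROCEDURE dbo.InsertLoadRows
    @rows dbo.LoadRowTvp READONLY   -- TVPs must be declared READONLY
AS
BEGIN
    SET NOCOUNT ON;

    -- Single loader on a write-once table: lock the whole table and feed the
    -- rows in clustered index (KeyId) order, as described in the question.
    INSERT INTO dbo.LoadTarget WITH (TABLOCK) (KeyId, EventTime, Payload)
    SELECT KeyId, EventTime, Payload
    FROM @rows
    ORDER BY KeyId;
END
GO
```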

I did not really get TVPs until I used one - it is like a reverse DataReader: the client streams rows to the server instead of reading them back.

I would like to thank you for your help and apologize for an incorrect problem statement. For each parse (in this case I am parsing 200,000) the insert is sorted by the clustered index. However, for only 1 of the 3 tables does each parse, taken as a whole, arrive in clustered index order relative to the previous one. After parsing 70,000, the good table has a scan density of 99%, but the other two tables are severely fragmented, with a scan density of 12%.
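The scan density figures quoted here are the kind reported by DBCC SHOWCONTIG. A quick way to check how bad each table has become (table name hypothetical; the DMV below is the newer equivalent of SHOWCONTIG):

```sql
-- Legacy report that includes the "Scan Density" figure:
DBCC SHOWCONTIG ('dbo.LoadTarget');

-- Newer equivalent: logical fragmentation and average page fullness per index.
SELECT index_id,
       avg_fragmentation_in_percent,
       avg_page_space_used_in_percent,
       page_count
FROM sys.dm_db_index_physical_stats
     (DB_ID(), OBJECT_ID('dbo.LoadTarget'), NULL, NULL, 'SAMPLED');
```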

I set a fill factor of 50 on the two fragmented tables and re-indexed. Now I am getting about 1/3 of the maximum speed. I will just need to stop the process and re-index every few hours.
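That step looks roughly like this (index and table names hypothetical); the lower fill factor leaves free space on every page so the out-of-order inserts cause fewer page splits, at the cost of a larger table:

```sql
ALTER INDEX PK_LoadTarget ON dbo.LoadTarget
    REBUILD WITH (FILLFACTOR = 50, SORT_IN_TEMPDB = ON);
```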

What I ended up doing is changing the clustered index to match the insert order. I created a unique index on what used to be the clustered key. I disable the unique index, insert the data, and then rebuild the unique index. With that setup I get 300:1 performance over a 10-hour run. That is not a stray extra zero - three hundred to one. And that is not fudging - it is compared to starting with the index in key order and a fill factor of 30. Even with the extra index, my total table size is smaller, because I can leave both fill factors at 100.
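A sketch of that final layout, with all names (and the load-sequence column) invented for illustration: cluster on arrival order so every insert is an append at the end of the index, keep the old key as a unique nonclustered index, and take that index out of the write path during the load:

```sql
CREATE TABLE dbo.LoadTarget
(
    LoadSeq   bigint IDENTITY(1,1) NOT NULL,  -- matches insert/arrival order
    KeyId     bigint       NOT NULL,          -- the old clustered key
    EventTime datetime2    NOT NULL,
    Payload   varchar(200) NULL,
    CONSTRAINT PK_LoadTarget PRIMARY KEY CLUSTERED (LoadSeq) WITH (FILLFACTOR = 100)
);

CREATE UNIQUE NONCLUSTERED INDEX UX_LoadTarget_KeyId
    ON dbo.LoadTarget (KeyId) WITH (FILLFACTOR = 100);

-- Before the load: stop maintaining the unique index.
ALTER INDEX UX_LoadTarget_KeyId ON dbo.LoadTarget DISABLE;

-- ... bulk load via the TVP procedure ...

-- After the load: a rebuild re-enables and defragments the index in one pass.
ALTER INDEX UX_LoadTarget_KeyId ON dbo.LoadTarget REBUILD WITH (FILLFACTOR = 100);
```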

I use a #temp table in some queries so I can get the rows in an order known only to the query. I converted the #temp to a TVP and gained 1/2 second (about the time it takes to create and drop a #temp table).
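Roughly what that conversion looks like, reusing the hypothetical table type from above; the point is that a table variable of the TVP type needs no CREATE/DROP of a temp object and can be handed straight to the procedure:

```sql
-- Before: stage the rows in a per-call temp table.
CREATE TABLE #staging (KeyId bigint NOT NULL, EventTime datetime2 NOT NULL, Payload varchar(200) NULL);
INSERT INTO #staging (KeyId, EventTime, Payload)
SELECT KeyId, EventTime, Payload FROM dbo.SourceRows;   -- hypothetical source query
-- ... queries against #staging ...
DROP TABLE #staging;

-- After: stage them in a variable of the existing table type instead.
DECLARE @staging dbo.LoadRowTvp;
INSERT INTO @staging (KeyId, EventTime, Payload)
SELECT KeyId, EventTime, Payload FROM dbo.SourceRows;
-- ... queries against @staging, or pass it straight to the load procedure:
EXEC dbo.InsertLoadRows @rows = @staging;
```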

心奴独伤 2025-01-13 05:20:30

Per the OP's request, converting my comment to an answer...

In addition to auto-stats as @SqlACID mentions, constraint checking could get more expensive as the table fills up. If I'm going to seriously load a table I usually plan to disable or drop indexes and constraints, and re-create them after, if speed is my ultimate goal. This may mean deleting rows after the fact if the constraint is violated, or doing better validation on the bulk data up front when possible.
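A minimal sketch of that approach, using the same hypothetical names as above; NOCHECK covers foreign key and CHECK constraints, and a disabled nonclustered index stops being maintained until it is rebuilt:

```sql
-- Before the bulk load.
ALTER TABLE dbo.LoadTarget NOCHECK CONSTRAINT ALL;
ALTER INDEX UX_LoadTarget_KeyId ON dbo.LoadTarget DISABLE;

-- ... bulk load ...

-- After the load: rebuild the index, then re-validate the constraints so the
-- optimizer can trust them again (offending rows must be removed first).
ALTER INDEX UX_LoadTarget_KeyId ON dbo.LoadTarget REBUILD;
ALTER TABLE dbo.LoadTarget WITH CHECK CHECK CONSTRAINT ALL;
```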
