Batch committing a large INSERT operation in native SQL?
I have a couple of large tables (188m and 144m rows) I need to populate from views, but each view contains a few hundred million rows (pulling together pseudo-dimensionally modelled data into a flat form). The key on each table is a composite of over 50 bytes of columns. If the data were already in tables, I could always think about using sp_rename to swap in another new table, but that isn't really an option.
If I do a single INSERT operation, the process uses a huge amount of transaction log space, typically filling it up and prompting a bunch of hassle with the DBAs. (And yes, this is probably a job the DBAs should handle/design/architect.)
I can use SSIS and stream the data into the destination table with batch commits (but this does require the data to be transmitted over the network, since we are not allowed to run SSIS packages on the server).
Is there any approach other than dividing the process up into multiple INSERT operations, using some kind of key to distribute the rows into different batches, and doing a loop?
Does the view have ANY kind of unique identifier / candidate key? If so, you could select those rows into a working table using:
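(A sketch of that step, not tested; dbo.SourceView and dbo.WorkTable are placeholder names I'm using here. IDENTITY(INT,1,1) is only valid in a SELECT ... INTO and stamps each row with a sequential surrogate key you can chunk on later.)

```sql
-- Placeholder names: dbo.SourceView is the big view, dbo.WorkTable the working table.
-- IDENTITY(INT,1,1) adds a sequential key so rows can later be pulled out in ranges.
SELECT key_col = IDENTITY(INT, 1, 1), v.*
INTO dbo.WorkTable
FROM dbo.SourceView AS v;
```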
(If it makes sense, maybe put this table into a different database, perhaps with SIMPLE recovery model, to prevent the log activity from interfering with your primary database. This should generate much less log anyway, and you can free up the space in the other database before you resume, in case the problem is that you have inadequate disk space all around.)
Then you can do something like this, inserting 10,000 rows at a time, and backing up the log in between:
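Something along these lines, reusing the placeholder names above (dbo.FlatTable stands in for your destination table, and the column lists for your real ones):

```sql
SET NOCOUNT ON;

DECLARE @batchsize INT = 10000,
        @ctr       INT = 0,
        @rc        INT = 1;

WHILE @rc > 0
BEGIN
    -- Each INSERT commits on its own, so log space can be reused between batches.
    INSERT dbo.FlatTable (col1, col2 /* , ... */)
    SELECT col1, col2 /* , ... */
    FROM dbo.WorkTable
    WHERE key_col >  @ctr * @batchsize
      AND key_col <= (@ctr + 1) * @batchsize;

    SET @rc = @@ROWCOUNT;
    SET @ctr += 1;

    CHECKPOINT;  -- under SIMPLE recovery; issue BACKUP LOG here instead under FULL
END
```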
That's all off the top of my head, so don't cut/paste/run, but I think the general idea is there. For more details (and why I back up the log / checkpoint inside the loop), see this post on sqlperformance.com.
Note that if you are taking regular database and log backups, you will probably want to take a full backup to start your log chain over again.
You could partition your data and insert it in a cursor loop. That would be nearly the same as SSIS batch inserting, but it runs on your server.
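(The code that went with this answer didn't survive the copy; a minimal sketch of the idea, assuming the view exposes some column, a date say, whose distinct values make sensible batch boundaries. All names are placeholders.)

```sql
-- Placeholder names throughout; PartitionDate is whatever column you batch on.
DECLARE @part DATE;

DECLARE part_cur CURSOR LOCAL FAST_FORWARD FOR
    SELECT DISTINCT PartitionDate FROM dbo.SourceView;

OPEN part_cur;
FETCH NEXT FROM part_cur INTO @part;

WHILE @@FETCH_STATUS = 0
BEGIN
    -- One slice per iteration; each INSERT is its own committed transaction.
    INSERT dbo.FlatTable (col1, col2 /* , ... */)
    SELECT col1, col2 /* , ... */
    FROM dbo.SourceView
    WHERE PartitionDate = @part;

    FETCH NEXT FROM part_cur INTO @part;
END

CLOSE part_cur;
DEALLOCATE part_cur;
```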
I know this is an old thread, but I made a generic version of Arthur's cursor solution:
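(The posted code is missing from this copy, so here is a hypothetical reconstruction of what a generic wrapper around that cursor approach could look like, built on dynamic SQL; every name and parameter below is invented for illustration rather than taken from the original answer.)

```sql
-- Hypothetical generic wrapper; all names/parameters are illustrative.
-- No QUOTENAME/validation here, so treat it as an admin-only sketch.
CREATE PROCEDURE dbo.usp_ChunkedInsert
    @source  SYSNAME,   -- e.g. N'dbo.SourceView'
    @target  SYSNAME,   -- e.g. N'dbo.FlatTable'
    @partcol SYSNAME    -- column whose distinct values define the batches
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE @sql NVARCHAR(MAX) = N'
        DECLARE @part SQL_VARIANT;
        DECLARE part_cur CURSOR LOCAL FAST_FORWARD FOR
            SELECT DISTINCT ' + @partcol + N' FROM ' + @source + N';
        OPEN part_cur;
        FETCH NEXT FROM part_cur INTO @part;
        WHILE @@FETCH_STATUS = 0
        BEGIN
            INSERT INTO ' + @target + N'
            SELECT * FROM ' + @source + N'
            WHERE ' + @partcol + N' = @part;
            FETCH NEXT FROM part_cur INTO @part;
        END
        CLOSE part_cur;
        DEALLOCATE part_cur;';

    EXEC sys.sp_executesql @sql;
END
```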
This solved the problem for our large table loads.
There is no pixie dust, you know that.
Without knowing specifics about the actual schema being transferred, a generic solution would be exactly as you describe it: divide processing into multiple inserts and keep track of the key(s). This is sort of pseudo-code T-SQL:
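(The pseudo-code block was lost in this copy; a minimal sketch of the key-tracking loop, simplified to a single ordered key column, whereas the real 50-byte composite key would need a multi-column comparison. Names are placeholders.)

```sql
-- Placeholder names; KeyCol stands in for the (here simplified) key.
DECLARE @lastKey BIGINT = 0,
        @rc      INT    = 1;
DECLARE @keys TABLE (k BIGINT);

WHILE @rc > 0
BEGIN
    DELETE @keys;

    -- Copy the next 10,000 rows past the last key processed;
    -- each INSERT commits on its own, keeping log usage bounded.
    INSERT dbo.FlatTable (KeyCol, col1 /* , ... */)
    OUTPUT inserted.KeyCol INTO @keys (k)
    SELECT TOP (10000) KeyCol, col1 /* , ... */
    FROM dbo.SourceView
    WHERE KeyCol > @lastKey
    ORDER BY KeyCol;

    SET @rc = @@ROWCOUNT;

    IF @rc > 0
        SELECT @lastKey = MAX(k) FROM @keys;
END
```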
It would get more complicated if you want to allow for parallel batches and partition the keys.
You could use the BCP command to load the data and use the Batch Size parameter
http://msdn.microsoft.com/en-us/library/ms162802.aspx
It's a two-step process: bcp the data out of the view into a flat file, then bcp the file into the destination table with the batch size set.
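(A sketch of the two invocations; server, database, and object names are made up. -n is native format, -b sets how many rows are committed per batch.)

```
rem step 1: export the view's rows to a native-format data file
bcp MyDB.dbo.SourceView out view_data.dat -n -S MyServer -T

rem step 2: load the file into the target table, committing every 10,000 rows
bcp MyDB.dbo.FlatTable in view_data.dat -n -b 10000 -S MyServer -T
```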
This looks like a job for good ol' BCP.