Inserting rows with SqlBulkCopy
I am switching some of my LINQ to SQL code to use SqlBulkCopy, and the problem is that I need to do two inserts, each of several thousand rows, into two tables.
The service takes your batch of 10,000 links (imported from a sitemap, backlink builders, etc.) and chops them into RSS feeds of X links per feed for aggregation. The problem is that I already have a table of 32 million rows. With LINQ to SQL inserts, loading 10,000 links takes anywhere from 5 to 10 minutes, depending on site traffic.
The structure is very basic.
Feeds: Id bigint (PK), Title varchar(1000), Description varchar(1000), Published datetime, Aggregated datetime null, ShortCode varchar(8) [antiquated, not inserted anymore, but used for legacy data]
Items: Id bigint (PK), FeedId bigint (FK), Title varchar(1000), Description varchar(1000), Published datetime, ShortCode varchar(8) [antiquated, not inserted anymore, but used for legacy data], ShortId bigint null [updated after insert to equal Id (used in partitioning)]
FutureItems: Id bigint (PK), FeedId bigint (FK), Title varchar(1000), Description varchar(1000), Published datetime, ShortCode varchar(8) [antiquated, not inserted anymore, but used for legacy data], ShortId bigint null [updated after insert to equal Id (used in partitioning)]
OldItems: Id bigint (PK), FeedId bigint (FK), Title varchar(1000), Description varchar(1000), Published datetime, ShortCode varchar(8) [antiquated, not inserted anymore, but used for legacy data], ShortId bigint null [updated after insert to equal Id (used in partitioning)]
So with a feed size of 20, you get 500 inserts into the Feeds table, then 10,000 inserts into the Items table, and then an update runs to set the ShortId equal to the Id. Once a night, a job runs that separates the data into the other two tables and shifts future items into the Items table.
I read that SqlBulkCopy can do 20 million rows in a matter of minutes, but I can't find any good examples of doing it into multiple tables with an FK constraint.
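For reference, the plain single-table case looks roughly like the sketch below (illustrative only, not my production code; it assumes a pre-filled DataTable and that Feeds.Id is an identity column generated by SQL Server). What I can't figure out is how to carry the newly generated feed IDs over into the Items insert across the FK.

```csharp
using System.Data;
using System.Data.SqlClient;

static class FeedLoader
{
    // Minimal single-table sketch: bulk copy a pre-filled DataTable into Feeds.
    // Column names follow the schema above; Id is assumed to be IDENTITY and is
    // therefore not mapped (SQL Server generates it).
    public static void BulkCopyFeeds(DataTable feeds, string connectionString)
    {
        using (var conn = new SqlConnection(connectionString))
        using (var bulk = new SqlBulkCopy(conn))
        {
            conn.Open();
            bulk.DestinationTableName = "dbo.Feeds";
            bulk.BatchSize = 1000;      // commit in chunks rather than one huge batch
            bulk.BulkCopyTimeout = 0;   // no timeout on the copy itself

            // Map by name so the DataTable column order doesn't matter.
            bulk.ColumnMappings.Add("Title", "Title");
            bulk.ColumnMappings.Add("Description", "Description");
            bulk.ColumnMappings.Add("Published", "Published");

            bulk.WriteToServer(feeds);
        }
    }
}
```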
Our SQL Server box is a "monster", especially for this application: SQL Server 2008 R2 Web on Windows Server 2008 R2 Enterprise, 12 GB RAM, dual quad-core Xeons @ 2.8 GHz.
Our web server is a clone without the database service.
The CPU runs at about 85% while inserting links, and the database fills the RAM.
If SqlBulkCopy isn't the right approach, any suggestion is welcome. We have paying customers who are getting mad, and I am not a DBA, just a plain old programmer.
2 Answers
SqlBulkCopy is indeed faster than ordinary inserts, but faster in the sense that it can turn a job doing 1,000 inserts per second into one doing 10,000 per second. If you can only load 10,000 links in 10 minutes, you must have a different problem, one that bulk copy is unlikely to solve.
You first need to investigate why it takes so incredibly long to insert 10,000 links. Only after you understand that can you decide whether moving to SqlBulkCopy is the solution. I understand that you are not a DBA, but I'm going to point you to a 'DBA-ish' white paper for troubleshooting SQL Server performance: Waits and Queues. It is not a cookie-cutter recipe; it is a methodology that teaches you how to identify performance bottlenecks in SQL Server.
And to address your question: how does one use SqlBulkCopy when there are constraints? The more general question is how to do bulk insert operations when constraints are in place. For serious volumes, you actually disable the constraints, perform the bulk upload, and then re-enable the constraints. For more streamlined online operations with minimal downtime (the database is basically 'down' for the period when the constraints are disabled), a different strategy is used: pre-load the data into staging tables, validate it, and then switch it in with a partition switch operation; see Transferring Data Efficiently by Using Partition Switching.
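To make the first route concrete, a rough sketch (the constraint name FK_Items_Feeds and the pre-filled DataTable are assumptions, not taken from your schema): disable the foreign key, run the bulk copy, then re-enable it WITH CHECK so SQL Server re-validates the rows and the constraint remains trusted.

```csharp
using System.Data;
using System.Data.SqlClient;

static class ItemLoader
{
    // Sketch only: bulk load Items with its FK to Feeds disabled, then re-enable it.
    public static void BulkLoadItems(DataTable items, string connectionString)
    {
        using (var conn = new SqlConnection(connectionString))
        {
            conn.Open();

            // Disable the FK so the load is not checked row by row.
            using (var cmd = new SqlCommand(
                "ALTER TABLE dbo.Items NOCHECK CONSTRAINT FK_Items_Feeds;", conn))
                cmd.ExecuteNonQuery();

            using (var bulk = new SqlBulkCopy(conn))
            {
                bulk.DestinationTableName = "dbo.Items";
                bulk.BatchSize = 5000;
                bulk.BulkCopyTimeout = 0;
                bulk.WriteToServer(items);
            }

            // WITH CHECK re-validates the data so the constraint is trusted again.
            using (var cmd = new SqlCommand(
                "ALTER TABLE dbo.Items WITH CHECK CHECK CONSTRAINT FK_Items_Feeds;", conn))
                cmd.ExecuteNonQuery();
        }
    }
}
```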
I think your real problem with just using a plain bulk insert is that you need the feed IDs from the initial insert for the other tables. Here's what I would do: use bulk insert to load a staging table, then use a stored proc to do the inserts into the real tables in a set-based fashion. You can use the OUTPUT clause on the initial insert into the Feeds table to get back a table variable with the feed IDs you need for the inserts into the other tables.
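Something along these lines (a sketch under assumed names: FeedStaging, ItemStaging and the BatchFeedNo correlation column are made up, and the T-SQL batch would normally live in the stored proc; MERGE is used here because a plain INSERT ... OUTPUT cannot return source columns next to the new identity values, while MERGE's OUTPUT can):

```csharp
using System.Data;
using System.Data.SqlClient;

static class BatchLoader
{
    // Sketch: bulk copy into staging tables, then one set-based batch that inserts
    // the Feeds rows, captures the new identity values via OUTPUT, and uses them
    // for the Items insert. Staging tables and BatchFeedNo are hypothetical.
    public static void LoadBatch(DataTable stagedFeeds, DataTable stagedItems, string connectionString)
    {
        using (var conn = new SqlConnection(connectionString))
        {
            conn.Open();

            // 1. Raw rows go into staging first (columns assumed to line up with the staging tables).
            using (var bulk = new SqlBulkCopy(conn) { DestinationTableName = "dbo.FeedStaging" })
                bulk.WriteToServer(stagedFeeds);
            using (var bulk = new SqlBulkCopy(conn) { DestinationTableName = "dbo.ItemStaging" })
                bulk.WriteToServer(stagedItems);

            // 2. Set-based insert: MERGE lets the OUTPUT clause pair each staging
            //    row's BatchFeedNo with the identity value SQL Server just generated.
            const string sql = @"
DECLARE @NewFeeds TABLE (BatchFeedNo int, FeedId bigint);

MERGE dbo.Feeds AS target
USING dbo.FeedStaging AS src ON 1 = 0              -- never matches: always insert
WHEN NOT MATCHED THEN
    INSERT (Title, Description, Published)
    VALUES (src.Title, src.Description, src.Published)
OUTPUT src.BatchFeedNo, inserted.Id INTO @NewFeeds (BatchFeedNo, FeedId);

INSERT dbo.Items (FeedId, Title, Description, Published)
SELECT nf.FeedId, i.Title, i.Description, i.Published
FROM dbo.ItemStaging AS i
JOIN @NewFeeds AS nf ON nf.BatchFeedNo = i.BatchFeedNo;

-- ShortId = Id, as described in the question.
UPDATE i SET ShortId = i.Id
FROM dbo.Items AS i
JOIN @NewFeeds AS nf ON nf.FeedId = i.FeedId
WHERE i.ShortId IS NULL;

TRUNCATE TABLE dbo.FeedStaging;
TRUNCATE TABLE dbo.ItemStaging;";

            using (var cmd = new SqlCommand(sql, conn) { CommandTimeout = 300 })
                cmd.ExecuteNonQuery();
        }
    }
}
```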