Should I use SqlBulkCopy or a stored procedure to import data?
I've got a log file which is 248 MB and can grow up to a GB, so you can imagine how many rows there are. I need to import all the rows into a table in a SQL Server database. To do that, I first create a DataTable and add each line of the log file to it as a new row. This part is fast: more than a million records get added in about 30 seconds. Once the DataTable is filled, I import its records into the database using a stored procedure. However, this second phase is very slow. Now I wonder: should I switch to the SqlBulkCopy WriteToServer method, or keep doing it this way? And if SqlBulkCopy is the better choice, should I use the DataTable or the IDataReader overload? Thanks in advance.
I would go with SqlBulkCopy for data imports of any real volume like this. The performance difference vs. SqlDataAdapter can be large. For example, I blogged a performance comparison for importing 100K rows:
SqlBulkCopy: 1.5885s
SqlDataAdapter: 25.0729s
You can get even greater throughput by using the TableLock option with SqlBulkCopy, which in my test brought the import down to 0.8229s.
It's also worth noting that with SqlBulkCopy, you can have multiple instances bulk loading segments of data into the same destination table in parallel without them contending with each other. Apologies for another external link, but I think it's relevant: it's about loading into a heap table, with no indexes, for optimal performance. That may not be an option in your current scenario, but it's definitely worth knowing about.
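To make the suggestion concrete, here is a minimal C# sketch of the SqlBulkCopy approach with the TableLock option. It streams the log file with `File.ReadLines` and flushes a small DataTable in batches, so the whole million-row set is never held in memory at once (which sidesteps the DataTable-vs-IDataReader memory concern for a file this size). The connection string, table name (`dbo.LogLines`), and column name (`Line`) are placeholders, not from the original post:

```csharp
using System.Data;
using System.Data.SqlClient;
using System.IO;

class LogImporter
{
    static void Main()
    {
        // Placeholder connection string and schema; adjust for your environment.
        const string connStr = "Server=.;Database=Logs;Integrated Security=true";
        const int batchSize = 10000;

        var table = new DataTable();
        table.Columns.Add("Line", typeof(string));

        using (var bulk = new SqlBulkCopy(connStr, SqlBulkCopyOptions.TableLock, null))
        {
            bulk.DestinationTableName = "dbo.LogLines"; // hypothetical target table
            bulk.BatchSize = batchSize;                 // commit rows in chunks
            bulk.BulkCopyTimeout = 0;                   // no timeout for large files

            // File.ReadLines streams lazily instead of loading the whole file.
            foreach (var line in File.ReadLines("app.log"))
            {
                table.Rows.Add(line);
                if (table.Rows.Count == batchSize)
                {
                    bulk.WriteToServer(table);
                    table.Clear(); // keep memory usage bounded
                }
            }
            if (table.Rows.Count > 0)
                bulk.WriteToServer(table); // flush the final partial batch
        }
    }
}
```

If you would rather avoid the intermediate DataTable entirely, the `WriteToServer(IDataReader)` overload lets the copy pull rows directly from a reader you implement over the file, but for a one-column log import the batched DataTable above is usually simple enough.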