Fastest and best way to import file contents into a database
I have folders where approx 3000 new csv files come in on a daily basis, each containing between 50 and 2000 lines of information.
Currently, there is a process in place which picks these files up one at a time, takes each line one at a time, and sends it to a stored procedure that inserts the contents into a database.
This means that over the course of a day, it can struggle to get through the 3000 files before the next 3000 come in!
I'm looking to improve this process and had the following ideas:
- Use the new Parallel features of C# 4.0 to allow multiple files to be processed at once, still passing the lines one by one to the stored proc
- Create a new temporary database table where all the rows in a file can be inserted at once, then call the stored procedure on the newly added rows in the temp table.
- Split the process into two tasks: one job reads data from the files into the temporary database table, the other processes the rows in the temporary table.
Any other ideas on how I could look at doing this? Currently it can take up to 20 seconds per file, and I'd really like to improve performance considerably.
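For reference, the first two ideas above can be combined into one sketch: each file is parsed into a `DataTable` and sent to a staging table in a single `SqlBulkCopy` call (one round trip instead of 50-2000 stored-procedure calls), while `Parallel.ForEach` works through several files at once. The staging table `dbo.StagingRows`, the two-column CSV layout, and the connection string are all placeholder assumptions, not part of the actual setup:

```csharp
using System.Data;
using System.Data.SqlClient;
using System.IO;
using System.Threading.Tasks;

class CsvLoader
{
    // Assumption: a staging table dbo.StagingRows(Col1, Col2) exists
    // and the CSV files contain two comma-separated columns.
    const string ConnectionString =
        "Data Source=.;Initial Catalog=MyDb;Integrated Security=True";

    static void Main()
    {
        string[] files = Directory.GetFiles(@"C:\incoming", "*.csv");

        // Process several files concurrently; each file becomes ONE bulk
        // copy rather than one stored-procedure call per line.
        Parallel.ForEach(files,
            new ParallelOptions { MaxDegreeOfParallelism = 4 },
            file =>
            {
                DataTable rows = ReadCsv(file);
                using (var bulk = new SqlBulkCopy(ConnectionString))
                {
                    bulk.DestinationTableName = "dbo.StagingRows";
                    bulk.WriteToServer(rows);
                }
            });
    }

    static DataTable ReadCsv(string path)
    {
        var table = new DataTable();
        table.Columns.Add("Col1", typeof(string));
        table.Columns.Add("Col2", typeof(string));
        foreach (string line in File.ReadLines(path))
        {
            string[] fields = line.Split(',');
            table.Rows.Add(fields[0], fields[1]);
        }
        return table;
    }
}
```

The per-row stored-procedure logic could then run set-based against `dbo.StagingRows`, which is essentially the third idea.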
4 Answers
SQL Server Bulk Insert might be just what you need
http://msdn.microsoft.com/en-us/library/ms188365.aspx
Another issue that may be making all of those inserts take so long is that every time a row is added, your table may be getting reindexed. A search like this will turn up plenty of good articles on ways to get better performance out of your current procedure:
http://www.google.com/search?q=sql+insert+performance
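As an illustration, issuing BULK INSERT from the existing C# process might look roughly like this (the staging table, column layout, and file path are assumptions; note that the path must be visible from the SQL Server machine, because BULK INSERT runs server-side):

```csharp
using System.Data.SqlClient;

class BulkInsertExample
{
    // Assumption: dbo.StagingRows matches the CSV column layout.
    static void LoadFile(string connectionString, string serverVisiblePath)
    {
        // BULK INSERT requires the file path as a literal, so the statement
        // is built as a string -- acceptable here only because the path
        // comes from our own directory scan, not from user input.
        string sql = "BULK INSERT dbo.StagingRows " +
                     "FROM '" + serverVisiblePath + "' " +
                     "WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\\n', TABLOCK)";

        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand(sql, conn))
        {
            conn.Open();
            cmd.ExecuteNonQuery(); // loads the whole file in one statement
        }
    }
}
```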
You can use SQL Server's native BCP utility.
More info about BCP utility can be found here: Importing and Exporting Bulk Data by Using the bcp Utility
You can also take a look at: About Bulk Import and Bulk Export Operations
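As a rough, untested sketch, bcp could also be driven from the existing C# process. The server name, database, and table below are placeholders; `-c` means character mode, `-t,` sets the field terminator to a comma, and `-T` uses integrated security:

```csharp
using System.Diagnostics;

class BcpExample
{
    static void LoadFile(string csvPath)
    {
        var psi = new ProcessStartInfo
        {
            FileName = "bcp",
            // MyDb, dbo.StagingRows, and MYSERVER are placeholders.
            Arguments = "MyDb.dbo.StagingRows in \"" + csvPath +
                        "\" -c -t, -S MYSERVER -T",
            UseShellExecute = false
        };
        using (Process p = Process.Start(psi))
        {
            p.WaitForExit(); // one bcp run loads the whole file
        }
    }
}
```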
Let's say that all 3000 files to be imported have 2000 rows each. That's 6 million rows per day. The bottleneck might not be the client doing the inserts, but the database itself. If indexes are enabled on the tables in question, inserts could be slow, depending on how heavily indexed those tables are. What indications have led you to conclude that it is the database that is waiting around for something to do, and that it is the import routine that is lagging behind, rather than the other way around?
You said:

"picks these files up one at a time and takes each line *one at a time* and sends it to a stored procedure"

(Emphasis added.)

That seems to mean one line equals one transaction. Fix that.
- If the files are in a format acceptable for bulk loading, bulk load them.
- If not, transform each file into a batch of valid SQL INSERT statements, and load them that way. (In a single transaction.)
I guess both of those sound like "replace your stored procedure". But the real point is to reduce the number of transactions. Either of those options would reduce the number of transactions for this process from 6 million a day (worst case) to 3000 a day.
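The second option might look roughly like this: parameterized single-row INSERTs are still used, but one transaction wraps the whole file, so the transaction overhead is paid once per file instead of once per line (the table and column names are assumptions):

```csharp
using System.Data;
using System.Data.SqlClient;
using System.IO;

class OneTransactionPerFile
{
    static void LoadFile(string connectionString, string path)
    {
        using (var conn = new SqlConnection(connectionString))
        {
            conn.Open();
            using (SqlTransaction tx = conn.BeginTransaction())
            using (SqlCommand cmd = conn.CreateCommand())
            {
                cmd.Transaction = tx;
                cmd.CommandText =
                    "INSERT INTO dbo.StagingRows (Col1, Col2) VALUES (@c1, @c2)";
                SqlParameter p1 = cmd.Parameters.Add("@c1", SqlDbType.NVarChar, 255);
                SqlParameter p2 = cmd.Parameters.Add("@c2", SqlDbType.NVarChar, 255);

                foreach (string line in File.ReadLines(path))
                {
                    string[] fields = line.Split(',');
                    p1.Value = fields[0];
                    p2.Value = fields[1];
                    cmd.ExecuteNonQuery();
                }

                // One commit per file: roughly 3000 transactions a day
                // instead of up to 6 million.
                tx.Commit();
            }
        }
    }
}
```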