从平面文件向 SQL Server 插入数百万条记录有哪些陷阱?

发布于 2024-07-05 12:10:58 字数 318 浏览 6 评论 0原文

我即将开始编写一个 Windows 窗体应用程序,该应用程序将打开一个以竖线分隔且大小约为 230 mb 的 txt 文件。 然后,该应用程序会将这些数据插入到 sql server 2005 数据库中(显然这需要快速发生)。 我在这个项目中使用 c# 3.0 和 .net 3.5。

我并不是要这个应用程序,只是在这里提供一些公共建议和潜在的陷阱建议。 从我收集到的网站来看,SQL 批量复制是先决条件,有什么我应该考虑的(我认为仅使用表单应用程序打开 txt 文件将是一项艰巨的任务;也许将其分解为 blob 数据?)。

谢谢,如果有人需要,我会编辑问题以使其清晰。

I am about to start on a journey writing a windows forms application that will open a txt file that is pipe delimited and about 230 mb in size. This app will then insert this data into a sql server 2005 database (obviously this needs to happen swiftly). I am using c# 3.0 and .net 3.5 for this project.

I am not asking for the app, just some communal advise here and potential pitfalls advise. From the site I have gathered that SQL bulk copy is a prerequisite, is there anything I should think about (I think that just opening the txt file with a forms app will be a large endeavor; maybe break it into blob data?).

Thank you, and I will edit the question for clarity if anyone needs it.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(9

凡尘雨 2024-07-12 12:10:59

如果文件的列格式与数据需要结束的目标表匹配,我更喜欢使用命令行实用程序bcp来加载数据文件。 它的速度非常快,您可以为任何无法插入的“奇怪”记录指定错误文件。

如果您需要存储其命令行参数(服务器、数据库、用户名/密码或可信连接、表、错误文件等),您的应用程序可以启动该命令。

与运行 BULK INSERT SQL 命令相比,我更喜欢这种方法,因为数据文件不需要位于数据库服务器可访问的系统上。 要使用批量插入,您必须指定要加载的数据文件的路径,因此它必须是运行加载的数据库服务器上的系统用户可见且可读的路径。 平时对我来说太麻烦了。 :-)

If the column format of the file matches the target table where the data needs to end up, I prefer using the command line utility bcp to load the data file. It's blazingly fast and you can specify and error file for any "odd" records that fail to be inserted.

Your app could kick off the command if you need to store the command line parameters for it (server, database, username / password or trusted connection, table, error file etc.).

I like this method better than running a BULK INSERT SQL command because the data file isn't required to be on a system accessible by the database server. To use bulk insert you have to specify the path to the data file to load, so it must be a path visible and readable by the system user on the database server that is running the load. Too much hassle for me usually. :-)

椵侞 2024-07-12 12:10:58

您可以使用 SSIS 进行读取和读取。 插入,但从 WinForms 应用程序中将其作为包调用。 然后你可以传入源、目标、连接字符串等作为参数/配置。

操作方法:http://msdn.microsoft.com/en-us/library/ aa337077.aspx

您可以在 SSIS 内设置转换和错误处理,甚至可以根据输入参数创建逻辑分支。

You could use SSIS for the read & insert, but call it as a package from your WinForms app. Then you could pass in things like source, destination, connection strings etc as parameter/configurations.

HowTo: http://msdn.microsoft.com/en-us/library/aa337077.aspx

You can set up transforms and error handling inside SSIS and even create logical branching based on input parameters.

耀眼的星火 2024-07-12 12:10:58

你所说的数据量实际上并没有那么大。 我不知道您的效率问题是什么,但是如果您可以等待几个小时来插入它,您可能会惊讶地发现,使用一种非常简单的技术(一次插入每一行)来完成这件事是多么容易。 一次将一千左右的行批处理在一起并将它们提交到 SQL Server 也可能会使其速度更快。

如果您不需要它尽可能快的话,这只是一个建议,可以为您节省一些认真的编程时间。 根据此导入必须运行的频率,节省几天的编程时间很容易就值得在其运行时等待几个小时。

The size of data you're talking about actually isn't that gigantic. I don't know what your efficiency concerns are, but if you can wait a few hours for it to insert, you might be surprised at how easy this would be to accomplish with a really naive technique of just INSERTing each row one at a time. Batching together a thousand or so rows at a time and submitting them to SQL server may make it quite a bit faster as well.

Just a suggestion that could save you some serious programming time, if you don't need it to be as fast as conceivable. Depending on how often this import has to run, saving a few days of programming time could easily be worth it in exchange for waiting a few hours while it runs.

烏雲後面有陽光 2024-07-12 12:10:58

你必须写一个winforms应用程序吗? 使用 SSIS 可能会更容易、更快。 有一些内置任务可用,尤其是批量插入任务

另外,值得检查SQL Server 2005 中的平面文件批量导入方法速度比较。

更新:如果您是 SSIS 新手,请查看其中一些网站以帮助您快速入门。 1) SSIS 控制流基础知识 2) SQL Server Integration Services 入门

这是将 Excel 文件导入 SQL 2005

Do you have to write a winforms app? It might be much easier and faster to use SSIS. There are some built-in tasks available especially Bulk Insert task.

Also, worth checking Flat File Bulk Import methods speed comparison in SQL Server 2005.

Update: If you are new to SSIS, check out some of these sites to get you on fast track. 1) SSIS Control Flow Basics 2) Getting Started with SQL Server Integration Services

This is another How to: on importing Excel file into SQL 2005.

云朵有点甜 2024-07-12 12:10:58

这将是一项流媒体努力。

如果可以的话,不要在这里使用交易。 交易成本实在是太大了。

因此,您要做的就是一次读取文件一行,然后一次将其插入一行。 您应该将失败的插入转储到另一个文件中,以便稍后进行诊断并查看失败的位置。

首先,我会继续尝试批量插入几百行,只是为了查看流式传输是否正常工作,然后您可以打开您想要的所有内容。

This is going to be a streaming endeavor.

If you can, do not use transactions here. The transactional cost will simply be too great.

So what you're going to do is read the file a line at a time and insert it in a line at a time. You should dump failed inserts into another file that you can diagnose later and see where they failed.

At first I would go ahead and try a bulk insert of a couple of hundred rows just to see that the streaming is working properly and then you can open up all you want.

我的影子我的梦 2024-07-12 12:10:58

您可以尝试使用SqlBulkCopy。 它允许您从“任何数据源”中提取数据。

You could try using SqlBulkCopy. It lets you pull from "any data source".

鹿港小镇 2024-07-12 12:10:58

顺便说一句,有时删除表的索引并在批量插入操作后重新创建它们会更快。

Just as a side note, it's sometimes faster to drop the indices of your table and recreate them after the bulk insert operation.

私藏温柔 2024-07-12 12:10:58

您可能会考虑从完全恢复切换到批量日志记录。 这将有助于使您的备份保持合理的大小。

You might consider switching from full recovery to bulk-logged. This will help to keep your backups a reasonable size.

找回味觉 2024-07-12 12:10:58

我完全推荐 SSIS,您可以读取数百万条记录并在相对较短的时间内清理它们。

您需要留出一些时间来掌握 SSIS,但这应该会有回报。 这里还有一些其他线程可能会很有用:

在 SQL Server(C# 客户端)中批量插入大量数据的最快方法是什么

SSIS 的推荐学习材料有哪些?

您还可以从 C# 创建包。 我有一个 C# 程序,它从遗留系统读取 3GL“主文件”(使用我为相关项目拥有的 API 解析为对象模型),采用包模板并修改它以生成 ETL 包。

I totally recommend SSIS, you can read in millions of records and clean them up along the way in relatively little time.

You will need to set aside some time to get to grips with SSIS, but it should pay off. There are a few other threads here on SO which will probably be useful:

What's the fastest way to bulk insert a lot of data in SQL Server (C# client)

What are the recommended learning material for SSIS?

You can also create a package from C#. I have a C# program which reads a 3GL "master file" from a legacy system (parses into an object model using an API I have for a related project), takes a package template and modifies it to generate a package for the ETL.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文