Should I use SSIS or a multithreaded C# application to load flat files into a database?

Posted 2024-07-07 09:34:28

Within SQL Server Integration Services (SSIS) there is the ability to set up a connection to a flat file that can hold millions of records and have that data pushed to a SQL DB. Furthermore, this process can be called from a C# app by referencing and using the Microsoft.SqlServer.Dts.Runtime namespace.
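
For anyone who has not used that namespace, a minimal sketch of loading and executing a package from C# might look like the following. The package path is a placeholder and error handling is omitted; this is an illustration, not a production wrapper.

    using System;
    using Microsoft.SqlServer.Dts.Runtime;  // reference the ManagedDTS assembly that ships with SQL Server

    class RunPackage
    {
        static void Main()
        {
            // Placeholder path to a .dtsx package; adjust for your environment.
            string packagePath = @"C:\SSIS\LoadFlatFile.dtsx";

            Application app = new Application();
            // The second argument is an optional IDTSEvents sink for progress and error callbacks.
            Package package = app.LoadPackage(packagePath, null);

            DTSExecResult result = package.Execute();
            Console.WriteLine("Package result: " + result);  // Success or Failure
        }
    }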

Would a flat file with millions of records best be run through SSIS, or would the collective "you" prefer a C# app with multiple worker threads (one to read rows and add them to a shared variable, one to write from that variable to the DB) and a "mother" class that manages those threads? (The dev box has two CPUs.)
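
For comparison, a hand-rolled reader/writer split along those lines usually ends up as a producer/consumer pair. The sketch below is one way to arrange it; the pipe-delimited two-column file, the dbo.TargetTable staging table, the connection string, and the use of SqlBulkCopy as the writer are all assumptions rather than anything specified in the question.

    using System.Collections.Concurrent;
    using System.Data;
    using System.Data.SqlClient;
    using System.IO;
    using System.Threading.Tasks;

    class ThreadedLoader
    {
        // Assumed connection string, file path, delimiter, and target table.
        const string ConnectionString = "Server=.;Database=Staging;Integrated Security=SSPI";
        const string FilePath = @"C:\data\bigfile.txt";

        static void Main()
        {
            // Bounded queue keeps the reader from racing too far ahead of the writer.
            var queue = new BlockingCollection<string>(100000);

            Task reader = Task.Run(() =>
            {
                foreach (string line in File.ReadLines(FilePath))
                    queue.Add(line);
                queue.CompleteAdding();
            });

            Task writer = Task.Run(() =>
            {
                DataTable batch = new DataTable();
                batch.Columns.Add("Col1", typeof(string));
                batch.Columns.Add("Col2", typeof(string));

                using (var bulk = new SqlBulkCopy(ConnectionString))
                {
                    bulk.DestinationTableName = "dbo.TargetTable";
                    bulk.BatchSize = 10000;

                    foreach (string line in queue.GetConsumingEnumerable())
                    {
                        string[] fields = line.Split('|');
                        batch.Rows.Add(fields[0], fields[1]);

                        if (batch.Rows.Count == 10000)
                        {
                            bulk.WriteToServer(batch);
                            batch.Clear();
                        }
                    }
                    if (batch.Rows.Count > 0)
                        bulk.WriteToServer(batch);
                }
            });

            // The "mother" coordination largely reduces to waiting on both workers.
            Task.WaitAll(reader, writer);
        }
    }

Even in this arrangement a single bulk writer does most of the work; the second thread mostly hides file-read latency rather than adding parallel database throughput.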

I have seen this data (from the SQL team blog) stating that for a flat file with a million lines, SSIS is the fastest:

Process                Duration (ms)
--------------------   -------------
SSIS - FastParse ON         7322 ms 
SSIS - FastParse OFF        8387 ms 
Bulk Insert                10534 ms 
OpenRowset                 10687 ms 
BCP                        14922 ms

What are your thoughts?
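
For reference, the "Bulk Insert" row in that table corresponds to running a T-SQL BULK INSERT against the file. One way to issue it from C# is sketched below; the file path, table name, and delimiters are assumptions and are not part of the benchmark.

    using System.Data.SqlClient;

    class BulkInsertCaller
    {
        static void Load(string connectionString)
        {
            // Assumed file location, target table, and delimiters.
            // Note: BULK INSERT resolves the path on the SQL Server machine, not the client.
            string sql =
                "BULK INSERT dbo.TargetTable " +
                "FROM 'C:\\data\\bigfile.txt' " +
                "WITH (FIELDTERMINATOR = '|', ROWTERMINATOR = '\\n', TABLOCK)";

            using (var connection = new SqlConnection(connectionString))
            using (var command = new SqlCommand(sql, connection))
            {
                command.CommandTimeout = 0;  // large loads can exceed the 30-second default
                connection.Open();
                command.ExecuteNonQuery();
            }
        }
    }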

Comments (3)

想念有你 2024-07-14 09:34:28

I can only speak for myself and my experience. I would go with SSIS, since this is one of those cases where you might be re-inventing the wheel unnecessarily. This is a repetitive task that has already been solved by SSIS.

I have about 57 jobs (a combination of DTS and SSIS) that I manage on a daily basis. Four of those routinely handle exporting between 5 and 100 million records. The database I manage has about 2 billion rows. I made use of a script task to append the date, down to the millisecond, so that I can run jobs several times a day. Been doing that for about 22 months now. It's been great!

SSIS jobs can also be scheduled. So you can set it and forget it. I do monitor everything every day, but the file handling part has never broken down.

The only time I had to resort to a custom C# program, was when I needed to split the very large files into smaller chunks. SSIS is dog slow for that sort of stuff. A one gig text file took about one hour to split, using the script task. The C# custom program handled that in 12 minutes.

In the end, just use what you feel comfortable using.
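
For context on the splitting workload mentioned above, a plain streaming splitter in C# is quite short. This is a rough sketch rather than the poster's actual program; the chunk size, paths, and file naming are arbitrary.

    using System.IO;

    class FileSplitter
    {
        // Stream the input once and start a new output file every linesPerChunk lines.
        static void Split(string inputPath, string outputDirectory, int linesPerChunk)
        {
            int chunkIndex = 0;
            int linesInChunk = 0;
            StreamWriter writer = null;

            try
            {
                foreach (string line in File.ReadLines(inputPath))
                {
                    if (writer == null || linesInChunk == linesPerChunk)
                    {
                        if (writer != null) writer.Dispose();
                        string chunkPath = Path.Combine(outputDirectory, "chunk_" + chunkIndex + ".txt");
                        writer = new StreamWriter(chunkPath);
                        chunkIndex++;
                        linesInChunk = 0;
                    }
                    writer.WriteLine(line);
                    linesInChunk++;
                }
            }
            finally
            {
                if (writer != null) writer.Dispose();
            }
        }
    }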

傻比既视感 2024-07-14 09:34:28

SSIS is incredibly fast. In addition, if it's something that needs to occur repeatedly, you can set up an agent to fire it off on a schedule. Writing it yourself is one thing; trying to make it multithreaded gets a lot more complicated than it appears at first.

I'd recommend SSIS nine times out of ten.

稳稳的幸福 2024-07-14 09:34:28

I can't see how using multiple threads would help performance in this case. When transferring large volumes of data, the main bottleneck is usually disk I/O. Spawning multiple threads wouldn't solve this issue, and my guess would be that it would make things worse since it would introduce locking contention between the multiple processes hitting the database.
