大型目标表中的 SSIS 文件加载速度太慢
这是我的第一个问题,我从不同的网站搜索了很多信息,但没有一个是结论性的。
问题: 每天,我都会加载一个平面文件,其中包含在 SQL Server 2005 中的计划作业中执行的 SSIS 包,但它花费了太多时间(例如 2 1/2 小时),并且该文件只有大约 300 行,文件大小约为 50 MB 。这让我发疯,因为它影响了我的服务器的性能。
这是场景: -我的包只是一个数据流任务,它有一个平面文件源和一个 OLE DB 目标,仅此而已! - 数据访问模式设置为“快速加载”。 -表中只有 3 个索引并且是非聚集的。 -我的目标表到目前为止有 366,964,096 条记录和 32 列 -我还没有在任何输出列中设置FastParse。(想先尝试其他东西)
所以我刚刚开始进行一些测试:
-重建/重新组织目标表中的索引(它们太分散了) ),但这对我没有多大帮助 -创建了另一个具有相同结构但没有所有索引的表,并使用 SSIS 包加载到这个新表来执行作业,并且只需要 1 分钟!
所以我很困惑,我错过了什么吗??? - SSIS包是否将所有大表写入缓冲区并将其写入磁盘?或者为什么时间上相差这么大?
-索引是否影响插入时间?
- 我是否应该将文件作为临时表加载到这个新表中,然后使用有序记录对目标表进行批量插入?因为我认为数据流任务比 BULK INSERT 快得多,但目前我不知道。
提前问候。
this is my first question, I've searched a lot of info from different sites but none of them where conslusive.
Problem:
Daily I'm loading a flat file with an SSIS Package executed in a scheduled job in SQL Server 2005 but it's taking TOO MUCH TIME(like 2 1/2 hours) and the file just has like 300 rows and its a 50 MB file aprox. This is driving me crazy, because is affecting the performance of my server.
This is the Scenario:
-My package is just a Data Flow Task that has a Flat File Source and an OLE DB Destination, thats all!!!
-The Data Access Mode is set to FAST LOAD.
-Just have 3 indexes in the table and are nonclustered.
-My destination table has 366,964,096 records so far and 32 columns
-I haven't set FastParse in any of the Output columns yet.(want to try something else first)
So I've just started to make some tests:
-Rebuild/Reorganize the indexes in the destination table(they where way too fragmented), but this didn't help me much
-Created another table with the same structure but whitout all the indexes and executed the Job with the SSIS package loading to this new table and IT JUST TOOK LIKE 1 MINUTE !!!
So I'm confused, is there something I'm Missing???
-Is the SSIS package writing all the large table in a Buffer and the writing it on Disk? Or why the BIG difference in time ?
-Is the index affecting the insertion time?
-Should I load the file to this new table as a temporary table and then do a BULK INSERT to the destination table with the records ordered? 'Cause I though that the Data FLow Task was much faster than BULK INSERT, but at this point I don't know now.
Greetings in advance.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
我可能会考虑的一件事是大表是否有任何触发器导致插入速度变慢。此外,如果聚集索引所在的字段需要在加载期间对数据进行大量重新排列,这也可能会导致问题。
在 SSIS 包中,使用合并联接(需要排序)可能会导致速度缓慢,但从您的描述来看,您似乎没有这样做。我只是在你这样做但没有提及的情况下才提到它。
One thing I might look at is if the large table has any triggers which are causing it to be slower on insert. Also if the clustered index is on a field that will require a good bit of rearranging of the data during the load, that could cause an issues as well.
In SSIS packages, using a merge join (which requires sorting) can cause slownesss, but from your description it doesn't appear you did that. I mention it only in case you were doing that and didn't mention it.
如果没有索引也能正常工作,也许你应该研究一下这些。数据类型有哪些?有多少人?也许你可以发布他们的定义?
您还可以查看索引的填充因子 - 特别是聚集索引。填充因子较高可能会导致插入上出现过多 IO。
If it works fine without the indexes, perhaps you should look into those. What are the data types? How many are there? Maybe you could post their definitions?
You could also take a look at the fill factor of your indexes - especially the clustered index. Having a high fill factor could cause excessive IO on your inserts.
好吧,我用另一个填充因子(80%)重建索引,就像 Sam 告诉我的那样,时间显着下降。花了 30 分钟而不是将近 3 个小时!
我将继续进行测试以微调数据库。另外,我不必创建聚集索引,我想使用聚集索引,时间会减少很多。
感谢大家,希望这对处于相同情况的人有所帮助。
Well I Rebuild the indexes with another fill factor (80%) like Sam told me, and the time droped down significantly. It took 30 minutes instead of almost 3hours!!!
I will keep with the tests to fine tune the DB. Also I didnt have to create a clustered index,I guess with the clustered the time will drop a lot more.
Thanks to all, wish that this helps to someone in the same situation.