使用 SSIS 将平面文件数据传输到多个表
我第一次在 SSIS 中执行我的任务时,
我有 4.5 GB 的数据,我想上传到 Sql Server 2008。 我的数据就像4到6个数据表的组合,原始数据中没有主键。
一行包含多个信息表,
现在我需要将该数据拆分到各自的表中。
我的数据是这样的
row1: col1, col2, col3...........col125
现在我必须将一些列插入到主表中,如果我将记录插入到主表中,我必须得到最后插入的行 id 并使用该 id 我必须将原始数据列(col5 到 col20 等)插入到另一个表中,例如..
最后插入的行是 5
表 1 5、第2栏 5、第3栏 5、第4栏 5 ,col5 并且必须插入另一个表
table 3 5、第 12 栏 5、第32栏 5、45 栏 5、col55
表4 5、72 号栏 5、第82栏 5、95号栏 5 ,col105 就像第一行一样。
任何人都可以建议我如何执行这项任务。请参阅附加文件 (http ://www.bidn.com/Assets/Uploaded-CMS-Files/fc8b892d-8652-4f0e-bdc6-56e297149315Table Extract.pdf)
First time i am paling to implement my task in SSIS
I am having data like 4.5 gB ,i would like to upload into the Sql Server 2008 .
my data is like combination of 4to 6 tables of data, there is no Primary key in raw data .
one row containing the multiple tables of information
now i need to split that data in to respective tables.
my data is like this
row1: col1, col2, col3...........col125
Now i have to insert some columns in to master table, if i insert the records in to the master table i have to get last inserted row id and using that id i have to insert coloumns of raw data(col5 to col20 and so on) in to another tables like..
last inserted row is 5
table 1
5, col2
5 ,col3
5, col4
5 ,col5 and has to insert another table
table 3
5, col12
5 ,col32
5, col45
5 ,col55
table 4
5, col72
5 ,col82
5, col95
5 ,col105 like that from first row.
can any one suggest me how to implement this task. Please see the attaced file (http://www.bidn.com/Assets/Uploaded-CMS-Files/fc8b892d-8652-4f0e-bdc6-56e297149315Table Extract.pdf)
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
我建议使用带有数据流的 SSIS 包,该包仅将源文件上传到文件的暂存表版本中。此时,我的包将调用一个执行 SQL 任务,该任务将触发一个非常大的存储过程来完成您需要执行的所有主/子工作。
如果性能不是非常重要,则在存储过程中,创建一个游标来读取每一行,然后执行插入表 1 的操作。捕获scope_identity() 值(在示例中为 5)并使用该值和剩余的值要插入到表 2-6 中的列。起泡沫,冲洗,重复。
更好的性能方法是对该数据执行基于集合的操作。这种模式需要更多的技巧才能实现,但完全可行。将所有行插入表 1 中,而不是使用游标。要捕获所有插入的标识值,您需要使用 OUTPUT 子句转换为类似表变量的内容。此时,您已在输入行之间建立了映射(行 1 = id 5、行 2 = id 6 等),并且可以在基于游标的方法中执行相同的插入,除非您使用列而不是变量。
如果您绝对必须全程使用 SSIS,我对您表示哀悼。可以在那里应用光标模式并获得类似的性能结果。不要加载到临时表,而是加载到数据流中对象类型的 SSIS 变量。在包中创建 126 个或任何变量,然后粉碎记录集
I would propose an SSIS package with a Dataflow that merely uploads the source file into a staging table version of the file. At that point, my package would then call an Execute SQL Task which would fire a very large stored procedure to do all the master/child work you will need to do.
If performance is not terribly important, within the stored procedure, create a cursor to read each row, by agonizing row, and perform your insert into table 1. Capture the scope_identity() value (5 in your example) and use that and the remaining columns to insert into tables 2-6. Lather, rinse, repeat.
A better performance approach would be to perform set based operations on that data. That pattern will require more skill to pull off but is entirely feasible. Instead of a cursor, insert all of the rows into table 1. To capture all of the inserted identity values, you will want to use the OUTPUT clause into something like a table variable. At that point, you have a mapping between input rows (row 1 = id 5, row 2 = id 6, etc) and can perform the same inserts in the cursor based approach except you'd use columns instead of variables.
If you absolutely must use SSIS the entire way, you have my condolences. The cursor pattern could be applied there with similar performance results. Instead of loading to a staging table, load to an SSIS variable of type object in your data flow. Create 126 or whatever variables in the package and then shred the recordset