SSIS =- 如何根据记录类型(标题、详细信息、尾部)处理具有可变列数的 CSV 输入文件
我需要能够处理一个输入文件,其中有 3 种类型的记录,由第一列中的值 RecordType 指定:
H=header record
D=Detail
T=Trailer
我想我想将所有详细记录传输到目标数据库表中。我认为标头和尾部记录(其中包含指定文件创建时间的控制信息(标头信息)和记录计数/校验和信息(在尾部中))也应该传输到单独的表中。
H,2011-06-02
D,value1,value2,value3,1
D,value1,value2,value3,2
D,value1,value2,value3,3
D,value1,value2,value3,4
T,4, 10
作为一个SSIS新手,我能想到的唯一方法是将输入文件处理为3列的固定长度文件。 第 1 列 = 第 1-1 位 col 2 = Pos 2-2(逗号分隔符,一次性列) col 3 = 数据
然后,
- 将数据传输到 3 列表中。
- 根据记录类型将数据导出到新文件。
- 作为单独的文件重新处理
我希望不必这样做,因为这很丑陋。
我的首选方法是调整格式如下。 (我的逗号计数可能不对,但您可能会明白 -
"RecType","CreateDate","Value1","Value2","Value3","Value4","RowCount","CheckSum"
H,2011-06-02,,,,,,,,
D,,value1,value2,value3,1,,
D,,value1,value2,value3,1,,
D,,value1,value2,value3,1,,
T,,,,,,,,4,10
不幸的是,更改文件格式不是一个选项。
I need to be able to process an input file where there are 3 types of records designated by the value in the 1st column, RecordType:
H=header record
D=Detail
T=Trailer
I suppose that I would like to transfer all of the detail records into a dest db table. I suppose the header and trailer records, which contain control info specifying when the file was created (header info) and record count/check sum info (in the trailer) should be transferred into separate tables, too.
H,2011-06-02
D,value1,value2,value3,1
D,value1,value2,value3,2
D,value1,value2,value3,3
D,value1,value2,value3,4
T,4, 10
being an SSIS rookie, the only way I can think of, is to process the input file as a fixed length file of 3 columns.
col 1 = Pos 1-1
col 2 = Pos 2-2 (the comma separator, a throw-away column)
col 3 = the data
Then ,
- transfer the data into a 3 column table.
- Export the data to new files based on record type.
- Reprocess as separate file
I hope not to have to do this, as this is ugly.
My preferred approach would be to adjust the format as follows. (My comma counts may be off, but you probably get the idea-
"RecType","CreateDate","Value1","Value2","Value3","Value4","RowCount","CheckSum"
H,2011-06-02,,,,,,,,
D,,value1,value2,value3,1,,
D,,value1,value2,value3,1,,
D,,value1,value2,value3,1,,
T,,,,,,,,4,10
Unfornately, changing the file format is not an option.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
不幸的是,这是一个常见问题。我发现的唯一方法是使用单个 varchar 列配置平面文件连接管理器。将整行放入字符串列中。
然后使用条件分割来区分页眉/页脚/数据行,并将每种类型定向到单独的输出
然后您需要将数据写入文件,然后对其执行另一个连接管理器并处理它,或者使用派生列使用字符串操作、字符串拆分或子字符串等将单个字符串(varchar)拆分为多个列
Unfortunately this is a common problem. And only way i've found around it is to configure a flat file connection manager with a single varchar column. Put the entire row into the string column.
Then use a conditional split to distingush between header/footer/data row, and direct each type to separate outputs
You then need to either write the data to a file and then do another connection manager to it and process it, or use a derived column to split up the single string (varchar) into the multiple columns using string manipulation, string split or substring etc etc
您可以配置连接管理器将文件导入为单行,然后添加条件拆分以根据记录类型拆分记录。
然后,您可以为分割转换的每种类型的输出添加脚本组件。这些脚本组件可用于将输入拆分为所需的输出列。然后,您可以添加单独的目标以将记录插入到不同的表中。
You can configure the connection manager to import the file as a single row and then add conditional split to split the records based on record-type.
Then you can add script components for each type of output from split transform. These script components can be used to split the input into required output columns. Then, you can add separate destinations to insert records into different tables.