如何在使用 SSIS 导入数据库之前验证 CSV 文件?
我有包含三列的 CSV 文件。
sno sname quantity
--- ----- --------
1 aaa 23
2 bbb null
3 ccc 34
4 ddd ddd
5 eee xxx
6 fff 87
SQL Server数据库中的表如下/
CREATE TABLE csvtable
( sno int
, sname varchar(100)
, quantity numeric(5,2)
)
我创建了一个SSIS包来将csv文件数据导入到数据库表中。我在包执行期间收到错误,因为数量是字符串。我创建了另一个表来存储无效数据。
CREATE TABLE wrongcsvtable
( sno nvarchar(10)
, sname nvarchar(100)
, quantity nvarchar(100)
)
在 csvtable 中,我想存储以下数据。
sno sanme quantity
--- ------ --------
1 aaa 23
3 ccc 34
6 fff 87
在wrongcsvtable中,我想存储以下数据。
sno sanme quantity
--- ------ --------
2 bbb null
4 ddd ddd
5 eee xxx
有人可以指出我实现上述输出的正确方向吗?
I have CSV file with three columns.
sno sname quantity
--- ----- --------
1 aaa 23
2 bbb null
3 ccc 34
4 ddd ddd
5 eee xxx
6 fff 87
Table in the SQL Server database is as following/
CREATE TABLE csvtable
( sno int
, sname varchar(100)
, quantity numeric(5,2)
)
I created an SSIS package to import csv file data into the database table. I am getting an error during package execution because the quantity is a string. I created another table to store the invalid data.
CREATE TABLE wrongcsvtable
( sno nvarchar(10)
, sname nvarchar(100)
, quantity nvarchar(100)
)
In the csvtable, I would like to store the following data.
sno sanme quantity
--- ------ --------
1 aaa 23
3 ccc 34
6 fff 87
In the wrongcsvtable, I would like to store the following data.
sno sanme quantity
--- ------ --------
2 bbb null
4 ddd ddd
5 eee xxx
Could someone point me in the right direction to achieve the above mentioned output?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
这是一种可能的选择。您可以使用
数据流任务
中的数据转换
转换来实现此目的。以下示例展示了如何实现这一点。该示例使用 SSIS 2005 和 SQL Server 2008 数据库。分步过程:
创建一个名为
FlatFile.CSV
的文件,并使用屏幕截图 #1 中所示的数据填充该文件。在 SQL 数据库中,使用 SQL 脚本 部分下提供的脚本创建两个名为
dbo.CSVCorrect
和dbo.CSVWrong
的表。表dbo.CSVWrong
中的字段应具有数据类型 VARCHAR 或 NVARCHAR 或 CHAR,以便可以接受无效记录。在 SSIS 包上,创建一个名为 SQLServer 的 OLE DB 连接以连接到 SQL Server 数据库,并创建一个名为 CSV 的平面文件连接。请参阅屏幕截图#2。配置平面文件连接 CSV,如屏幕截图 #3 - #7 所示。平面文件连接中的所有列应配置为字符串数据类型,以便包在读取文件时不会失败。
在包的“控制流”选项卡上,放置一个
数据流任务
,如屏幕截图 #8 所示。在包的“数据流”选项卡上,放置一个
平面文件源
并按屏幕截图 #9 和 #10 中所示进行配置.在包的“数据流”选项卡上,放置一个
数据转换
转换并对其进行配置,如屏幕截图#11所示。单击配置错误输出
并将错误和截断列值从失败组件更改为重定向行。请参阅屏幕截图 #12。在包的“数据流”选项卡上,放置一个
OLE DB 目标
,并将绿色箭头从数据转换连接到此 OLE DB目的地。配置 OLE DB 目标,如屏幕截图 #13 和 #14 所示。在包的“数据流”选项卡上,放置另一个
OLE DB 目标
并将红色箭头从数据转换连接到此 OLE DB目的地。配置 OLE DB 目标,如屏幕截图 #15 和 #16 所示。屏幕截图 #17 显示完全配置后的数据流任务。
屏幕截图 #18 显示包执行之前中的表中的数据。
屏幕截图 #19 显示数据流任务中的包执行。
屏幕截图 #20 显示包执行后表中的数据。
希望有帮助。
SQL 脚本:
屏幕截图 #1:
屏幕截图#2:
屏幕截图 #3:
屏幕截图 #4:
屏幕截图 #5:
屏幕截图 # 6:
屏幕截图 #7:
屏幕截图 #8:
屏幕截图 # 9:
屏幕截图 #10:
屏幕截图 #11:
屏幕截图#12:
屏幕截图 #13:
屏幕截图 #14:
屏幕截图 #15:
屏幕截图 #16:
屏幕截图 #17:
屏幕截图 #18:
屏幕截图 #19:< /strong>
屏幕截图 #20:
Here is one possible option. You can achieve this using the
Data Conversion
transformation within theData Flow Task
. Following example shows how this can be achieved. The example uses SSIS 2005 with SQL Server 2008 database.Step-by-step process:
Create a file named
FlatFile.CSV
and populate it with data as shown in screenshot #1.In the SQL database, create two tables named
dbo.CSVCorrect
anddbo.CSVWrong
using the scripts provided under SQL Scripts section. The fields in the tabledbo.CSVWrong
should have the data types VARCHAR or NVARCHAR or CHAR so that it can accept the invalid records.On the SSIS package, create an OLE DB connection named SQLServer to connect to SQL Server database and create a Flat File Connection named CSV. Refer screenshot #2. Configure the flat file connection CSV as shown in screenshots #3 - #7. All the columns in the flat file connection should be configured as string data type so that the package doesn't fail while reading the file.
On the Control Flow tab of the package, place a
Data Flow Task
as shown in screenshot #8.On the Data Flow tab of the package, place a
Flat File Source
and configure it as shown in screenshots #9 and #10.On the Data Flow tab of the package, place a
Data Conversion
transformation and configure it as shown in screenshot #11. Click on theConfigure Error Output
and change the Error and Truncation column values from Fail component to Redirect row. Refer screenshot #12.On the Data Flow tab of the package, place an
OLE DB Destination
and connect the green arrow from Data Conversion to this OLE DB Destination. Configure the OLE DB Destination as shown in screenshots #13 and #14.On the Data Flow tab of the package, place another
OLE DB Destination
and connect the red arrow from Data Conversion to this OLE DB Destination. Configure the OLE DB Destination as shown in screenshots #15 and #16.Screenshot #17 shows the Data Flow Task once it has been completely configured.
Screenshot #18 shows data in the tables before the package execution.
Screenshot #19 shows package execution within Data Flow Task.
Screenshot #20 shows data in the tables after the package execution.
Hope that helps.
SQL Scripts:
Screenshot #1:
Screenshot #2:
Screenshot #3:
Screenshot #4:
Screenshot #5:
Screenshot #6:
Screenshot #7:
Screenshot #8:
Screenshot #9:
Screenshot #10:
Screenshot #11:
Screenshot #12:
Screenshot #13:
Screenshot #14:
Screenshot #15:
Screenshot #16:
Screenshot #17:
Screenshot #18:
Screenshot #19:
Screenshot #20:
在数据流中放置条件分割。检查数量是否不完整。您创建的分支将转到wrongcsvtable,默认分支将转到csvtable
编辑忘记了没有数字测试有条件的分裂。您应该做的是添加一个派生列转换,将数量字段转换为整数。在“配置错误输出”对话框中,将错误和截断值设置为“忽略失败”。如果数据不是数字,这将传递该项目,并将新字段的值设置为 NULL。之后,在条件分割中,检查新字段是否为空。具有空字段的记录将转到错误的csv表,其他记录将转到csv表。
Put a conditional split in your data flow. Check if quantity is non-integral. The branch you've created will go to the wrongcsvtable and the default branch will go to the csvtable
EDIT Forgot there is no numeric test in the conditional split. What you should do is add a derived column transform that converts the quantity field into an integer. In the Configure Error Output dialog, set the error and truncation values to Ignore Failure. This will pass the item through with the value for the new field as NULL if the data is not numeric. After that, in the conditional split, check if the new field is null or not. Records with a null field go to the wrongcsvtable, other records go to the csvtable.