Efficient bulk data import in SSIS with occasional duplicate PKs?
I am regularly loading a flat file with 100k records into a table after some transformations. The table has a PK on two columns. The data on the whole does not contain duplicate PK information, but occasionally there are duplicates.
I naively didn't understand why SSIS was rejecting all my records when only some of them violated the PK constraint. I believe the problem is that during a bulk load, if even one of the rows violates the PK constraint, all rows in that batch get rejected.
If I alter the FastLoadMaxInsertCommitSize property of the OLE DB Destination to 1, it fixes the problem, but it then runs like a dog as it's committing one row at a time.
In MySQL, the bulk load facility allows you to ignore PK errors and skip those rows without sacrificing performance. Does anyone know of a way to achieve this in SQL Server?
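For reference, this is roughly the MySQL facility I mean; the IGNORE keyword makes rows that duplicate a unique key get skipped rather than aborting the load (the file and table names below are just placeholders):

```sql
-- MySQL: IGNORE skips rows that would duplicate an existing unique/PK value,
-- so the rest of the file still loads at bulk speed.
LOAD DATA INFILE '/tmp/records.csv'
IGNORE
INTO TABLE import_target
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n';
```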
Any help much appreciated.
2 Answers
It sounds like you may be looking for IGNORE_DUP_KEY? See Using the IGNORE_DUP_KEY Option to Handle Duplicate Values.
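IGNORE_DUP_KEY is an index option, so it is set on the primary key (or a unique index) of the target table itself rather than in the SSIS package. A minimal sketch, assuming a two-column PK like the one described in the question (table and column names are placeholders):

```sql
-- Illustrative stand-in for the real target table.
CREATE TABLE dbo.ImportTarget (
    KeyPart1 int           NOT NULL,
    KeyPart2 int           NOT NULL,
    Payload  nvarchar(100) NULL
);

-- With IGNORE_DUP_KEY = ON, rows that would violate the key are discarded
-- with the warning "Duplicate key was ignored." instead of failing the whole batch.
ALTER TABLE dbo.ImportTarget
ADD CONSTRAINT PK_ImportTarget
    PRIMARY KEY CLUSTERED (KeyPart1, KeyPart2)
    WITH (IGNORE_DUP_KEY = ON);

-- Quick check: two rows are inserted, the duplicate (1, 1) row is silently skipped.
INSERT INTO dbo.ImportTarget (KeyPart1, KeyPart2, Payload)
VALUES (1, 1, N'first'), (1, 1, N'dupe'), (2, 1, N'second');
```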
You can up the FastLoadMaxInsertCommitSize to, say, 5k... this will speed up your inserts greatly. Then set the Error Output to redirect rows: on that error output, send each batch of 5k rows that contains an error row to another Destination. (This next bit is from memory!) If you set that second destination to not use fast load, it will insert the good rows, and you can pass its error output to an error table or something like a Row Count task.
You can play with the FastLoadMaxInsertCommitSize figure until you find something that works well for you.
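Roughly, the data flow being described looks like this (component names are only illustrative):

```text
Flat File Source -> transformations
  -> OLE DB Destination A   (fast load, FastLoadMaxInsertCommitSize = 5000)
       Error Output (Redirect row)
         -> OLE DB Destination B   (same table, "Table or view" access mode, no fast load)
              Error Output (Redirect row)
                -> error table or Row Count   (only the genuinely bad rows end up here)
```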