SQL import: skip duplicates
I am trying to do a bulk upload into a SQL Server DB. The source file has duplicates which I want to remove, so I was hoping that the operation would automatically upload the first one and then discard the rest (I've set a unique key constraint). The problem is, the moment a duplicate upload is attempted, the whole thing fails and gets rolled back. Is there any way I can just tell SQL to keep going?
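A minimal sketch of the setup being described, assuming the load is done with BULK INSERT; the table, column, and file names are placeholders, not from the original post. With the unique constraint in place, a single duplicate in the file makes the whole load fail and roll back, which is the behaviour described above.

    -- Hypothetical target table with the unique key constraint from the question.
    CREATE TABLE dbo.Target
    (
        KeyCol int NOT NULL,
        Col2   varchar(50) NULL,
        CONSTRAINT UQ_Target_KeyCol UNIQUE (KeyCol)
    );

    -- One way the bulk upload might look; by default the file is loaded as a single
    -- batch, so the first duplicate key makes the entire statement fail and roll back.
    BULK INSERT dbo.Target
    FROM 'C:\data\source.csv'
    WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', FIRSTROW = 2);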
3 Answers
Try bulk inserting the data into a temporary table and then SELECT DISTINCT as @madcolor suggested, or filter on some other field in the WHERE clause.
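A minimal sketch of that staging-table approach, reusing the same hypothetical dbo.Target table as above; dbo.Staging and the column names are likewise placeholders. Note that SELECT DISTINCT only collapses rows that are identical in every column; if two rows share a key but differ elsewhere, you still need a rule for which one wins (see the last answer below).

    -- 1. Bulk load the raw file into a staging table that has no unique constraint.
    BULK INSERT dbo.Staging
    FROM 'C:\data\source.csv'
    WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', FIRSTROW = 2);

    -- 2. Copy only distinct rows across; the WHERE clause additionally skips keys that
    --    already exist in the target, so the unique constraint is never violated.
    INSERT INTO dbo.Target (KeyCol, Col2)
    SELECT DISTINCT s.KeyCol, s.Col2
    FROM dbo.Staging AS s
    WHERE NOT EXISTS (SELECT 1 FROM dbo.Target AS t WHERE t.KeyCol = s.KeyCol);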
If you're doing this through some SQL tool like SQL Plus or DBVis or Toad, then I suspect not. If you're doing this programmatically in a language, then you need to divide and conquer. Presumably executing an update line by line and catching each exception would be too lengthy a process, so instead you could first run a batch operation on the whole SQL block; if it fails, run the first half, and if that fails, run the first half of that half. Iterate this way until you have a block that succeeds, then set that block aside and apply the same procedure to the rest of the SQL. Anything that violates a constraint eventually ends up as a single SQL statement, which you know to log and discard. This imports with as much bulk processing as possible while still throwing out the invalid rows.
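The same bisection idea can also be sketched in T-SQL rather than client code, since the rest of this thread is SQL. This is only an illustration of the approach described above: it assumes the file has already been loaded into a hypothetical dbo.Staging table with a sequential RowNum column (for example an IDENTITY added at load time), and it relies on the fact that, with default settings, a failed INSERT is rolled back at statement level so the loop can carry on.

    DECLARE @Ranges TABLE (LoRow int, HiRow int);
    DECLARE @Lo int, @Hi int, @Mid int;

    -- Start with the whole file as a single range.
    INSERT INTO @Ranges (LoRow, HiRow)
    SELECT MIN(RowNum), MAX(RowNum)
    FROM dbo.Staging
    HAVING COUNT(*) > 0;

    WHILE EXISTS (SELECT 1 FROM @Ranges)
    BEGIN
        SELECT TOP (1) @Lo = LoRow, @Hi = HiRow FROM @Ranges;
        DELETE FROM @Ranges WHERE LoRow = @Lo AND HiRow = @Hi;

        BEGIN TRY
            -- Try the whole range as one batch.
            INSERT INTO dbo.Target (KeyCol, Col2)
            SELECT KeyCol, Col2
            FROM dbo.Staging
            WHERE RowNum BETWEEN @Lo AND @Hi;
        END TRY
        BEGIN CATCH
            IF @Lo = @Hi
                -- A single offending row: log it and move on.
                PRINT 'Discarding row ' + CAST(@Lo AS varchar(20)) + ': ' + ERROR_MESSAGE();
            ELSE
            BEGIN
                -- Split the failing range in half and queue both halves for retry.
                SET @Mid = (@Lo + @Hi) / 2;
                INSERT INTO @Ranges VALUES (@Lo, @Mid), (@Mid + 1, @Hi);
            END
        END CATCH
    END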
Use SSIS for this. You can tell it to skip the duplicates. But first make sure they are true duplicates: what if the data in some of the columns is different? How do you know which record is the better one to keep?
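One way to make that check concrete, as a sketch against the same hypothetical dbo.Staging table used above: for each key value that appears more than once, count how many different versions of the row exist. Anything above one distinct version is not a true duplicate and needs a business rule for which record to keep before you start discarding.

    -- For each key that occurs more than once, how many different non-key versions are there?
    -- (Concatenate all non-key columns inside the DISTINCT if there is more than one.)
    SELECT KeyCol,
           COUNT(*)             AS TotalRows,
           COUNT(DISTINCT Col2) AS DistinctVersions
    FROM dbo.Staging
    GROUP BY KeyCol
    HAVING COUNT(*) > 1;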