Skipping identical records in SQL Server

Posted 2024-11-15

I have a set of text files I'm importing into SQL Server via SqlBulkCopy in C#. Some records appear multiple times, and I'd like to skip those rather than throwing an error and stopping.

For cases where there's a clear 'uniqueID'/primary key, all is well - I've used the "WITH (IGNORE_DUP_KEY = ON)" with good effect.
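For reference, a minimal sketch of that option (table and column names here are placeholders, not from my actual schema):

```sql
-- IGNORE_DUP_KEY = ON silently discards rows that would violate the
-- unique key instead of raising an error and aborting the bulk copy.
CREATE TABLE dbo.Transactions (
    TransactionID int NOT NULL,
    Amount decimal(18, 2) NOT NULL,
    CONSTRAINT PK_Transactions PRIMARY KEY (TransactionID)
        WITH (IGNORE_DUP_KEY = ON)
);
```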

In this case though, no such uniqueID can be determined, short of joining ALL the fields together or creating a compound key that involves all the fields. I've tried to get a unique transactionID or similar, but none is available. Sometimes the records are the same except for one digit in a field; in that case I want the row inserted into the table.

I figure there may be a better approach than creating a key against all fields? If I did do that, it would mean dates, strings, values etc would all have to be mixed in to form the key.

Querying the rows out and hashing the values for comparison with a hash of my input line was another thought, but performance would be terrible I'm sure :)
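One variation I've considered is letting SQL Server do the hashing instead of pulling rows back to the client: a persisted computed column over the concatenated fields, with a unique index using IGNORE_DUP_KEY. This is only a sketch under assumptions (placeholder column names, SQL Server 2012 or later for CONCAT and SHA2_256):

```sql
-- Hash every field into one binary(32) column; rows identical in all
-- fields hash identically and get silently skipped by the unique index.
ALTER TABLE dbo.Transactions ADD RowHash AS
    CAST(HASHBYTES('SHA2_256',
        CONCAT(Field1, '|', Field2, '|', CONVERT(varchar(30), Field3, 126)))
    AS binary(32)) PERSISTED;

CREATE UNIQUE INDEX UX_Transactions_RowHash
    ON dbo.Transactions (RowHash)
    WITH (IGNORE_DUP_KEY = ON);
```

The '|' separator guards against adjacent fields concatenating into the same string (e.g. 'ab' + 'c' vs 'a' + 'bc'); whether that performs acceptably on my data volumes is exactly what I'm unsure about.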

Anyone have suggestions on how best to do this?

Thanks

Comments (2)

牵你的手,一向走下去 2024-11-22 07:34:57

If you're importing, you can just make a stage which de-duplicates the data by inserting into a temporary table which has the same schema as your target table, and then executing a SELECT DISTINCT query on it, inserting the results of this query into your final destination table. Unless you are talking about tens of millions of rows, this should work fine, since it just needs to run once.
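A sketch of that staging step, with placeholder table and column names (SqlBulkCopy would target #Staging on the same connection):

```sql
-- Create an empty work table with the same columns as the target.
SELECT * INTO #Staging FROM dbo.Transactions WHERE 1 = 0;

-- ... SqlBulkCopy loads the text-file rows into #Staging here ...

-- One set-based pass moves only distinct rows into the real table.
INSERT INTO dbo.Transactions (Field1, Field2, Field3)
SELECT DISTINCT Field1, Field2, Field3
FROM #Staging;

DROP TABLE #Staging;
```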

我最亲爱的 2024-11-22 07:34:57

You could use EXCEPT:

insert into TargetTable
select fieldlist from SourceTable
except
select fieldlist from TargetTable;
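Spelled out with an explicit (placeholder) column list:

```sql
-- EXCEPT compares every selected column and removes rows already
-- present in the target; unlike NOT IN, it treats two NULLs as equal,
-- so rows differing only in NULL placement still deduplicate.
INSERT INTO TargetTable (Field1, Field2, Field3)
SELECT Field1, Field2, Field3 FROM SourceTable
EXCEPT
SELECT Field1, Field2, Field3 FROM TargetTable;
```

As a bonus, EXCEPT returns distinct rows from its left-hand query, so duplicates within SourceTable itself collapse as well.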