Best practice for checking whether data exists in a database inside a loop?

Posted on 2024-10-14 16:02:05


I need to check whether a specific piece of data exists in table1 in the database, inside a for loop. If it exists, no action is taken and the for loop continues; otherwise I should add the data to table1.

So, in every iteration, I take a look at the database. I believe that's time-consuming.

Is there a best practice for doing tasks like this?


Comments (2)

靖瑶 2024-10-21 16:02:05


How do you verify the existence of a record in your database table? Most likely you match it against a local Id or something similar.

If this is true, then I'd query the table and select all Ids, storing them in a hashtable (a Dictionary in .NET). (This might not be practical if your database contains millions of records.) Determining whether a record exists in the table is then a simple matter of checking whether a key exists in the Dictionary, which is an O(1) operation on average and so far better than making n expensive database round trips.
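A minimal sketch of the Id-caching idea, in Python with sqlite3 standing in for the real database (the table layout and sample values here are made up for illustration; the answer's original context is .NET/SQL Server):

```python
import sqlite3

# In-memory database standing in for the real one.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER PRIMARY KEY, value TEXT)")
conn.executemany("INSERT INTO table1 VALUES (?, ?)", [(1, "a"), (2, "b")])

# One round trip: load every existing Id into a hash set.
existing_ids = {row[0] for row in conn.execute("SELECT id FROM table1")}

incoming = [(2, "b"), (3, "c"), (4, "d")]

# Membership test against the local set is O(1) on average,
# so the loop never touches the database.
to_insert = [rec for rec in incoming if rec[0] not in existing_ids]

# Single batched insert for everything that was missing.
conn.executemany("INSERT INTO table1 VALUES (?, ?)", to_insert)

print(sorted(r[0] for r in conn.execute("SELECT id FROM table1")))  # → [1, 2, 3, 4]
```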

The next thing to think about is how to remember the records you need to add to the table. This depends on whether your local data may contain duplicates that you need to check before adding, or whether it is guaranteed to be free of (local) duplicates.

In the simple case where there are no possible duplicates, just adding them to the Dictionary under the appropriate key and later enumerating Dictionary.Values is probably as fast as it gets. If the inserts need to be really fast because there are a lot of them, consider using SQL bulk inserts.

If your table is too large to cache the Ids locally, I'd consider implementing a stored procedure that performs the insert and contains the logic for deciding whether to actually insert or do nothing. That eliminates the second round trip, which is usually quite expensive.
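The stored-procedure idea boils down to making the existence check and the insert happen in one statement, so each record costs a single round trip. A sketch of that logic in Python/sqlite3 (the table and the `insert_if_absent` helper are made up for illustration; in SQL Server the statement would live inside an actual stored procedure):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER PRIMARY KEY, value TEXT)")
conn.execute("INSERT INTO table1 VALUES (1, 'a')")

def insert_if_absent(conn, rec_id, value):
    # The existence check and the insert are one statement,
    # i.e. one round trip instead of SELECT-then-INSERT.
    conn.execute(
        "INSERT INTO table1 (id, value) "
        "SELECT ?, ? WHERE NOT EXISTS (SELECT 1 FROM table1 WHERE id = ?)",
        (rec_id, value, rec_id),
    )

insert_if_absent(conn, 1, "duplicate")  # already present: no-op
insert_if_absent(conn, 2, "b")          # missing: inserted
print(conn.execute("SELECT COUNT(*) FROM table1").fetchone()[0])  # → 2
```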

If your RDBMS implements the SQL MERGE command (assuming you're using MS SQL Server, it does), I'd insert all the data into a temporary table and then MERGE it into the target table. This is probably the fastest solution.
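SQLite has no MERGE, but its ON CONFLICT DO NOTHING clause plays the role of MERGE's WHEN NOT MATCHED THEN INSERT branch, so the stage-then-merge approach can be sketched like this (table names and data are made up; requires SQLite ≥ 3.24):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, value TEXT)")
conn.execute("INSERT INTO target VALUES (1, 'a')")

# Stage all incoming rows in a temp table with one bulk operation...
conn.execute("CREATE TEMP TABLE staging (id INTEGER, value TEXT)")
conn.executemany("INSERT INTO staging VALUES (?, ?)",
                 [(1, "a"), (2, "b"), (3, "c")])

# ...then fold it into the target in a single statement. The
# "WHERE true" is required by SQLite's parser when an upsert
# clause follows an INSERT ... SELECT.
conn.execute(
    "INSERT INTO target SELECT id, value FROM staging WHERE true "
    "ON CONFLICT (id) DO NOTHING"
)
print(conn.execute("SELECT COUNT(*) FROM target").fetchone()[0])  # → 3
```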

怀里藏娇 2024-10-21 16:02:05


The amount of data, and which SQL implementation you use, can make a big difference here...

For example, with 10 million rows of data, making 10 million (potentially logged) operations, one for each row, will take orders of magnitude longer than, for example:

  • uploading the same data to a temporary table in a bulk operation, e.g. through the bulk-copy API if you're using SQL Server,
  • performing a left outer join to diff the data, and
  • inserting the difference in a single batch operation.
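The three steps above can be sketched in Python with sqlite3 (table names and sample rows are made up; in SQL Server the first step would use the bulk-copy API rather than executemany):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, value TEXT)")
conn.executemany("INSERT INTO target VALUES (?, ?)", [(1, "a"), (2, "b")])

# Step 1: bulk-load the incoming rows into a temporary table.
conn.execute("CREATE TEMP TABLE incoming (id INTEGER, value TEXT)")
conn.executemany("INSERT INTO incoming VALUES (?, ?)",
                 [(2, "b"), (3, "c"), (4, "d")])

# Steps 2 and 3: a left outer join finds the rows missing from the
# target, and the difference is inserted in one batch statement.
conn.execute("""
    INSERT INTO target (id, value)
    SELECT i.id, i.value
    FROM incoming i
    LEFT OUTER JOIN target t ON t.id = i.id
    WHERE t.id IS NULL
""")
print(sorted(r[0] for r in conn.execute("SELECT id FROM target")))  # → [1, 2, 3, 4]
```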