SQL Server - Merging large tables without locking the data



I have a very large set of data (~3 million records) which needs to be merged with updates and new records on a daily schedule. I have a stored procedure that actually breaks up the record set into 1000 record chunks and uses the MERGE command with temp tables in an attempt to avoid locking the live table while the data is updating. The problem is that it doesn't exactly help. The table still "locks up" and our website that uses the data receives timeouts when attempting to access the data. I even tried splitting it up into 100 record chunks and even tried a WAITFOR DELAY '000:00:5' to see if it would help to pause between merging the chunks. It's still rather sluggish.
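For context, the chunked procedure described above has roughly this shape (a minimal sketch only; #Staging, dbo.LiveTable and the column names are placeholders, not the real schema):

    -- Illustrative only: move 1000 staged rows at a time into the live table.
    CREATE TABLE #Chunk (Id int PRIMARY KEY, Col1 int, Col2 varchar(50));

    DECLARE @Rows int = 1;
    WHILE @Rows > 0
    BEGIN
        -- Pull the next chunk out of the staging table.
        DELETE TOP (1000) FROM #Staging
        OUTPUT deleted.Id, deleted.Col1, deleted.Col2 INTO #Chunk (Id, Col1, Col2);
        SET @Rows = @@ROWCOUNT;
        IF @Rows = 0 BREAK;

        MERGE dbo.LiveTable AS tgt
        USING #Chunk AS src ON tgt.Id = src.Id
        WHEN MATCHED THEN
            UPDATE SET tgt.Col1 = src.Col1, tgt.Col2 = src.Col2
        WHEN NOT MATCHED BY TARGET THEN
            INSERT (Id, Col1, Col2) VALUES (src.Id, src.Col1, src.Col2);

        TRUNCATE TABLE #Chunk;
        WAITFOR DELAY '00:00:05';   -- pause between chunks
    END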

I'm looking for any suggestions, best practices, or examples on how to merge large sets of data without locking the tables.

Thanks


Comments (3)

鹿港巷口少年归 2024-09-17 16:13:07


Change your front end to use NOLOCK or READ UNCOMMITTED when doing the selects.

You can't NOLOCK a MERGE, INSERT, or UPDATE, as the records must be locked in order to perform the update. However, you can NOLOCK the SELECTs.
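For example (a sketch only; dbo.LiveTable and its columns are placeholders for whatever your front end actually queries):

    -- Per-query table hint:
    SELECT Id, Col1, Col2
    FROM dbo.LiveTable WITH (NOLOCK)
    WHERE Id = 42;

    -- Or set it for the whole session instead:
    SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
    SELECT Id, Col1, Col2
    FROM dbo.LiveTable
    WHERE Id = 42;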

Note that you should use this with caution. If dirty reads are okay, then go ahead. However, if the reads require the updated data then you need to go down a different path and figure out exactly why merging 3M records is causing an issue.

I'd be willing to bet that most of the time is spent reading data from the disk during the MERGE command and/or working around low-memory situations. You might be better off simply stuffing more RAM into your database server.

Ideally, you would have enough RAM to pull the whole database into memory as needed. For example, if you have a 4 GB database, make sure you have 8 GB of RAM, in an x64 server of course.
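If you want a rough idea of how much of each database is actually being served from memory, the buffer pool DMV can tell you (just a sketch; look at the row for your database):

    -- Approximate buffer pool usage per database (8 KB pages converted to MB).
    SELECT DB_NAME(database_id) AS database_name,
           COUNT(*) * 8 / 1024  AS cached_mb
    FROM sys.dm_os_buffer_descriptors
    GROUP BY database_id
    ORDER BY cached_mb DESC;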

画中仙 2024-09-17 16:13:07


I'm afraid I've had quite the opposite experience. We were performing updates and insertions where the source table had only a fraction of the number of rows of the target table, which was in the millions.

When we combined the source table records across the entire operational window and then performed the MERGE just once, we saw a 500% increase in performance. My explanation for this is that you pay for the up-front analysis of the MERGE command just once, instead of over and over again in a tight loop.

Furthermore, I am certain that merging 1.6 million rows (source) into 7 million rows (target), as opposed to 400 rows into 7 million rows over 4000 distinct operations (in our case), leverages the capabilities of the SQL Server engine much better. Again, a fair amount of the work is in the analysis of the two data sets, and this is done only once.

Another question I have to ask as well is whether you are aware that the MERGE command performs much better with indexes on both the source and target tables? I would like to refer you to the following link:

http://msdn.microsoft.com/en-us/library/cc879317(v=SQL.100).aspx
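Putting both points together, the single-pass version could look something like this (a sketch only; table and column names are made up):

    -- Accumulate the whole day's changes in one staging table,
    -- clustered on the join key so the MERGE can match efficiently.
    CREATE TABLE dbo.Staging (
        Id   int         NOT NULL,
        Col1 int         NULL,
        Col2 varchar(50) NULL,
        CONSTRAINT PK_Staging PRIMARY KEY CLUSTERED (Id)
    );

    -- ... bulk load the day's updates and new records into dbo.Staging ...

    -- One MERGE for the whole batch instead of thousands of small ones.
    MERGE dbo.LiveTable AS tgt
    USING dbo.Staging AS src ON tgt.Id = src.Id
    WHEN MATCHED THEN
        UPDATE SET tgt.Col1 = src.Col1, tgt.Col2 = src.Col2
    WHEN NOT MATCHED BY TARGET THEN
        INSERT (Id, Col1, Col2) VALUES (src.Id, src.Col1, src.Col2);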

日记撕了你也走了 2024-09-17 16:13:07


From personal experience, the main problem with MERGE is that, since it takes page locks, it precludes any concurrency in the INSERTs directed at the table. So if you go down this road, it is fundamental that you batch all updates that will hit a table through a single writer.

For example: we had a table on which each INSERT took a crazy 0.2 seconds per entry, most of that time seemingly being wasted on transaction latching. So we switched it over to MERGE, and some quick tests showed that it let us insert 256 entries in 0.4 seconds, or even 512 in 0.5 seconds. We tested this with load generators and all seemed fine, until it hit production and everything blocked on the page locks, resulting in a much lower total throughput than with the individual INSERTs.

The solution was not only to batch the entries from a single producer into one MERGE operation, but also to batch the batches from all producers going to an individual DB into a single MERGE operation, through an additional level of queueing (previously there was also a single connection per DB, but using MARS to interleave all the producers' calls to the stored procedure doing the actual MERGE transaction). This way we were able to handle many thousands of INSERTs per second without problems.
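To make the single-writer idea concrete, here is one possible shape for it, assuming the consolidated batches are handed to the procedure as a table-valued parameter (the type, procedure, and column names are all hypothetical):

    -- Hypothetical row type carrying one consolidated batch of entries.
    CREATE TYPE dbo.EntryBatch AS TABLE (
        Id   int         NOT NULL PRIMARY KEY,
        Col1 int         NULL,
        Col2 varchar(50) NULL
    );
    GO

    -- Single writer: all producers funnel their rows through this one procedure,
    -- so only one MERGE at a time takes page locks on the target table.
    CREATE PROCEDURE dbo.MergeEntries
        @Batch dbo.EntryBatch READONLY
    AS
    BEGIN
        SET NOCOUNT ON;

        MERGE dbo.LiveTable AS tgt
        USING @Batch AS src ON tgt.Id = src.Id
        WHEN MATCHED THEN
            UPDATE SET tgt.Col1 = src.Col1, tgt.Col2 = src.Col2
        WHEN NOT MATCHED BY TARGET THEN
            INSERT (Id, Col1, Col2) VALUES (src.Id, src.Col1, src.Col2);
    END
    GO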

Having the NOLOCK hints on all of your front-end reads is an absolute must, always.
