EXISTS 在 INSERT 之前还是之后哪个更快?
我在 SQL Server 中有一个 SP,它每分钟运行数百次,需要根据数据库检查传入流量。目前它执行以下操作
INSERT INTO table
SELECT @value1,@value2 WHERE NOT EXISTS
(SELECT * FROM table WHERE value1 = @value1 AND value2 = @value2);
但是,我也可以选择
IF NOT EXISTS(SELECT * FROM table WHERE value1 = @value1 AND value2 = @value2)
INSERT INTO table (value1,value2) VALUES (@value1,@value2);
哪个更快?我感觉它们之间没有太大区别,但我历来不太擅长 TSQL...=/
更新:哎呀...意思是说 EXISTS 使用超过 1 个值来查找记录是否存在,所以唯一的约束不起作用。编辑示例以反映...
I have an SP in SQL Server which runs hundreds of times a minute, and needs to check incoming traffic against a database. At the moment it does the following
INSERT INTO table
SELECT @value1,@value2 WHERE NOT EXISTS
(SELECT * FROM table WHERE value1 = @value1 AND value2 = @value2);
However, I could also go with
IF NOT EXISTS(SELECT * FROM table WHERE value1 = @value1 AND value2 = @value2)
INSERT INTO table (value1,value2) VALUES (@value1,@value2);
Which would be faster? I get the feeling there's not much difference between them but I'm historically not very good at TSQL... =/
UPDATE: Whoops... meant to state that the EXISTS uses more than 1 value to find if a record exists, so a unique constraint won't work. Edited the sample to reflect that...
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(6)
两种变体都是不正确的。您将插入成对的重复@value1、@value2、保证。
处理此问题的正确方法是对两列强制执行唯一约束,并始终插入并处理违反约束的情况:
并插入:
虽然这些看起来很复杂,但必须考虑一点细节命名为正确性。与基于锁定提示的解决方案相比,这要简单得多。这也是性能最高的解决方案:仅进行一次查找。所有其他解决方案至少需要两次查找(一次用于验证是否可以插入,一次用于插入)。
Both variants are incorrect. You will insert pairs of duplicate @value1, @value2, guaranteed.
The correct way to handle this is to enforce a unique constraint on two columns and to always INSERT and handle the constraint violation:
and to insert:
While these may seem complicated, one has to factor in a little detail named correctness. This is by far simpler when compared with a lock hints based solution. This is also the most performant solution: does only one seek. All other solutions need at least two seeks (one to validate that it can be inserted, one to insert).
在几乎并发的环境中,并发
INSERT
可能发生在第二个查询的IF NOT EXISTS
和INSERT
之间。您的第一个查询将在它检查的记录上放置共享锁,直到查询结束才会解除共享锁,因此在查询运行之前无法插入新记录。
但是,您不应仅仅依赖此行为。在
值
上放置一个额外的UNIQUE
约束。它不仅会使数据库更加一致,而且会创建一个索引,使第一个查询更快。
In a hardly concurrent environment, a concurrent
INSERT
can happen in betweenIF NOT EXISTS
andINSERT
in your second query.Your first query will place the shared locks on the record it examines, which will not be lifted until the end of the query, so it will be impossible to insert a new record until the query is running.
However, you should not rely solely on this behavior. Place an additional
UNIQUE
constraint on thevalue
.It will not only make the database more consistent, but will create an index which will make the first query more fast.
在对这个问题及其答案添加了无数评论之后,我将继续回答它。
我预计原始问题中提出的两个建议之间不会有任何重大的性能差异。一方面,正如 Ray 所指出的,第二种方法可能会让您免于为插入做一些准备工作,但另一方面,RDBMS 通常在使用批处理语句时性能最佳,如第一种解决方案。
KM 和 DVK 建议添加一个
UNIQUE
约束,这将使唯一性测试隐式进行,但需要您在INSERT
语句周围添加某种错误处理。我很难理解为什么这会增加任何额外的性能,假设您已经有一个覆盖两列的索引。 如果您没有这样的索引,请添加它,并重新考虑您对更高性能的需求。据我所知,无论是显式还是隐式执行唯一性检查都不重要。如果通过在 DBMS“内部”完成检查获得任何好处,那么该收益可能会被与存在重复项时引发和处理错误相关的开销所吞噬。
底线:假设索引已经到位,如果您仍然发现自己渴望性能,我的建议是您对三个建议的解决方案进行实证测试。编写一个小程序来模拟预期的输入数据,并用数十亿行(包括合理数量的重复项)消除这三个解决方案中的每一个。执行此操作,请务必发布您的结果:-)
After adding a gazillion comments on this question and its answers, I will have my own go on answering it.
I would not expect any major difference in performance between the two proposed proposed in the original question. On one hand, as pointed out by Ray, the second approach might save you from doing some preparations for the insert, but on the other hand, an RDBMS usually performs best with batch statements, as in the first solution.
KM and DVK suggest adding a
UNIQUE
constraint, which will make the uniqueness test implicit, but will require you to add some kind of error handling around yourINSERT
statement. I have a hard time spotting why this should add any additional performance, assuming that you already have an index covering the two columns. If you do not have such index, add it, and reconsider your need for more performance.Whether the uniqueness check is performed explicit or implicit should not matter AFAIK. If anything is gained by having the check done "inside" the stomach of the DBMS, that gain might just be eaten up by overhead associated with raising and handling errors when duplicates exists.
The bottom line: Assuming an index is already in place, if you still find yourself lusting for performance, my recommendation is that you perform empirical tests on the three suggested solutions. Cook up a small program that simulates the expected input data, and blow each of the three solutions away with a few billion rows, including a plausible amount of duplicates. do this, be sure to post your results :-)
如果您希望值是唯一的,为什么不直接在值上创建唯一约束,执行不带 SELECT 的 INSERT 并优雅地处理违反约束的错误?
这比这两种方法中的任何一种都要快。
另外,您的第一种方法不起作用 - 当您选择时,您已经插入了值,因此选择显然会找到您刚刚插入的内容。
If you want the values to be unique, why not just create a unique constraint on the value, do an INSERT without SELECT and gracefully handle constraint violation error?
That'd be faster than either of these approaches.
Also, your first approach doesn't work - by the time you get to select, you already inserted the value so select will obviously find what you just inserted.
只需执行此操作,并忽略任何错误(假设对值有唯一约束)...
由于每分钟运行数百次,因此应将锁定提示添加到 SELECT 并将事务添加到 避免竞争条件
但是,我的提出的仅插入并忽略任何重复约束错误的想法也可以避免竞争条件。
just do it, and ignore any error (assumes an unique constraint on Value)...
Since this runs hundreds of times a minute, locking hints should be added to the SELECTs and a transaction to avoid a race condition
however, my proposed idea of just INSERT and ignore any duplicate constraint error would avoid a race condition as well.
如果我不得不猜测,我猜第二种选择会更快。如果存在失败,sql server 不必为插入进行任何类型的设置,而在第一个中,它可能会查找一些表和字段名称并为永远不会发生的插入做好准备。但是,我会在查询分析器中尝试一下,看看计划说了什么。
If I had to guess, I would guess the second option would be faster. sql server would not have to do any kind of setup for the insert if the exists fails, whereas in the first one, it might look up some table and field names and prepare for an insert that never happens. However, I would try it in the query analyzer and see what the plan says.