在 SQL Server 中将索引声明为唯一
如果我知道索引将具有唯一值,那么如果我这样声明它,它将如何影响插入或选择的性能。
如果优化器知道索引是唯一的,这将如何影响查询计划?
我知道指定唯一性可以保持完整性,但暂时将讨论放在一边,性能后果是什么。
If I know an index will have unique values, how will it affect performance on inserts or selects if I declare it as such.
If the optimiser knows the index is unique how will that affect the query plan?
I understand that specifying uniquenes can serve to preserve integrity, but leaving that discussion aside for the moment, what are the perfomance consequences.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(6)
长话短说:如果您的数据本质上是
唯一
,那么您将受益于为它们创建UNIQIE
索引。请参阅我的博客中的文章详细说明:
UNIQUE
现在,血淋淋的细节。
正如@Mehrdad所说,
UNIQUENESS
会影响计划生成器中的估计行数。UNIQUE
索引具有最大可能的选择性,这就是为什么:几乎肯定会使用
NESTED LOOPS
,而如果优化器认为可以从
HASH JOIN
中受益non_unique_indexed_field
不是选择性的。如果您的索引是
CLUSTERED
(即行本身包含在索引叶中)并且是非UNIQUE
,则称为uniquifier
的特殊隐藏列添加到每个索引键中,从而使键更大并且索引更慢。这就是为什么
UNIQUE CLUSTERED
索引实际上比非 UNIQUE CLUSTERED
索引更有效。在
Oracle
中,需要对UNIQUE INDEX
进行联接,以实现所谓的键保存
,这确保将在以下位置选择表中的每一行:最一次并使视图可更新。此查询:
将在
Oracle
中运行,而此查询:将失败。
不过,这不是
SQL Server
的问题。还有一件事:对于像这样的表,
此查询:
将使用
TOP N SORT
,而这个:将仅使用索引扫描。
对于后一个查询,对
ival
进行额外排序是没有意义的,因为uval
无论如何都是唯一的,并且优化器会考虑到这一点。在
200,000
行的示例数据 (id == uval == ival
) 上,前一个查询运行15
秒,而后一个查询运行时间为15
秒立即的。Long story short: if your data are intrinsically
UNIQUE
, you will benefit from creating aUNIQIE
index on them.See the article in my blog for detailed explanation:
UNIQUE
Now, the gory details.
As @Mehrdad said,
UNIQUENESS
affects the estimated row count in the plan builder.UNIQUE
index has maximal possible selectivity, that's why:almost surely will use
NESTED LOOPS
, whilemay benefit from a
HASH JOIN
if the optimizer thinks thatnon_unique_indexed_field
is not selective.If your index is
CLUSTERED
(i. e. the rows theirselves are contained in the index leaves) and non-UNIQUE
, then a special hidden column calleduniquifier
is added to each index key, thus making the key larger and the index slower.That's why
UNIQUE CLUSTERED
index is in fact a little more efficicent than anon-UNIQUE CLUSTERED
one.In
Oracle
, a join onUNIQUE INDEX
is required for a such calledkey preservation
, which ensures that each row from a table will be selected at most once and makes a view updatable.This query:
will work in
Oracle
, while this one:will fail.
This is not an issue with
SQL Server
, though.One more thing: for a table like this,
, this query:
will use a
TOP N SORT
, while this one:will use just an index scan.
For the latter query, there is no point in additional sorting on
ival
, sinceuval
are unique anyway, and the optimizer takes this into account.On sample data of
200,000
rows (id == uval == ival
), the former query runs for15
seconds, while the latter one is instant.当然,优化器会考虑唯一性。 它会影响查询计划中的预期行数。
Of course the optimizer will take uniqueness in consideration. It affects the expected row count in query plans.
插入数据时性能会受到负面影响。 它需要检查唯一性。
Performance is negatively affected when inserting data. It needs to check the uniqueness.
我刚刚在我的机器上针对包含超过 100 万行的生产表进行了测试,因为我认为这是一个很好的测试。 结果很有趣,这是原始数字:
-- 无索引:
-- 唯一约束:
设置包括获取我添加唯一约束的字段的最大值 - 因此从逻辑上讲,通过添加约束,性能显着提高。 这也将提高通过该外键搜索时的性能。
有趣的是,插入时间也略有改善(提高了 2.7228%),因此[在我的测试用例中]添加约束(+固有索引)只会产生积极影响。
测试显示添加约束仅产生积极影响 - 没有性能影响。
注意:对于我们的测试系统,我希望这些值几乎总是唯一的,因此我没有测试插入非唯一值,在这个数据中它确实是一个例外 - 而不是我们需要高性能的东西。
I've just tested this on my machine for a Production table containing more than 1 million rows because I figured it was a good test. The results were interesting, here's the raw numbers:
-- No Index:
-- Unique Constraint:
The Setup consisted of getting the Maximum of the field that I was adding the Unique Constraint to - so logically the performance was dramatically increased by adding the constraint. This would also improve performance when searching by this foreign key.
Interestingly the Insert Time improved slightly as well (by 2.7228%), so only positive impacts [in my test case] of adding the Constraint (+ inherent index).
Testing shows only positive impacts from adding the constraint - no performance impact.
NOTE: For our test system I expect the values to almost always be unique, so I didn't test inserting non-unique values, in this data it truly is an exception - and not something we need to be performant.
是的,查询引擎会考虑它。
Yes, it will be taken into consideration by the query engine.
也许更重要的是:唯一性将保护数据的完整性。 性能是忽略这一点的一个理由。
性能可能会受到积极或消极的影响,或者根本不会受到影响:这取决于查询、是否使用索引等
Perhaps more important: the uniqueness will protect the data integrity. Performance would a reason to ignore this.
Performance could be affected positively or negatively or not at all: it would depends on the query, if the index is used etc