当前位置：文江博客话题详情

在 SQL Server 中将索引声明为唯一

发布于 2024-07-17 08:16:01 字数 129 浏览 5 评论 0原文

如果我知道索引将具有唯一值，那么如果我这样声明它，它将如何影响插入或选择的性能。

如果优化器知道索引是唯一的，这将如何影响查询计划？

我知道指定唯一性可以保持完整性，但暂时将讨论放在一边，性能后果是什么。

原文

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

彻夜缠绵 2024-07-24 08:16:02

长话短说：如果您的数据本质上是唯一，那么您将受益于为它们创建UNIQIE索引。

请参阅我的博客中的文章详细说明：

创建索引UNIQUE

现在，血淋淋的细节。

正如@Mehrdad所说，UNIQUENESS会影响计划生成器中的估计行数。

UNIQUE 索引具有最大可能的选择性，这就是为什么：

SELECT  *
FROM    table1 t2, table2 t2
WHERE   t1.id = :myid
        AND t2.unique_indexed_field = t1.value

几乎肯定会使用 NESTED LOOPS，而

SELECT  *
FROM    table1 t2, table2 t2
WHERE   t1.id = :myid
        AND t2.non_unique_indexed_field = t1.value

如果优化器认为可以从 HASH JOIN 中受益non_unique_indexed_field 不是选择性的。

如果您的索引是CLUSTERED（即行本身包含在索引叶中）并且是非UNIQUE，则称为uniquifier 的特殊隐藏列添加到每个索引键中，从而使键更大并且索引更慢。

这就是为什么 UNIQUE CLUSTERED 索引实际上比非 UNIQUE CLUSTERED 索引更有效。

在 Oracle 中，需要对 UNIQUE INDEX 进行联接，以实现所谓的键保存，这确保将在以下位置选择表中的每一行：最一次并使视图可更新。

此查询：

UPDATE  (
        SELECT  *
        FROM    mytable t1, mytable t2
        WHERE   t2.reference = t1.unique_indexed_field
        )
SET     value = other_value

将在 Oracle 中运行，而此查询：

UPDATE  (
        SELECT  *
        FROM    mytable t1, mytable t2
        WHERE   t2.reference = t1.non_unique_indexed_field
        )
SET     value = other_value

将失败。

不过，这不是 SQL Server 的问题。

还有一件事：对于像这样的表，

CREATE TABLE t_indexer (id INT NOT NULL PRIMARY KEY, uval INT NOT NULL, ival INT NOT NULL)
CREATE UNIQUE INDEX ux_indexer_ux ON t_indexer (uval)
CREATE INDEX ix_indexer_ux ON t_indexer (ival)

此查询：

/* Sorts on the non-unique index first */
SELECT  TOP 1 *
FROM    t_indexer
ORDER BY
        ival, uval

将使用TOP N SORT，而这个：

/* Sorts on the unique index first */
SELECT  TOP 1 *
FROM    t_indexer
ORDER BY
        uval, ival

将仅使用索引扫描。

对于后一个查询，对 ival 进行额外排序是没有意义的，因为 uval 无论如何都是唯一的，并且优化器会考虑到这一点。

在 200,000 行的示例数据 (id == uval == ival) 上，前一个查询运行 15 秒，而后一个查询运行时间为 15 秒立即的。

Long story short: if your data are intrinsically UNIQUE, you will benefit from creating a UNIQIE index on them.

See the article in my blog for detailed explanation:

Making an index UNIQUE

Now, the gory details.

As @Mehrdad said, UNIQUENESS affects the estimated row count in the plan builder.

UNIQUE index has maximal possible selectivity, that's why:

SELECT  *
FROM    table1 t2, table2 t2
WHERE   t1.id = :myid
        AND t2.unique_indexed_field = t1.value

almost surely will use NESTED LOOPS, while

SELECT  *
FROM    table1 t2, table2 t2
WHERE   t1.id = :myid
        AND t2.non_unique_indexed_field = t1.value

may benefit from a HASH JOIN if the optimizer thinks that non_unique_indexed_field is not selective.

If your index is CLUSTERED (i. e. the rows theirselves are contained in the index leaves) and non-UNIQUE, then a special hidden column called uniquifier is added to each index key, thus making the key larger and the index slower.

That's why UNIQUE CLUSTERED index is in fact a little more efficicent than a non-UNIQUE CLUSTERED one.

In Oracle, a join on UNIQUE INDEX is required for a such called key preservation, which ensures that each row from a table will be selected at most once and makes a view updatable.

This query:

UPDATE  (
        SELECT  *
        FROM    mytable t1, mytable t2
        WHERE   t2.reference = t1.unique_indexed_field
        )
SET     value = other_value

will work in Oracle, while this one:

UPDATE  (
        SELECT  *
        FROM    mytable t1, mytable t2
        WHERE   t2.reference = t1.non_unique_indexed_field
        )
SET     value = other_value

will fail.

This is not an issue with SQL Server, though.

One more thing: for a table like this,

CREATE TABLE t_indexer (id INT NOT NULL PRIMARY KEY, uval INT NOT NULL, ival INT NOT NULL)
CREATE UNIQUE INDEX ux_indexer_ux ON t_indexer (uval)
CREATE INDEX ix_indexer_ux ON t_indexer (ival)

, this query:

/* Sorts on the non-unique index first */
SELECT  TOP 1 *
FROM    t_indexer
ORDER BY
        ival, uval

will use a TOP N SORT, while this one:

/* Sorts on the unique index first */
SELECT  TOP 1 *
FROM    t_indexer
ORDER BY
        uval, ival

will use just an index scan.

For the latter query, there is no point in additional sorting on ival, since uval are unique anyway, and the optimizer takes this into account.

On sample data of 200,000 rows (id == uval == ival), the former query runs for 15 seconds, while the latter one is instant.

回复收藏 0 原文

ˇ宁静的妩媚 2024-07-24 08:16:02

当然，优化器会考虑唯一性。它会影响查询计划中的预期行数。

回复收藏 0 原文

离不开的别离 2024-07-24 08:16:02

插入数据时性能会受到负面影响。它需要检查唯一性。

回复收藏 0 原文

离旧人 2024-07-24 08:16:02

我刚刚在我的机器上针对包含超过 100 万行的生产表进行了测试，因为我认为这是一个很好的测试。结果很有趣，这是原始数字：

-- 无索引：

    Setup Time: 8888, Insert Time: 501690

-- 唯一约束：

    Setup Time:   42, Insert Time: 488030

设置包括获取我添加唯一约束的字段的最大值 - 因此从逻辑上讲，通过添加约束，性能显着提高。这也将提高通过该外键搜索时的性能。

有趣的是，插入时间也略有改善（提高了 2.7228%），因此[在我的测试用例中]添加约束（+固有索引）只会产生积极影响。

测试显示添加约束仅产生积极影响 - 没有性能影响。

注意：对于我们的测试系统，我希望这些值几乎总是唯一的，因此我没有测试插入非唯一值，在这个数据中它确实是一个例外 - 而不是我们需要高性能的东西。

I've just tested this on my machine for a Production table containing more than 1 million rows because I figured it was a good test. The results were interesting, here's the raw numbers:

-- No Index:

    Setup Time: 8888, Insert Time: 501690

-- Unique Constraint:

    Setup Time:   42, Insert Time: 488030

The Setup consisted of getting the Maximum of the field that I was adding the Unique Constraint to - so logically the performance was dramatically increased by adding the constraint. This would also improve performance when searching by this foreign key.

Interestingly the Insert Time improved slightly as well (by 2.7228%), so only positive impacts [in my test case] of adding the Constraint (+ inherent index).

Testing shows only positive impacts from adding the constraint - no performance impact.

NOTE: For our test system I expect the values to almost always be unique, so I didn't test inserting non-unique values, in this data it truly is an exception - and not something we need to be performant.

回复收藏 0 原文