聚集索引
SQL Server 中的插入/更新/删除语句应使用哪种类型的索引(聚集/非聚集)。我知道它会产生额外的开销,但是与非聚集索引相比,它的性能更好吗? SQL Server 中的 Select 语句应该使用哪个索引?
Which type of index(clustered/non clustrered) should be used for Insert/Update/Delete statement in SQL Server. I know it creates an additional overhead but is it better in performance as comparison to non clustered index? Also which index should be use for Select statements in SQL Server?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
不能 100% 确定您期望听到的内容 - 您只能在表上有一个聚集索引,并且默认情况下,每个表(极少数边缘情况例外)都应该有一个。所有索引通常对您的 SELECT 帮助最大,而有些索引往往会稍微损害 INSERT、DELETE 和可能的 UPDATE(如果选择不当,则可能会损害很多)。
聚集索引使表的每个操作都更快。是的!确实如此。请参阅 Kim Tripp 的精彩聚集索引争论仍在继续 了解背景信息。她还提到了聚集索引的主要标准:
:不断增加的INT IDENTITY 完美地满足了这一点 - GUID 则不然。请参阅 GUID 作为主要键获取广泛的背景信息。
为什么要缩小?因为聚集键被添加到同一个表上每个非聚集索引的每个索引页中(以便能够在需要时实际查找数据行) )。您不想在集群键中包含 VARCHAR(200)...
为什么是唯一的? 参见上文 - 集群键是 SQL Server 用于唯一查找数据的项和机制排。它必须是独一无二的。如果您选择非唯一的集群键,SQL Server 本身会向您的键添加一个 4 字节的唯一符。小心一点!
下一步:非聚集索引。基本上有一个规则:子表中引用另一个表的任何外键都应该建立索引,它将加速 JOIN 和其他操作。
此外,任何具有 WHERE 子句的查询都是一个很好的候选者 - 首先选择那些执行次数较多的查询。将索引放在 WHERE 子句和 ORDER BY 语句中显示的列上。
接下来:测量您的系统,检查 DMV(动态管理视图)以获取有关未使用或丢失索引的提示,并一遍又一遍地调整您的系统。这是一个持续的过程,你永远不会完成!
另一句警告:使用大量索引,您可以使任何 SELECT 查询运行得非常非常快。但与此同时,必须更新所有涉及的索引的 INSERT、UPDATE 和 DELETE 可能会受到影响。如果你只选择 - 那就发疯吧!否则,这是一种微妙的平衡行为。您总是可以对单个查询进行令人难以置信的调整 - 但系统的其余部分可能会因此而受到影响。不要过度索引您的数据库!放置一些好的指标,检查并观察系统的行为方式,然后可能添加另外一两个指标,然后再次:观察整个系统性能如何受其影响。
Not 100% sure what you're expecting to hear - you can only ever have a single clustering index on a table, and by default, every table (with very few edge case exceptions) should have one. All indices typically help your SELECTs the most and some tend to hurt the INSERTs, DELETEs and possibly UPDATEs a bit (or a lot, if chosen poorly).
A clustered index makes a table faster, for every operation. YES! It does. See Kim Tripp's excellent The Clustered Index Debate continues for background info. She also mentions her main criteria for a clustered index:
INT IDENTITY fulfills this perfectly - GUID's do not. See GUID's as Primary Key for extensive background info.
Why narrow? Because the clustering key is added to each and every index page of each and every non-clustered index on the same table (in order to be able to actually look up the data row, if needed). You don't want to have VARCHAR(200) in your clustering key....
Why unique?? See above - the clustering key is the item and mechanism that SQL Server uses to uniquely find a data row. It has to be unique. If you pick a non-unique clustering key, SQL Server itself will add a 4-byte uniqueifier to your keys. Be careful of that!
Next: non-clustered indices. Basically there's one rule: any foreign key in a child table referencing another table should be indexed, it'll speed up JOINs and other operations.
Furthermore, any queries that have WHERE clauses are a good candidate - pick those first which are executed a lot. Put indices on columns that show up in WHERE clauses, in ORDER BY statements.
Next: measure your system, check the DMV's (dynamic management views) for hints about unused or missing indices, and tweak your system over and over again. It's an ongoing process, you'll never be done!
Another word of warning: with a truckload of indices, you can make any SELECT query go really really fast. But at the same time, INSERTs, UPDATEs and DELETEs which have to update all the indices involved might suffer. If you only ever SELECT - go nuts! Otherwise, it's a fine and delicate balancing act. You can always tweak a single query beyond belief - but the rest of your system might suffer in doing so. Don't over-index your database! Put a few good indices in place, check and observe how the system behaves, and then maybe add another one or two, and again: observe how the total system performance is affected by that.
我不太确定“应该用于插入/更新/删除语句”是什么意思,但在我看来每个表都应该有一个聚集索引。聚集索引指定了数据实际存储的顺序。
如果未定义聚集索引,数据将简单地存储在堆中。
如果您没有自然列作为聚集索引,您始终可以像这样创建一个 int 或 bigint 形式的标识列。
I am not quite sure what you mean by "should be used for Insert/Update/Delete statement" but in my opinion every table should have a clustered index. The clustered index specifies the order in which the data is actually stored.
If a clustered index is not defined the data will simply be stored in a heap.
If you don't have a natural column to serve as you clustered index you could always just create an identity column as an int or bigint like this.