聚集索引和非聚集索引的区别
我需要向我的表添加正确的 index
并需要一些帮助。
我很困惑,需要澄清几点:
我应该对
非 int
列使用索引吗?为什么/为什么不我已经阅读了很多有关
聚集
和非聚集
索引的内容,但我仍然无法决定何时使用其中之一。一个好的例子会对我和许多其他开发人员有所帮助。
我知道我不应该对经常更新的列或表使用索引。我还应该注意什么?在进入测试阶段之前我如何知道一切都很好?
I need to add proper index
to my tables and need some help.
I'm confused and need to clarify a few points:
Should I use index for
non-int
columns? Why/why notI've read a lot about
clustered
andnon-clustered
index yet I still can't decide when to use one over the other. A good example would help me and a lot of other developers.
I know that I shouldn't use indexes for columns or tables that are often updated. What else should I be careful about and how can I know that it is all good before going to test phase?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(6)
聚集索引改变行的存储方式。当您在一列(或多列)上创建聚集索引时,SQL Server 会按该列对表的行进行排序。它就像一本字典,整本书中的所有单词都按字母顺序排序。
另一方面,非聚集索引不会改变行在表中的存储方式。它在表中创建一个完全不同的对象,其中包含选择用于索引的列以及返回到包含数据的表行的指针。它就像一本书最后几页的索引,其中对关键字进行排序并包含书籍材料的页码,以便更快地参考。
A clustered index alters the way that the rows are stored. When you create a clustered index on a column (or a number of columns), SQL server sorts the table’s rows by that column(s). It is like a dictionary, where all words are sorted in alphabetical order in the entire book.
A non-clustered index, on the other hand, does not alter the way the rows are stored in the table. It creates a completely different object within the table that contains the column(s) selected for indexing and a pointer back to the table’s rows containing the data. It is like an index in the last pages of a book, where keywords are sorted and contain the page number to the material of the book for faster reference.
您确实需要区分两个问题:
1)主键是一个逻辑构造 - 唯一且可靠地标识表中每一行的候选键之一。这实际上可以是任何东西——一个 INT、一个 GUID、一个字符串——选择对你的场景最有意义的。
2)聚集键(定义表上“聚集索引”的一列或多列) - 这是一个物理与存储相关的东西,这里是一个小的、稳定、不断增加的数据类型是您的最佳选择 - INT 或 BIGINT 作为您的默认选项。
默认情况下,SQL Server 表上的主键也用作聚簇键 - 但不必如此!
我要应用的一条经验法则是:任何“常规”表(用于存储数据的表,即查找表等)都应该有一个集群键。没有聚类键确实没有意义。实际上,与普遍认为相反,拥有集群键实际上可以加速所有常见操作 - 甚至插入和删除(因为表组织不同,通常比使用堆更好 - 没有集群的表钥匙)。
金伯利·特里普,女王Indexing 有很多优秀的文章,主题是为什么要使用集群键,以及最好使用哪种列作为集群键。由于每个表只能获得一个,因此选择正确集群键(而不仅仅是任何集群键)至关重要。
Marc
You really need to keep two issues apart:
1) the primary key is a logical construct - one of the candidate keys that uniquely and reliably identifies every row in your table. This can be anything, really - an INT, a GUID, a string - pick what makes most sense for your scenario.
2) the clustering key (the column or columns that define the "clustered index" on the table) - this is a physical storage-related thing, and here, a small, stable, ever-increasing data type is your best pick - INT or BIGINT as your default option.
By default, the primary key on a SQL Server table is also used as the clustering key - but that doesn't need to be that way!
One rule of thumb I would apply is this: any "regular" table (one that you use to store data in, that is a lookup table etc.) should have a clustering key. There's really no point not to have a clustering key. Actually, contrary to common believe, having a clustering key actually speeds up all the common operations - even inserts and deletes (since the table organization is different and usually better than with a heap - a table without a clustering key).
Kimberly Tripp, the Queen of Indexing has a great many excellent articles on the topic of why to have a clustering key, and what kind of columns to best use as your clustering key. Since you only get one per table, it's of utmost importance to pick the right clustering key - and not just any clustering key.
Marc
您应该使用索引来提高 SQL Server 性能。通常这意味着用于在表中查找行的列已建立索引。
聚集索引使 SQL Server 根据索引顺序对磁盘上的行进行排序。这意味着如果您按照聚集索引的顺序访问数据,那么数据将以正确的顺序出现在磁盘上。但是,如果具有聚集索引的列经常更改,则行将在磁盘上移动,从而导致开销 - 这通常不是一个好主意。
索引太多也不好。它们的维护成本很高。因此,从明显的开始,然后进行分析,看看您错过了哪些并会从中受益。您从一开始就不需要它们,可以稍后添加它们。
建立索引时可以使用大多数列数据类型,但对小列进行索引比对大列进行索引要好。在列组(例如国家+城市+街道)上创建索引也很常见。
此外,除非表中有相当多的数据,否则您不会注意到性能问题。另一件需要考虑的事情是 SQL Server 需要统计信息来以正确的方式进行查询优化,因此请确保您确实生成了统计信息。
You should be using indexes to help SQL server performance. Usually that implies that columns that are used to find rows in a table are indexed.
Clustered indexes makes SQL server order the rows on disk according to the index order. This implies that if you access data in the order of a clustered index, then the data will be present on disk in the correct order. However if the column(s) that have a clustered index is frequently changed, then the row(s) will move around on disk, causing overhead - which generally is not a good idea.
Having many indexes is not good either. They cost to maintain. So start out with the obvious ones, and then profile to see which ones you miss and would benefit from. You do not need them from start, they can be added later on.
Most column datatypes can be used when indexing, but it is better to have small columns indexed than large. Also it is common to create indexes on groups of columns (e.g. country + city + street).
Also you will not notice performance issues until you have quite a bit of data in your tables. And another thing to think about is that SQL server needs statistics to do its query optimizations the right way, so make sure that you do generate that.
非聚集索引与聚集索引的比较示例
作为非聚集索引的示例,假设我们在 EmployeeID 列上有一个非聚集索引。非聚集索引将存储
AND 指向 Employee 表中实际存储该值的行的指针。但另一方面,聚集索引实际上会存储特定 EmployeeID 的行数据 - 因此,如果您正在运行查找 EmployeeID 为 15 的查询,则表中其他列的数据如下
。实际上将全部存储在聚集索引本身的叶节点中。
这意味着使用非聚集索引需要额外的工作来跟踪指向表中行的指针以检索任何其他所需的值,而不是聚集索引只能直接访问行,因为它存储在与聚集索引本身的顺序相同。因此,从聚集索引读取通常比从非聚集索引读取要快。
A comparison of a non-clustered index with a clustered index with an example
As an example of a non-clustered index, let’s say that we have a non-clustered index on the EmployeeID column. A non-clustered index will store both the value of the
AND a pointer to the row in the Employee table where that value is actually stored. But a clustered index, on the other hand, will actually store the row data for a particular EmployeeID – so if you are running a query that looks for an EmployeeID of 15, the data from other columns in the table like
. will all actually be stored in the leaf node of the clustered index itself.
This means that with a non-clustered index extra work is required to follow that pointer to the row in the table to retrieve any other desired values, as opposed to a clustered index which can just access the row directly since it is being stored in the same order as the clustered index itself. So, reading from a clustered index is generally faster than reading from a non-clustered index.
一般来说,在将(大量)使用的列上使用索引来搜索表,例如主键(默认情况下具有聚集索引)。例如,如果您有查询(以伪代码形式),
您可能希望在 FOO.BAR 上放置索引。聚集索引应该用在将用于排序的列上。聚集索引用于对磁盘上的行进行排序,因此每个表只能有一个。例如,如果您有查询,
您可能需要考虑 FOO.BAR 上的聚集索引。
最重要的考虑因素可能是您的查询花费了多少时间。如果查询不需要太多时间或不经常使用,则可能不值得添加索引。一如既往,先分析,然后优化。 SQL Server Studio 可以为您提供有关优化位置的建议,并且 MSDN 有一些信息1< /a> 你可能会觉得有用
In general, use an index on a column that's going to be used (a lot) to search the table, such as a primary key (which by default has a clustered index). For example, if you have the query (in pseudocode)
You might want to put an index on FOO.BAR. A clustered index should be used on a column that will be used for sorting. A clustered index is used to sort the rows on disk, so you can only have one per table. For example if you have the query
You might want to consider a clustered index on FOO.BAR.
Probably the most important consideration is how much time your queries are taking. If a query doesn't take much time or isn't used very often, it may not be worth adding indexes. As always, profile first, then optimize. SQL Server Studio can give you suggestions on where to optimize, and MSDN has some information1 that you might find useful
由于数据按索引顺序物理存储,因此读取速度比非集群更快
我们只能为每个表创建一个(聚簇索引),
插入和更新操作比聚簇索引更快。
我们可以创建n个非聚簇索引。
faster to read than non cluster as data is physically storted in index order
we can create only one per table.(cluster index)
quicker for insert and update operation than a cluster index.
we can create n number of non cluster index.