SQL 2005:键、索引和约束问题
我有一系列关于 SQL(尤其是 SQL 2005)中的键、索引和约束的问题。我已经使用 SQL 大约 4 年了,但我从未能够获得关于这个主题的明确答案,并且博客文章等中总是存在矛盾的信息。我创建和使用的大多数时间表只有一个 Identity 列是主键,其他表通过外键指向它。
对于连接表,我没有身份并在外键列上创建复合主键。以下是我目前信念的一组陈述,可能有错误,如有错误请指正,以及其他问题。
所以这里是:
据我了解,聚集索引和非聚集索引(无论它是否唯一)之间的区别在于,聚集索引影响表中数据的物理顺序(因此,一个表中只能有一个)表),而非聚集索引构建树形数据结构。创建索引时为什么要关心聚集索引和非聚集索引?我什么时候应该使用其中之一?有人告诉我,非聚集索引的插入和删除速度很慢,因为需要“重建”树。我认为聚集索引不会以这种方式影响性能吗?
我看到主键实际上只是唯一的聚集索引(它们必须聚集吗?)。主键与聚集唯一索引有什么特别之处?
我也见过约束,但我从未使用过它们或真正看过它们。有人告诉我,约束的目的是为了增强数据完整性,而索引的目的是为了性能。我还读到约束实际上是作为索引实现的,所以它们是“相同的”。这对我来说听起来不对。约束与索引有何不同?
I have a series of questions about Keys, Indexes and Constraints in SQL, SQL 2005 in particular. I have been working with SQL for about 4 years but I have never been able to get definitive answers on this topic and there is always contradictory info on blog posts, etc. Most of the time tables I create and use just have an Identity column that is a Primary Key and other tables point to it via a Foreign Key.
With join tables I have no Identity and create a composite Primary Key over the Foreign Key columns. The following is a set of statements of my current beliefs, which may be wrong, please correct me if so, and other questions.
So here goes:
As I understand it the difference between a Clustered and Non Clustered Index (regardless of whether it is Unique or not) is that the Clustered Index affects the physical ordering of data in a table (hence you can only have one in a table), whereas a Non Clustered Index builds a tree data structure. When creating Indexes why should I care about Clustered vs Non Clustered? When should I use one or the other? I was told that inserting and deleting are slow with Non-Clustered indexes as the tree needs to be "rebuilt." I take it Clustered indexes do not affect performance this way?
I see that Primary Keys are actually just Clustered Indexes that are Unique (do they have to be clustered?). What is special about a Primary Key vs a Clustered Unique Index?
I have also seen Constraints, but I have never used them or really looked at them. I was told that the purpose of Constraints is that they are for enforcing data integrity, whereas Indexes are aimed at performance. I have also read that constraints are acually implemented as Indexes anyway so they are "the same." This doesnt sound right to me. How are constraints different to Indexes?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(5)
聚集索引,正如您所说的那样,定义为表中的数据如何物理存储,即您有一个使用聚类键排序的 B 树,并且您在叶级别拥有数据。
另一方面,非聚集索引是单独的树结构,在叶级别仅具有聚集键(如果表是堆,则为 RID),这意味着当您使用非聚集索引时,您必须使用聚集索引来获取其他索引列(除非您的请求完全被非聚集索引覆盖,如果您仅请求构成非聚集索引键列的列,则可能会发生这种情况)。
什么时候应该使用其中之一?好吧,既然你只能有一个聚集索引,那么就在最有意义的列上定义它,即当你大多数时候通过 ID 查找客户端时,在 ID 上定义一个聚集索引。非聚集索引应该定义在不经常使用的列上。
关于性能,更改索引键的插入或更新总是很痛苦,无论它是否是非聚集索引上的聚集,因为可能会发生页面拆分,这会迫使数据在页面之间移动(移动聚集索引的页面)伤害更大,因为叶级别有更多数据)。因此,一般规则是避免更改索引键和插入新值,以便它们是连续的。否则,您将遇到碎片,并且必须定期重建索引。
最后,关于约束,根据定义,它们与索引无关,但SQL Server选择使用索引来实现它们。例如,目前,唯一约束被实现为索引,但是这可能在未来版本中发生变化(尽管我怀疑这会发生)。索引的类型(聚集或非聚集)由您决定,只要记住您只能有一个聚集索引。
如果您有更多此类问题,我强烈建议您阅读这本书,其中深入涵盖了这些主题。
Clustered indexes are, as you put it correctly, the definition as to how data in a table is stored physically, i.e. you have a B-tree sorted using the clustering key and you have the data at the leaf level.
Non-clustered indexes on the other hand are separate tree structures which at the leaf level only have the clustering key (or a RID if the table is a heap), meaning that when you use a non-clustered index, you'll have to use the clustered index to get the other columns (unless your request is fully covered by the non-clustered index, which can happen if you request only the columns, which constitute the non-clustered index key columns).
When should you use one or the other ? Well, since you can have only one clustered index, define it on the columns which makes most sense, i.e. when you look up clients by ID most of the time, define a clustered index on the ID. Non-clustered indexes should be defined on columns which are used less often.
Regarding performance, inserts or updates that change the index key are always painfull, regardless of whether it is a clusted on non-clustered index, since page splits can happen, which forces data to be moved between pages (moving the pages of a clustered index hurts more, since you have more data in the leaf level). Thus the general rule is to avoid changing the index key and inserting new values so that they would be sequencial. Otherwise you'll encounter fragmentation and will have to rebuild your index on a regular basis.
Finally, regarding constraints, by definition, they have nothing to do with indexes, yet SQL server has chosen to implement them using indexes. E.g. currently, a unique constraint is implemented as an index, however this can change in a future version (though I doubt that will happen). The type of index (clustered or not) is up to you, just remember that you can have only one clustered index.
If you have more questions of this type, I highly recommend reading this book, which covers these topics in depth.
您对集群与非集群的假设非常好
似乎主键强制执行非空唯一性,而唯一索引不强制执行非空主要与唯一
Your assumption about the clustered vs non-clustered is pretty good
It also seems that primary key enforces non null uniquenes, while the unique index does not enforce non null primary vs unique
主键是关系数据库理论中的一个逻辑概念 - 它是一个键(通常也是一个索引),旨在唯一标识您的任何行。因此它必须是唯一的并且不能为 NULL。
聚集键是SQL Server特有的一个存储物理概念。它是一个特殊的索引,不仅用于查找等,还定义表中数据的物理结构。在西欧文化的印刷电话簿中(冰岛可能除外),聚集索引将是“姓氏,名字”。
由于聚类索引定义了您的物理数据布局,因此您只能拥有其中之一(或没有 - 但不推荐)。
聚簇键的要求是:
SQL Server 默认将主键设为集群键 - 但如果需要,您可以更改它。另外,请注意:组成聚集键的列将添加到表中每个非聚集索引的每个条目中 - 因此您希望使聚集键尽可能小。这是因为聚集键将用于执行“书签查找” - 如果您在非聚集索引中找到了一个条目(例如,通过其社会安全号码找到一个人),现在您需要抓取整行数据来查找获取更多详细信息,您需要进行查找,为此,使用聚类键。
关于什么才是好的或有用的集群和/或主键存在着激烈的争论 - 这里有一些优秀的博客文章可供阅读:
Marc
The primary key is a logical concept in relational database theory - it's a key (and typically also an index) which is designed to uniquely identify any of your rows. Therefore it must be unique and it cannot be NULL.
The clustering key is a storage-physical concept of SQL Server specifically. It's a special index that isn't just used for lookups etc., but also defines the physical structure of your data in your table. In a printed phonebook in Western European culture (except maybe for Iceland ), the clustered index would be "LastName, FirstName".
Since the clustering index defines your physical data layout, you can only ever have one of those (or none - not recommended, though).
Requirements for a clustering key are:
SQL Server makes your primary key the clustering key by default - but you can change that if you need to. Also, mind you: the columns that make up the clustering key will be added to each and every entry of each and every non-clustered index on your table - so you want to keep your clustering key as small as possible. This is because the clustering key will be used to do the "bookmark lookup" - if you found an entry in a non-clustered index (e.g. a person by their social security number) and now you need to grab the entire row of data to get more details, you need to do a lookup, and for this, the clustering key is used.
There's a great debate about what makes a good or useful clustering and/or primary key - here's a few excellent blog posts to read about this:
Marc
你有几个问题。我将分解其中的一些:
创建索引时为什么我应该关心聚集与非聚集?
有时您确实关心行的组织方式。这取决于您的数据以及您将如何使用它。例如,如果您的主键是
uniqueidentifier
,您可能不希望它是CLUSTERED
,因为 GUID 值本质上是随机的。这将导致 SQL 在整个表中随机插入行,从而导致页拆分,从而损害性能。如果您的主键值始终按顺序递增(例如int IDENTITY
),那么您可能希望它是CLUSTERED
,因此您的表将始终在末尾增长。默认情况下,主键是
CLUSTERED
,大多数时候您不必担心它。有人告诉我,非聚集索引的插入和删除速度很慢,因为需要“重建”树。我认为聚集索引不会以这种方式影响性能?
实际上,事实可能恰恰相反。
NONCLUSTERED
索引保留为单独的数据结构,但该结构的设计允许进行一些修改,而无需“重新构建”。最初创建索引时,您可以指定FILLFACTOR
,它指定索引的每个页面上保留多少可用空间。这允许索引在需要进行页面分割之前容忍一些修改。即使必须发生页面拆分,它也只会影响相邻页面,而不影响整个索引。同样的行为也适用于 CLUSTERED 索引,但由于 CLUSTERED 索引存储实际的表数据,索引上的分页操作可能会更加昂贵,因为整行可能需要被移动(与仅键列和
NONCLUSTERED
索引中的ROWID
相比)。以下 MSDN 页面讨论了
FILLFACTOR
和页面拆分:http://msdn.microsoft.com/en-us /library/aa933139(SQL.80).aspx
主键与聚集唯一索引有何特别之处?
约束与索引有何不同?
对于这两者,我认为更多的是声明您的意图。当您将某些内容称为
PRIMARY KEY
时,您就声明它是识别给定行的主要方法。主键
在物理上与聚集唯一索引
不同吗?我不知道。行为本质上是相同的,但使用您的数据库的人可能不清楚您的意图。关于约束,约束有很多种。对于
UNIQUE CONSTRAINT
来说,除了声明您的意图之外,它与UNIQUE INDEX
之间没有真正的区别。还有其他类型的约束不直接映射到索引类型,例如 CHECK 约束、DEFAULT 约束和 FOREIGN KEY 约束。You have several questions. I'll break some of them out:
When creating Indexes why should I care about Clustered vs Non Clustered?
Sometimes you do care how the rows are organized. It depends on your data and how you will use it. For example, if your primary key is a
uniqueidentifier
, you may not want it to beCLUSTERED
, because GUID values are essentially random. This will cause SQL to insert rows randomly throughout the table, causing page splits which hurt performance. If your primary key value will always increment sequentially (int IDENTITY
for example), then you probably want it to beCLUSTERED
, so your table will always grow at the end.A primary key is
CLUSTERED
by default, and most of the time you don't have to worry about it.I was told that inserting and deleting are slow with Non-Clustered indexes as the tree needs to be "rebuilt." I take it Clustered indexes do not affect performance this way?
Actually, the opposite can be true.
NONCLUSTERED
indexes are kept as a separate data structure, but the structure is designed to allow some modification without needing to be "re-built". When the index is initially created, you can specify theFILLFACTOR
, which specifies how much free space to leave on each page of the index. This allows the index to tolerate some modification before a page split is necessary. Even when a page split must occur, it only affects the neighboring pages, not the entire index.The same behavior applies to
CLUSTERED
indexes, but sinceCLUSTERED
indexes store the actual table data, page splitting operations on the index can be much more expensive because the whole row may need to be moved (versus just the key columns and theROWID
in aNONCLUSTERED
index).The following MSDN page talks about
FILLFACTOR
and page splits:http://msdn.microsoft.com/en-us/library/aa933139(SQL.80).aspx
What is special about a Primary Key vs a Clustered Unique Index?
How are constraints different to Indexes?
For both of these I think it's more about declaring your intentions. When you call something a
PRIMARY KEY
you are declaring that it is the primary method for identifying a given row. Is aPRIMARY KEY
physically different from aCLUSTERED UNIQUE INDEX
? I'm not sure. The behavior is essentially the same, but your intentions may not be clear to someone working with your database.Regarding constraints, there are many types of constraints. For a
UNIQUE CONSTRAINT
, there isn't really a difference between that and aUNIQUE INDEX
, other than declaring your intention. There are other types of constraints that do not map directly to a type of index, such asCHECK
constraints,DEFAULT
constraints, andFOREIGN KEY
constraints.我没有时间深入回答这个问题,所以这里有一些我脑海中浮现的信息:
你对聚集索引的看法是正确的。它们根据聚集索引的排序顺序重新排列物理数据。您可以专门针对范围限制查询(例如日期之间)使用聚集索引。
默认情况下,PK 是集群的,但并非必须如此。这只是默认设置。 PK 应该是该行的 UID。
约束可以实现为索引(例如,唯一约束),但也可以实现为默认值。
I don't have time to answer this in depth, so here is some info off the top of my head:
You're right about clustered indexes. They rearrange the physical data according to the sort order of the clustered index. You can use clustered indexes specifically for range-bound queries (e.g. between dates).
PKs are by default clustered, but they don't have to be. That's just a default setting. The PK is supposed to be a UID for the row.
Constraints can be implemented as indexes (for example, unique constraints), but can also be implemented as default values.