主键可以是聚集表上的非聚集索引吗?
我正在设计一个成员表来存储网站的用户。每次用户登录网站时都会使用它,并且偶尔会访问它来更新用户详细信息。
用户将使用电子邮件地址和密码登录,每个帐户都有一个唯一的电子邮件地址。因此,成员表的电子邮件列应该是它的聚集索引似乎是合乎逻辑的,因为当用户登录时,该表上的大多数查询将针对电子邮件列。使电子邮件列唯一并成为聚集索引的键,应该可以在用户登录时快速查询用户数据并提高性能。
但据我了解,将电子邮件列设为主键是错误的,原因有两个。第一,主键应该是不变的,因此如果用户决定更改他们的电子邮件地址,那么所有外键都必须更新,这会很糟糕。其次,电子邮件地址是字符串,这会使连接速度比 PK 为整数时慢。
那么我可以将非聚集索引作为主键吗?那么该表既有一个以 Email 作为唯一键的聚集索引,又有一个 int 主键作为顶部的非聚集索引?
谢谢, 邓肯
I am designing a Members Table to store the users of a website. It will be used every time a user logs on to the website and occasionally accessed to update user details.
The users will log on with an email address and password and every account will have a unique email address. Therefore it seems logical that the Email column of the Members Table should be it's clustered index as the majority of queries on this table will be against the Email column as users log on. Making the Email column unique and the key to the clustered index should make querying user's data as they log on fast and improve performance.
But as I understand it, it would be wrong to make the Email column the Primary Key for two reasons. One, a Primary Key should be constant, so if a user decided to change their email address then all foreign keys would have to be updated and that would be bad. Secondly email addresses are strings which would make Joins slower than if the PK was an int.
So can I make a Non Clustered Index the Primary Key? So that the table has both a Clustered Index with Email as it's unique key, and an int primary key as a Non Clustered index on top?
Thanks,
Duncan
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
主键是一种逻辑数据库设计,只需唯一且非NULL(通过索引实现)。
此外,您还可以选择单个聚集索引,该索引应该是窄的、唯一的、递增的和静态的(电子邮件可能不适合这样做)。
我会创建一个 IDENTITY int 主键并在其上进行集群。
我会在电子邮件上添加一个唯一的非聚集索引并“包含”附加列,以便覆盖您最频繁的繁重查询(即密码哈希)。请注意,您不需要将聚集键添加到包含的列,因为它始终作为书签包含在非聚集索引中)。
查看执行计划以确保您没有在用户表中看到任何表扫描或聚集索引扫描。
我想补充一点,人们通常认为看到查询使用聚集索引是一件好事。我认为,在索引覆盖的查询中使用的非聚集索引扫描或查找在堆(没有聚集索引的表)上与聚集索引上一样好,并且比聚集索引扫描或查找更好。我还认为,聚集索引这个名称会导致人们对事物产生各种假设(首先,它实际上并不是表上的索引,它表明表完全存储在索引结构中)和误解关于它的重要性。聚集索引在非常大的操作中最重要,因为在这些操作中需要按聚集顺序排列大量数据。
典型 OLTP 查询的实际(读取)查询速度来自于在查询中的所有表上使用尽可能窄的非聚集索引覆盖查询,并且每列都按适当的顺序以及查询/参数的正确排序方向。
Primary key is a logical database design and only has to be unique and non-NULL (which is implemented with an index).
In addition, you have a choice of a single clustered index, which should be narrow, unique, increasing and static (email is probably NOT good for this).
I would make an IDENTITY int primary key and cluster on that.
I would add a unique non-clustered index on email and "include" additional columns so that your most frequent heavy queries become covering (i.e. the password hash). Note that you should not need to add the clustered key to the included columns, since that is always included as the bookmark in the non-clustered index).
Look at the execution plans to ensure that you are not seeing any table scans or clustered index scans in the user table.
I would add that typically people think that seeing queries use a clustered index is a good thing. I would argue that a non-clustered index scan or seek used in a query where the indexes are covering is just as good on a heap (a table without a clustered index) as on a clustered index and better than a clustered index scan or seek. I would also argue that a clustered index is a name which leads people to all kinds of assumptions about things (to start with, it's not really an index on a table, it indicates that the table is stored completely in the index structure) and misconceptions about its importance. Clustered indexes are most important in very large operations where a large amount of data is needed in the order of clustering.
Real (read) query speed on typical OLTP queries comes from covering the query with the narrowest possible non-clustered indexes on all the tables in the query with every column in the appropriate order and correct sort direction for the query/parameters.
您绝对可以在主键中创建非聚集索引。
然而,我认为你有点倒退了。
电子邮件地址将形成一个特别糟糕的聚集索引,因为它本质上不是有序的。随着表的增长,由于页面分割、重新排序等,您将失去
INSERT
性能。正如 @Cade Roux 所说,我会将 autonum 设为聚集索引,强制电子邮件的唯一性地址。
编辑:聚集索引表示数据如何物理存储在磁盘上。非顺序聚集索引会损害 INSERT 性能,因为数据必须重新排序(导致页面拆分)。
对于扫描用户表中的单行,您可能会发现聚集索引和非聚集索引之间的差异可以忽略不计。然而,根据 @gbn 发布的优秀链接,您可能在范围选择上有更好的性能,因为数据是连续的。尽管如此,我个人必须认真考虑是否决定使用字符串(或任何本质上无序的数据)作为聚集索引。
EDIT2:我能想到的一个例外情况是,如果您通常按电子邮件地址按字母顺序选择用户块...您的
INSERT
仍然会较慢,但是你应该能够更快地检索这些分组...正如 @Cade Roux 在评论中所说:你不应该期望单行 SELECT 的性能更高,因为到聚集索引。You absolutely can make a non-clustered index into the primary key.
However, I contend that you have it backwards a bit.
The e-mail address will make a particularly BAD clustered index because it's not inherently ordered. As the table grows you will lose
INSERT
performance due to page-splits, re-ordered, etc.As @Cade Roux said, I would make the autonum the clustered index, enforce uniqueness on the e-mail address.
EDIT: The clustered index represents how data is physically stored on disk. Non-sequential clustered indices will hurt
INSERT
performance as data must be re-ordered (causing page-splits).For scanning for single rows in a users table, you will likely find negligible difference between a clustered and non-clustered index. However, pursuant to the excellent link posted by @gbn, you might have better performance on range selects because the data is sequential. Nonetheless, I would personally have to really ponder a decision to use strings (or any inherently un-ordered data) for a clustered index.
EDIT2: One exceptional case I can think of would be if you commonly select chunks of users by e-mail address alphabetically... you'd still have slower
INSERT
s but you should be able to retrieve these groupings faster... As @Cade Roux said in the comments though: you should not expect single-rowSELECT
s to be more performant due to the clustered index.是的,你可以。创建表时,设置列如下:
Yes you can. When you create the table, set the columns like this: