How does the Data Type of an SQL table's PK impact query performance?
Specifically, I am interested in:
What is the difference between string data types (e.g. nvarchar(n), varchar(n)) and numeric data types (int, bigint, uniqueidentifier)?
What is the difference between the different string data types?
How does the maximum length of a string data type affect performance? Is there a specific varchar or nvarchar length at which the performance sharply declines?
What is the difference between the different numeric data types?
How do these variations impact:
Equality comparison of Primary Keys?
Joins on Primary Keys?
Updates by Primary Key?
Complex value comparisons by Primary Key (e.g. with LIKE on a varchar or <= on an int)?
If there is a significant disparity between the different options, what measures can be taken to optimize performance with the slower data types?
How does a composite PK compare to the other options?
Update: To be clear, I understand this is a long question and I am not asking to be spoon-fed all this information. An answer that provides links to reliable online resources where I can find this information is completely sufficient.
Update 2: I am using SQL Server Express 2008.
Comments (3)
I don't have any hard numbers, but from experience and from everything I have learned over the years, I would say:
Try to use a fixed-length key: INT, BIGINT, or CHAR(x) (for x <= 6 characters). Those tend to be easier to deal with and give SQL Server less overhead to work with.
Avoid larger VARCHAR values, since SQL Server has a limit of 900 bytes for each index entry. Don't even try to use a VARCHAR(MAX) or something outrageous like that.
Since the primary key in SQL Server is by default your clustering key, all the rules for the clustering key apply. A good clustering key is, among other things, ever-increasing (INT IDENTITY is perfect), to reduce index and page fragmentation due to page splits in your index structures.
By far the best, most authoritative and most exhaustive resource on SQL Server indexing (what to do and what to avoid) would be Kimberly Tripp's blog, especially her Indexes category. Great stuff!
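As a rough illustration of the advice above, here is a minimal T-SQL sketch of a table whose primary key is a narrow, fixed-length INT IDENTITY column. The table, column, and constraint names (dbo.Customer, CustomerId, Email, PK_Customer, UQ_Customer_Email) are hypothetical and not taken from the question.

```sql
-- Hypothetical table: all names are illustrative only.
CREATE TABLE dbo.Customer
(
    -- 4 bytes, fixed length; IDENTITY values increase, so new rows
    -- are added at the end of the clustered index.
    CustomerId INT IDENTITY(1, 1) NOT NULL,

    -- A wider natural key, kept out of the clustered index but still
    -- enforced as unique (well under the 900-byte index key limit).
    Email      VARCHAR(320)       NOT NULL,

    CONSTRAINT PK_Customer       PRIMARY KEY CLUSTERED (CustomerId),
    CONSTRAINT UQ_Customer_Email UNIQUE NONCLUSTERED (Email)
);
```

The unique constraint keeps the natural key enforceable without making it the clustering key.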
The narrower the data type, meaning the fewer bytes it takes, the better the performance will be.
For example, INT generally takes 4 bytes. A VARCHAR(4) holding four characters does too on most databases (SQL Server adds 2 bytes of length overhead to each variable-length value), but VARCHAR(5+) uses more bytes than INT, and vice versa for VARCHAR(less than 4). To reiterate: INT and VARCHAR(4) are [roughly] equivalent, but VARCHAR(less than 4) would be smaller (therefore "faster") and VARCHAR(5+) would be larger (therefore "slower") than INT.
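One way to check these sizes on SQL Server is the built-in DATALENGTH function, which returns the number of bytes used by a value. The sketch below uses made-up literal values. It reports data bytes only, so it does not include the per-row overhead that variable-length columns carry.

```sql
-- Bytes used by a few candidate key values (literals are illustrative only).
SELECT
    DATALENGTH(CAST(1234 AS INT))              AS int_bytes,        -- 4
    DATALENGTH(CAST(1234 AS BIGINT))           AS bigint_bytes,     -- 8
    DATALENGTH(NEWID())                        AS guid_bytes,       -- 16
    DATALENGTH(CAST('ab' AS VARCHAR(4)))       AS varchar2_bytes,   -- 2
    DATALENGTH(CAST('abcd' AS VARCHAR(4)))     AS varchar4_bytes,   -- 4
    DATALENGTH(CAST('abcdefgh' AS VARCHAR(8))) AS varchar8_bytes;   -- 8
```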
Honestly, I'm not going to address differences between data types because
I will assume that by "primary key" you are referring to the clustered index on the table, since by default they are the same thing in SQL Server.
The size of the clustered index key is important, because all other indexes use the clustered index key to refer to individual rows within the table. Therefore, a large clustered index key will cause all other indexes to be large too. Large indexes can harm performance, because fewer rows fit on each page and more pages get swapped in and out of the working set.
Therefore, if given a choice you should use a smaller rather than a larger column or set of columns for the primary key.
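If a wide primary key cannot be avoided, one possible mitigation is to declare it NONCLUSTERED and cluster the table on a narrow column instead, since the primary key is only the clustered index by default. The following is a sketch under that assumption. The table, column, and index names (dbo.Product, ProductId, ProductCode, Name, and so on) are hypothetical.

```sql
-- Hypothetical table: the primary key is a wide natural key, declared
-- NONCLUSTERED so it does not become the row locator in every other index.
CREATE TABLE dbo.Product
(
    ProductId   INT IDENTITY(1, 1) NOT NULL,
    ProductCode NVARCHAR(50)       NOT NULL,
    Name        NVARCHAR(200)      NOT NULL,
    CONSTRAINT PK_Product PRIMARY KEY NONCLUSTERED (ProductCode)
);

-- Cluster on the narrow surrogate column instead.
CREATE UNIQUE CLUSTERED INDEX CIX_Product_ProductId
    ON dbo.Product (ProductId);

-- Each entry in this index carries Name plus the 4-byte ProductId,
-- rather than Name plus up to 100 bytes of ProductCode.
CREATE NONCLUSTERED INDEX IX_Product_Name
    ON dbo.Product (Name);
```

Whether this helps depends on the workload: lookups by ProductCode now go through a nonclustered index, so it is worth measuring before adopting it.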
nvarchar can contain strings of various widths. nchar contains strings of a constant, predefined width. (There are also varchar and char data types, which are included for backward compatibility, but they should be avoided, since they require converting data to and from legacy character encodings whenever they are written or read.)
I highly recommend reading the SQL Server documentation on data types for the answers to your other questions.
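The difference between the fixed-width and variable-width string types, and between the one-byte and two-byte-per-character types, can be seen with DATALENGTH. This is a small sketch with an arbitrary sample value.

```sql
-- The same three-character string stored in four different string types.
SELECT
    DATALENGTH(CAST('abc'  AS CHAR(10)))     AS char_bytes,      -- 10 (padded to full width)
    DATALENGTH(CAST('abc'  AS VARCHAR(10)))  AS varchar_bytes,   -- 3
    DATALENGTH(CAST(N'abc' AS NCHAR(10)))    AS nchar_bytes,     -- 20 (padded, 2 bytes per character)
    DATALENGTH(CAST(N'abc' AS NVARCHAR(10))) AS nvarchar_bytes;  -- 6
```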