SQL Server 不区分大小写的排序规则

发布于 2024-10-03 03:07:37 字数 123 浏览 0 评论 0 原文

在 SQL Server 中使用不区分大小写的排序规则有哪些优点/缺点(就查询性能而言)?

我有一个数据库当前正在使用不区分大小写的排序规则,但我不太喜欢它。我非常想将其更改为区分大小写。更改排序规则时应该注意什么?

What are the benefits/drawbacks of using a case insensitive collation in SQL Server (in terms of query performance)?

I have a database that is currently using a case-insensitive collation, and I don't really like it. I would very much like to change it to case sensitive. What should I be aware of when changing the collation?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(5

〗斷ホ乔殘χμё〖 2024-10-10 03:07:37

如果更改数据库上的排序规则,则还必须单独更改每个列上的排序规则 - 它们维护创建表时有效的排序规则设置。

create database CollTest COLLATE Latin1_General_CI_AI
go
use CollTest
go
create table T1 (
    ID int not null,
    Val1 varchar(50) not null
)
go
select name,collation_name from sys.columns where name='Val1'
go
alter database CollTest COLLATE Latin1_General_CS_AS
go
select name,collation_name from sys.columns where name='Val1'
go

结果:

name collation_name
---- --------------
Val1 Latin1_General_CI_AI

name collation_name
---- --------------
Val1 Latin1_General_CI_AI

If you change the collation on the database, you also have to change it on each column individually - they maintain the collation setting that was in force when their table was created.

create database CollTest COLLATE Latin1_General_CI_AI
go
use CollTest
go
create table T1 (
    ID int not null,
    Val1 varchar(50) not null
)
go
select name,collation_name from sys.columns where name='Val1'
go
alter database CollTest COLLATE Latin1_General_CS_AS
go
select name,collation_name from sys.columns where name='Val1'
go

Result:

name collation_name
---- --------------
Val1 Latin1_General_CI_AI

name collation_name
---- --------------
Val1 Latin1_General_CI_AI
匿名的好友 2024-10-10 03:07:37

(我将其添加为单独的答案,因为它与我的第一个答案有很大不同。)
好的,找到了一些实际的文档。这篇 MS 知识库文章指出,不同排序规则之间存在性能差异,但不是你想的那样。区别在于SQL 排序规则(向后兼容,但不支持unicode)和Windows 排序规则(支持unicode):

一般来说,Windows 和 SQL 排序规则之间的性能差异程度不会很大。仅当工作负载受 CPU 限制而不是受 I/O 或网络速度限制时,才会出现这种差异,并且大部分 CPU 负担是由 SQL Server 中执行的字符串操作或比较的开销造成的。

SQL 和 Windows 排序规则都有区分大小写和不区分大小写的版本,因此听起来这不是主要问题。

Dan 的优秀文章中的另一个好故事“来自战壕”,标题为“整理地狱":

我继承了一个混合排序规则环境,其中的排序规则多得我一只手都数不过来。不同的排序规则需要解决方法来避免“无法解决排序规则冲突”错误,并且这些解决方法会因不可控制的表达式而降低性能。处理混合排序规则确实很痛苦,因此我强烈建议您对单一排序规则进行标准化,只有在仔细深思熟虑后才可以进行偏离。

他的结论是:

我个人认为在选择正确的排序规则时甚至不应该考虑性能。我生活在排序规则地狱中的原因之一是我的前任选择了二进制排序规则来为我们的高度事务性 OLTP 系统提供每一点性能。除了领先的通配符表扫描搜索之外,我发现不同排序规则没有可测量的性能差异。性能的真正关键是查询和索引调整而不是排序规则。如果性能对您很重要,我建议您在根据性能预期选择排序规则之前,对实际应用程序查询执行性能测试。

希望这有帮助。

(I added this as a separate answer because its substantially different than my first.)
Ok, found some actual documentation. This MS KB article says that there are performance differences between different collations, but not where you think. The difference is between SQL collations (backward compatible, but not unicode aware) and Windows collations (unicode aware):

Generally, the degree of performance difference between the Windows and the SQL collations will not be significant. The difference only appears if a workload is CPU-bound, rather than being constrained by I/O or by network speed, and most of this CPU burden is caused by the overhead of string manipulation or comparisons performed in SQL Server.

Both SQL and Windows collations have case sensitive and case insensitive versions, so it sounds like that isn't the primary concern.

Another good story "from the trenches" in Dan's excellent article titled "Collation Hell":

I inherited a mixed collation environment with more collations than I can count on one hand. The different collations require workarounds to avoid "cannot resolve collation conflict" errors and those workarounds kill performance due to non-sargable expressions. Dealing with mixed collations is a real pain so I strongly recommend you standardize on a single collation and deviate only after careful forethought.

He concludes:

I personally don't think performance should even be considered in choosing the proper collation. One of the reasons I'm living in collation hell is that my predecessors chose binary collations to eke out every bit of performance for our highly transactional OLTP systems. With the sole exception of a leading wildcard table scan search, I've found no measurable performance difference with our different collations. The real key to performance is query and index tuning rather than collation. If performance is important to you, I recommend you perform a performance test with your actual application queries before you choose a collation on based on performance expectations.

Hope this helps.

往事随风而去 2024-10-10 03:07:37

我想说,在生产数据库中更改为区分大小写的排序规则的最大缺点是,许多(如果不是大多数)查询都会失败,因为它们当前被设计为忽略大小写。

我没有尝试更改现有数据库的排序规则,但我怀疑这也可能非常耗时。当这个过程发生时,您可能必须将您的用户完全锁定。除非您已经在开发人员上进行了彻底的测试,否则请勿尝试此操作。

I would say the biggest drawback to changing to a case sensitive collation in a production database would be that many, if not most, of your queries would fail because they are currently designed to ignore case.

I've not tried to change collation on an existing datbase, but I suspect it could be quite time consuming to do as well. You probably will have to lock your users out completely while the process happens too. Do not try this unless you have thoroughly tested on dev.

乱了心跳 2024-10-10 03:07:37

我找不到任何东西来确认正确构建的查询在区分大小写的数据库上与不区分大小写的数据库上是否运行得更快(尽管我怀疑差异可以忽略不计),但有几件事对我来说很清楚:

  1. 如果您的业务需求没有要求它,那么您就会承担大量额外的工作(这是 HLGEM 和 Damien_The_Unknowner 答案的关键)。
  2. 如果您的业务需求没有要求它,那么您就会陷入许多可能的错误。
  3. 如果需要区分大小写的查找,那么在不区分大小写的数据库中构建性能不佳的查询太容易了:

像这样的查询:

... WHERE UPPER(GivenName) = 'PETER'

不会在 GiveName 上使用索引。你可能会想:

... WHERE GivenName = 'PETER' COLLATE SQL_Latin1_General_CP1_CS_AS

会更好,而且确实如此。但为了获得最佳性能,您必须执行以下操作:(

... WHERE GivenName = 'PETER' COLLATE SQL_Latin1_General_CP1_CS_AS
    AND GivenName LIKE 'PETER'

请参阅 这篇文章了解详情)

I can't find anything to confirm whether properly constructed queries work faster on a case-sensitive vs case-insensitive database (although I suspect the difference is negligible), but a few things are clear to me:

  1. If your business requirements don't ask for it, you are putting yourself up to a lot of extra work (this is the crux of both HLGEM and Damien_The_Unbeliever's answers).
  2. If your business requirements don't ask for it, you are setting yourself up for a lot of possible errors.
  3. Its way too easy to construct poorly performing queries in a case-insensitive database if a case sensitive lookup is required:

A query like:

... WHERE UPPER(GivenName) = 'PETER'

won't use an index on GivenName. You would think something like:

... WHERE GivenName = 'PETER' COLLATE SQL_Latin1_General_CP1_CS_AS

would work better, and it does. But for maximum performance you'd have to do something like:

... WHERE GivenName = 'PETER' COLLATE SQL_Latin1_General_CP1_CS_AS
    AND GivenName LIKE 'PETER'

(see this article for the details)

一笔一画续写前缘 2024-10-10 03:07:37

如果更改数据库排序规则但未更改服务器排序规则(结果它们不匹配),请在使用临时表时小心。除非在 CREATE 语句中另有指定,否则它们将使用服务器的默认排序规则而不是数据库的默认排序规则,这可能会导致与数据库列进行 JOIN 或其他比较(假设它们也更改为数据库的排序规则,如 Damien_The_Un believeer 所暗示的那样)失败。

If you change the database collation but not the server collation (and they then don't match as a result), watch out when using temporary tables. Unless otherwise specified in their CREATE statement, they will use the server's default collation rather than that of the database which may cause JOINs or other comparisons against your DB's columns (assuming they're also changed to the DB's collation, as alluded to by Damien_The_Unbeliever) to fail.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文