SQL 排序规则影响性能

发布于 2024-11-09 16:39:53 字数 371 浏览 3 评论 0原文

我只想检查一些事情：

Q1）Latin1_General_CI_AS 不区分大小写，区分重音：即 SQL 会将以下内容视为相等 - “hello”和“HELLO”

使用 LINQ 我经常这样做：

db.Where(v => v.Email == "some email".ToLower())

Q2）假设我对 Q1 的理解是正确的，我只是在查询中调用 ToLower() 浪费处理时间吗？

Q3) 有谁知道使用 Latin1_General_bin 相对于 Latin1_General_CI_AS 是否会有性能改进？即已经在博客等上完成了性能测试（在我写这篇文章时想到了这一点，所以还没有看我自己）

原文

I would just like to check a few things:

Q1) Latin1_General_CI_AS is Case Insensitive, Accent Sensitive : I.e. SQL will see the following as equal - "hello" and "HELLO"

With LINQ I quiet often do:

db.Where(v => v.Email == "some email".ToLower())

Q2) Assuming my understanding to Q1 is correct, am I just wasting processing time calling ToLower() in my queries?

Q3) Does anybody know if there would be a performance improvement in using Latin1_General_bin over Latin1_General_CI_AS? I.e have there already been performance tests done on a blog etc (thought of this as I was writing the post, so not looked my self yet)

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

笑红尘 2024-11-16 16:39:53

一般来说，SQL 比较不区分大小写。
不过也有例外，例如在 MySQL 中，如果使用 binary varchar 比较将区分大小写。

所以你的 ToLower 可能并不完全是浪费时间。

Latin1_General_bin 区分大小写。
而 Latin1_General_CI_AS 则不然。

区分大小写的比较在数据库中会更快，但是如果您想要将“某些电子邮件”与“某些电子邮件”匹配，您将付出一定的代价，您将不得不转换为小写，从而失去所有速度增益。
我没有计时，但我认为不值得这么麻烦。
我建议在进行微优化之前明智地使用索引和查询。

-- 过早的优化是万恶之源，Donald Knuth。

回复收藏 0 原文

独﹏钓一江月 2024-11-16 16:39:53

实际例子中的表现：
表Adres 包含320K 行数据。当我们有电子邮件时，我们需要 Adres.Id（如您的示例中所示）。

数据库（和表 Adres）排序规则为 SQL_Latin1_General_CP1_CI_AS

为了优化性能，在 Email 列（包括 Adres.Id 列）上创建非聚集索引

查询如下：

SELECT  Adres.ID,Email FROM  csc.Adres WHERE EMAIL ='[email protected]'

SELECT  Adres.ID,Email FROM  csc.Adres WHERE EMAIL='[email protected]' COLLATE Latin1_General_bin

返回 1 行

每个查询结果：在此处输入图像描述

似乎在第二种情况下，查询没有被 SQL Server 识别为 SARG。为什么？让我们看看细节。
在第一种情况下：

 ScalarOperator ScalarString="CONVERT_IMPLICIT(nvarchar(4000),[@1],0)

在第二种情况下：

ScalarOperator ScalarString="CONVERT_IMPLICIT(nvarchar(80),[CSCENTRUMTest].[csc].[Adres].[Email],0)=CONVERT_IMPLICIT(nvarchar(4000),CONVERT(varchar(8000),[@1],0),0)">

因此在第二种情况下电子邮件将转换为所需的排序规则。这种情况不是 SARG 并且执行了索引扫描。

如果查询无法识别为 SARG（例如 LIKE '%some email%）'，计划是相同的。

假设：如果您的查询可以像 SARG 一样被识别，并且您有适当的索引，则首选无排序规则（最好在客户端/服务端进行排序规则对话）。

您可以在不同的性能调优书籍/文章中找到 SARG 信息。

Performance on real example:
Table Adres consists 320K rows of data. We need Adres.Id when we have Email (like in your example).

Database (and table Adres) collation is SQL_Latin1_General_CP1_CI_AS

For performance optimization, non-clustered index was created on column Email(Adres.Id column is included)

Queryies look like:

SELECT  Adres.ID,Email FROM  csc.Adres WHERE EMAIL ='[email protected]'

SELECT  Adres.ID,Email FROM  csc.Adres WHERE EMAIL='[email protected]' COLLATE Latin1_General_bin

1 row was returned for every query

results: enter image description here

It seems that in second case query is not being identified as SARG by SQL Server. Why? Let us look at details.
In first case:

 ScalarOperator ScalarString="CONVERT_IMPLICIT(nvarchar(4000),[@1],0)

And in second:

ScalarOperator ScalarString="CONVERT_IMPLICIT(nvarchar(80),[CSCENTRUMTest].[csc].[Adres].[Email],0)=CONVERT_IMPLICIT(nvarchar(4000),CONVERT(varchar(8000),[@1],0),0)">

So in second case Email is converted to required collation. This case is not SARG and index scan was performed.

If queries can not be identified as SARG (e.g. LIKE '%some email%)', plans are the same.

Assuming: if your query can be identified like SARG and you have appropriate index, no-collation is preferred (it is better to do collation conversation on client/service side).

You can find SARG information in different performance tuning books/articles.

回复收藏 0 原文

~没有更多了~