SQL 排序规则影响性能

发布于 2024-11-09 16:39:53 字数 371 浏览 3 评论 0原文

我只想检查一些事情:

Q1)Latin1_General_CI_AS 不区分大小写,区分重音:即 SQL 会将以下内容视为相等 - “hello”和“HELLO”

使用 LINQ 我经常这样做:

db.Where(v => v.Email == "some email".ToLower())

Q2)假设我对 Q1 的理解是正确的,我只是在查询中调用 ToLower() 浪费处理时间吗?

Q3) 有谁知道使用 Latin1_General_bin 相对于 Latin1_General_CI_AS 是否会有性能改进?即已经在博客等上完成了性能测试(在我写这篇文章时想到了这一点,所以还没有看我自己)

I would just like to check a few things:

Q1) Latin1_General_CI_AS is Case Insensitive, Accent Sensitive : I.e. SQL will see the following as equal - "hello" and "HELLO"

With LINQ I quiet often do:

db.Where(v => v.Email == "some email".ToLower())

Q2) Assuming my understanding to Q1 is correct, am I just wasting processing time calling ToLower() in my queries?

Q3) Does anybody know if there would be a performance improvement in using Latin1_General_bin over Latin1_General_CI_AS? I.e have there already been performance tests done on a blog etc (thought of this as I was writing the post, so not looked my self yet)

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

笑红尘 2024-11-16 16:39:53

一般来说,SQL 比较不区分大小写。
不过也有例外,例如在 MySQL 中,如果使用 binary varchar 比较将区分大小写。

所以你的 ToLower 可能并不完全是浪费时间。

Latin1_General_bin 区分大小写。
Latin1_General_CI_AS 则不然。

区分大小写的比较在数据库中会更快,但是如果您想要将“某些电子邮件”与“某些电子邮件”匹配,您将付出一定的代价,您将不得不转换为小写,从而失去所有速度增益。
我没有计时,但我认为不值得这么麻烦。
我建议在进行微优化之前明智地使用索引和查询。

-- 过早的优化是万恶之源,Donald Knuth。

In general SQL comparisons are case insensitive.
However there are exceptions, e.g. in MySQL if you use a binary varchar comparisons will be case sensitive.

So your ToLower might not be a complete waste of time.

The Latin1_General_bin is case sensitive.
Whereas the Latin1_General_CI_AS is not.

Case sensitive comparison will be faster in the database, but you pay a price if you want to match "some email" to "Some email" you will have to cast to lowercase, losing all that speed gain.
I haven't timed it but I don't think it is worth the hassle.
I recommend smart use of indexes and queries before this micro-optimizations.

-- Premature optimization is the root of all evil, Donald Knuth.

独﹏钓一江月 2024-11-16 16:39:53

实际例子中的表现:
表Adres 包含320K 行数据。当我们有电子邮件时,我们需要 Adres.Id(如您的示例中所示)。

数据库(和表 Adres)排序规则为 SQL_Latin1_General_CP1_CI_AS

为了优化性能,在 Email 列(包括 Adres.Id 列)上创建非聚集索引

查询如下:

SELECT  Adres.ID,Email FROM  csc.Adres WHERE EMAIL ='[email protected]'

SELECT  Adres.ID,Email FROM  csc.Adres WHERE EMAIL='[email protected]' COLLATE Latin1_General_bin

返回 1 行

每个查询结果 :在此处输入图像描述

似乎在第二种情况下,查询没有被 SQL Server 识别为 SARG。为什么?让我们看看细节。
在第一种情况下:

 ScalarOperator ScalarString="CONVERT_IMPLICIT(nvarchar(4000),[@1],0)

在第二种情况下:

ScalarOperator ScalarString="CONVERT_IMPLICIT(nvarchar(80),[CSCENTRUMTest].[csc].[Adres].[Email],0)=CONVERT_IMPLICIT(nvarchar(4000),CONVERT(varchar(8000),[@1],0),0)">

因此在第二种情况下电子邮件将转换为所需的排序规则。这种情况不是 SARG 并且执行了索引扫描。

如果查询无法识别为 SARG(例如 LIKE '%some email%)',计划是相同的。

假设:如果您的查询可以像 SARG 一样被识别,并且您有适当的索引,则首选无排序规则(最好在客户端/服务端进行排序规则对话)。

您可以在不同的性能调优书籍/文章中找到 SARG 信息。

Performance on real example:
Table Adres consists 320K rows of data. We need Adres.Id when we have Email (like in your example).

Database (and table Adres) collation is SQL_Latin1_General_CP1_CI_AS

For performance optimization, non-clustered index was created on column Email(Adres.Id column is included)

Queryies look like:

SELECT  Adres.ID,Email FROM  csc.Adres WHERE EMAIL ='[email protected]'

SELECT  Adres.ID,Email FROM  csc.Adres WHERE EMAIL='[email protected]' COLLATE Latin1_General_bin

1 row was returned for every query

results:enter image description here

It seems that in second case query is not being identified as SARG by SQL Server. Why? Let us look at details.
In first case:

 ScalarOperator ScalarString="CONVERT_IMPLICIT(nvarchar(4000),[@1],0)

And in second:

ScalarOperator ScalarString="CONVERT_IMPLICIT(nvarchar(80),[CSCENTRUMTest].[csc].[Adres].[Email],0)=CONVERT_IMPLICIT(nvarchar(4000),CONVERT(varchar(8000),[@1],0),0)">

So in second case Email is converted to required collation. This case is not SARG and index scan was performed.

If queries can not be identified as SARG (e.g. LIKE '%some email%)', plans are the same.

Assuming: if your query can be identified like SARG and you have appropriate index, no-collation is preferred (it is better to do collation conversation on client/service side).

You can find SARG information in different performance tuning books/articles.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文