SQL 排序规则影响性能
我只想检查一些事情:
Q1)Latin1_General_CI_AS 不区分大小写,区分重音:即 SQL 会将以下内容视为相等 - “hello”和“HELLO”
使用 LINQ 我经常这样做:
db.Where(v => v.Email == "some email".ToLower())
Q2)假设我对 Q1 的理解是正确的,我只是在查询中调用 ToLower() 浪费处理时间吗?
Q3) 有谁知道使用 Latin1_General_bin 相对于 Latin1_General_CI_AS 是否会有性能改进?即已经在博客等上完成了性能测试(在我写这篇文章时想到了这一点,所以还没有看我自己)
I would just like to check a few things:
Q1) Latin1_General_CI_AS is Case Insensitive, Accent Sensitive : I.e. SQL will see the following as equal - "hello" and "HELLO"
With LINQ I quiet often do:
db.Where(v => v.Email == "some email".ToLower())
Q2) Assuming my understanding to Q1 is correct, am I just wasting processing time calling ToLower() in my queries?
Q3) Does anybody know if there would be a performance improvement in using Latin1_General_bin over Latin1_General_CI_AS? I.e have there already been performance tests done on a blog etc (thought of this as I was writing the post, so not looked my self yet)
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
一般来说,SQL 比较不区分大小写。
不过也有例外,例如在 MySQL 中,如果使用
binary
varchar 比较将区分大小写。所以你的 ToLower 可能并不完全是浪费时间。
Latin1_General_bin
区分大小写。而
Latin1_General_CI_AS
则不然。区分大小写的比较在数据库中会更快,但是如果您想要将“某些电子邮件”与“某些电子邮件”匹配,您将付出一定的代价,您将不得不转换为小写,从而失去所有速度增益。
我没有计时,但我认为不值得这么麻烦。
我建议在进行微优化之前明智地使用索引和查询。
-- 过早的优化是万恶之源,Donald Knuth。
In general SQL comparisons are case insensitive.
However there are exceptions, e.g. in MySQL if you use a
binary
varchar comparisons will be case sensitive.So your ToLower might not be a complete waste of time.
The
Latin1_General_bin
is case sensitive.Whereas the
Latin1_General_CI_AS
is not.Case sensitive comparison will be faster in the database, but you pay a price if you want to match "some email" to "Some email" you will have to cast to lowercase, losing all that speed gain.
I haven't timed it but I don't think it is worth the hassle.
I recommend smart use of indexes and queries before this micro-optimizations.
-- Premature optimization is the root of all evil, Donald Knuth.
实际例子中的表现:
表Adres 包含320K 行数据。当我们有电子邮件时,我们需要 Adres.Id(如您的示例中所示)。
数据库(和表 Adres)排序规则为 SQL_Latin1_General_CP1_CI_AS
为了优化性能,在 Email 列(包括 Adres.Id 列)上创建非聚集索引
查询如下:
返回 1 行
每个查询结果 :
似乎在第二种情况下,查询没有被 SQL Server 识别为 SARG。为什么?让我们看看细节。
在第一种情况下:
在第二种情况下:
因此在第二种情况下电子邮件将转换为所需的排序规则。这种情况不是 SARG 并且执行了索引扫描。
如果查询无法识别为 SARG(例如
LIKE '%some email%)
',计划是相同的。假设:如果您的查询可以像 SARG 一样被识别,并且您有适当的索引,则首选无排序规则(最好在客户端/服务端进行排序规则对话)。
您可以在不同的性能调优书籍/文章中找到 SARG 信息。
Performance on real example:
Table Adres consists 320K rows of data. We need Adres.Id when we have Email (like in your example).
Database (and table Adres) collation is SQL_Latin1_General_CP1_CI_AS
For performance optimization, non-clustered index was created on column Email(Adres.Id column is included)
Queryies look like:
1 row was returned for every query
results:
It seems that in second case query is not being identified as SARG by SQL Server. Why? Let us look at details.
In first case:
And in second:
So in second case Email is converted to required collation. This case is not SARG and index scan was performed.
If queries can not be identified as SARG (e.g.
LIKE '%some email%)
', plans are the same.Assuming: if your query can be identified like SARG and you have appropriate index, no-collation is preferred (it is better to do collation conversation on client/service side).
You can find SARG information in different performance tuning books/articles.