为什么 IsNull 比合并慢两倍（相同查询）？

发布于 2024-11-10 02:39:31 字数 902 浏览 4 评论 0原文

我们在 SQL Server 2008 (SP1) - 10.0.2531.0 (X64) - Win2008 SP2 (X64) 上遇到了奇怪的情况。

这是一个繁重的查询：

select t1.id, t2.id 
from t1, t2
where 
     t1.id = t2.ext_id
     and isnull(t1.vchCol1, 'Null') = isnull(t2.vchCol1, 'Null')
     and isnull(t1.vchCol2, 'Null') = isnull(t2.vchCol2, 'Null')
     .... and about 10 more comparisons with Isnull

UPD：比较中的所有列（ID 除外）均为 varchar(~30...200)
T1 约为 1.3 亿行，T2 约为 30 万行。

这些查询在相当大的开发服务器上运行〜5小时 - 这很慢，但我们能做什么？

虽然我们研究了可能的优化方法 - 我们发现，在上面的查询中将 “isnull” 更改为 “coalesce” 可带来双倍的性能增益 - 并且查询现在运行 >~2 小时

UPD：当我们删除所有 ISNULL 检查并仅使用 t1.vchCol1 = t2.vchCol1 时，查询完成40 分钟后。

问题是：这是已知行为吗？我们应该避免在任何地方使用IsNull吗？

原文

We met a strange situation on SQL Server 2008 (SP1) - 10.0.2531.0 (X64) - Win2008 SP2 (X64).

Here is a one heavy query:

select t1.id, t2.id 
from t1, t2
where 
     t1.id = t2.ext_id
     and isnull(t1.vchCol1, 'Null') = isnull(t2.vchCol1, 'Null')
     and isnull(t1.vchCol2, 'Null') = isnull(t2.vchCol2, 'Null')
     .... and about 10 more comparisons with Isnull

UPD: All columns in comparison (except IDs) are varchar(~30...200)
T1 is ~130mln rows, T2 is ~300k rows.

These query on rather big Dev server run ~5 hours - this is slow, but what we can do?

And while we investigated possible ways of optimisation - we found, that changing "isnull" to "coalesce" in query above gives double performance gain - and query now run for ~2 hours

UPD: When we remove all ISNULL checks and use just t1.vchCol1 = t2.vchCol1 the query finishes after 40mins.

Question is: Is this known behavior and we should avoid using IsNull everywhere?

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

萌酱 2024-11-17 02:39:31

我想知道通过明确地拆分案例您是否会看到改进：

...
AND ((t1.vchCol1 = t2.vchCol1) OR (t1.vchCol1 IS NULL AND t2.vchCol1 IS NULL))
AND ((t1.vchCol2 = t2.vchCol2) OR (t1.vchCol2 IS NULL AND t2.vchCol2 IS NULL))
...

I wonder if you'd see an improvement by splitting the cases out explicitly:

...
AND ((t1.vchCol1 = t2.vchCol1) OR (t1.vchCol1 IS NULL AND t2.vchCol1 IS NULL))
AND ((t1.vchCol2 = t2.vchCol2) OR (t1.vchCol2 IS NULL AND t2.vchCol2 IS NULL))
...

回复收藏 0 原文

隱形的亼 2024-11-17 02:39:31

您会发现的有关该主题的大多数文章似乎都与此相矛盾。 ISNULL 比 COALESCE （稍微）快。

之间的差异ISNULL 和 COALESCE

COALESCE 基本上翻译为 CASE
表达式和 ISNULL 是内置的
在数据库引擎中实现。
...
这将产生一个性能
与 COALESCE 的差异和查询
这里的情况往往更糟。

ISNULL 与. 合并

我在一个
几个不同的服务器，以及 ISNULL
看起来相当一致
平均优于 COALESCE
10% 或 12%。但那就是
6秒和5.3秒的区别
秒（近似平均值
我的服务器上每次测试的运行时间），超过
一百万次例外的过程。
几乎不值得的功能和
标准合规性牺牲，在
至少在我使用这些的场景中
函数为。

COALESCE 与 ISNULL代码> vs <代码>IS NULL OR

表现最好的是IS NULL OR情况，
而这三个之间的区别
它们是次要的。

回复收藏 0 原文

自找没趣 2024-11-17 02:39:31

您可能需要考虑向每个保存校验和值的表添加一个计算列。然后，在ID列和校验和值上创建索引，最后在连接中使用校验和值。像这样的事情：

Alter Table T1 Add CheckSumId As CHECKSUM(vchCol1, vchCol2, vchCol3)
Alter Table T2 Add CheckSumId As CHECKSUM(vchCol1, vchCol2, vchCol3)

Create NonClustered index idx_T1_Checksum On T1(id, CheckSumId)
Create NonClustered index idx_T2_Checksum On T2(ext_id, CheckSumId)

那么你的查询将变成...

select t1.id, t2.id 
from t1 Inner Join t2
       On t1.id = t2.ext_id
       And T1.CheckSumId = T2.CheckSumId
where  isnull(t1.vchCol1, 'Null') = isnull(t2.vchCol1, 'Null')
     and isnull(t1.vchCol2, 'Null') = isnull(t2.vchCol2, 'Null')

这当然会使用额外的索引空间，但它只是 2 个整数，应该非常有效。每次插入、更新和删除也会有性能损失，因为需要维护另一个索引。但是，我怀疑这会对性能产生很大影响。

You may want to consider adding a computed column to each table that holds a checksum value. Then, create an index on the ID column and the checksum value, and finally use the checksum value in the join. Something like this:

Alter Table T1 Add CheckSumId As CHECKSUM(vchCol1, vchCol2, vchCol3)
Alter Table T2 Add CheckSumId As CHECKSUM(vchCol1, vchCol2, vchCol3)

Create NonClustered index idx_T1_Checksum On T1(id, CheckSumId)
Create NonClustered index idx_T2_Checksum On T2(ext_id, CheckSumId)

Then your query would become...

select t1.id, t2.id 
from t1 Inner Join t2
       On t1.id = t2.ext_id
       And T1.CheckSumId = T2.CheckSumId
where  isnull(t1.vchCol1, 'Null') = isnull(t2.vchCol1, 'Null')
     and isnull(t1.vchCol2, 'Null') = isnull(t2.vchCol2, 'Null')

This will, of course, use extra index space, but it's simply 2 integers which should be very efficient. There will also be a performance penalty for each insert, update and delete because another index needs to be maintained. However, I suspect this will have a large impact on performance.

回复收藏 0 原文