为什么 IsNull 比合并慢两倍(相同查询)?

发布于 2024-11-10 02:39:31 字数 902 浏览 4 评论 0原文

我们在 SQL Server 2008 (SP1) - 10.0.2531.0 (X64) - Win2008 SP2 (X64) 上遇到了奇怪的情况。

这是一个繁重的查询:

select t1.id, t2.id 
from t1, t2
where 
     t1.id = t2.ext_id
     and isnull(t1.vchCol1, 'Null') = isnull(t2.vchCol1, 'Null')
     and isnull(t1.vchCol2, 'Null') = isnull(t2.vchCol2, 'Null')
     .... and about 10 more comparisons with Isnull

UPD:比较中的所有列(ID 除外)均为 varchar(~30...200)
T1 约为 1.3 亿行,T2 约为 30 万行。

这些查询在相当大的开发服务器上运行〜5小时 - 这很慢,但我们能做什么?

虽然我们研究了可能的优化方法 - 我们发现,在上面的查询中将 “isnull” 更改为 “coalesce” 可带来双倍的性能增益 - 并且查询现在运行 >~2 小时

UPD:当我们删除所有 ISNULL 检查并仅使用 t1.vchCol1 = t2.vchCol1 时,查询完成40 分钟后。

问题是:这是已知行为吗?我们应该避免在任何地方使用IsNull吗?

We met a strange situation on SQL Server 2008 (SP1) - 10.0.2531.0 (X64) - Win2008 SP2 (X64).

Here is a one heavy query:

select t1.id, t2.id 
from t1, t2
where 
     t1.id = t2.ext_id
     and isnull(t1.vchCol1, 'Null') = isnull(t2.vchCol1, 'Null')
     and isnull(t1.vchCol2, 'Null') = isnull(t2.vchCol2, 'Null')
     .... and about 10 more comparisons with Isnull

UPD: All columns in comparison (except IDs) are varchar(~30...200)
T1 is ~130mln rows, T2 is ~300k rows.

These query on rather big Dev server run ~5 hours - this is slow, but what we can do?

And while we investigated possible ways of optimisation - we found, that changing "isnull" to "coalesce" in query above gives double performance gain - and query now run for ~2 hours

UPD: When we remove all ISNULL checks and use just t1.vchCol1 = t2.vchCol1 the query finishes after 40mins.

Question is: Is this known behavior and we should avoid using IsNull everywhere?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(4

萌酱 2024-11-17 02:39:31

我想知道通过明确地拆分案例您是否会看到改进:

...
AND ((t1.vchCol1 = t2.vchCol1) OR (t1.vchCol1 IS NULL AND t2.vchCol1 IS NULL))
AND ((t1.vchCol2 = t2.vchCol2) OR (t1.vchCol2 IS NULL AND t2.vchCol2 IS NULL))
...

I wonder if you'd see an improvement by splitting the cases out explicitly:

...
AND ((t1.vchCol1 = t2.vchCol1) OR (t1.vchCol1 IS NULL AND t2.vchCol1 IS NULL))
AND ((t1.vchCol2 = t2.vchCol2) OR (t1.vchCol2 IS NULL AND t2.vchCol2 IS NULL))
...
隱形的亼 2024-11-17 02:39:31

您会发现的有关该主题的大多数文章似乎都与此相矛盾。 ISNULLCOALESCE (稍微)快。

之间的差异ISNULLCOALESCE

COALESCE 基本上翻译为 CASE
表达式和 ISNULL 是内置的
在数据库引擎中实现。
...
这将产生一个性能
COALESCE 的差异和查询
这里的情况往往更糟。

ISNULL 与. 合并

我在一个
几个不同的服务器,以及 ISNULL
看起来相当一致
平均优于 COALESCE
10% 或 12%。但那就是
6秒和5.3秒的区别
秒(近似平均值
我的服务器上每次测试的运行时间),超过
一百万次例外的过程。
几乎不值得的功能和
标准合规性牺牲,在
至少在我使用这些的场景中
函数为。

COALESCEISNULL代码> vs <代码>IS NULL OR

表现最好的是IS NULL OR情况,
而这三个之间的区别
它们是次要的。

Most of the articles you'll find on this subject seem to contradict this. ISNULL is (marginally) faster than COALESCE.

Differences between ISNULL and COALESCE

COALESCE basically translates to CASE
expression and ISNULL is a built-in
implemented in the database engine.
...
This will make a performance
difference and queries with COALESCE
often fare worse here.

ISNULL vs. COALESCE

I ran these tests several times on a
few different servers, and ISNULL
appears to pretty consistently
out-perform COALESCE by an average of
10 or 12 percent. But that's the
difference between 6 seconds and 5.3
seconds (the approximate average
runtimes per test on my servers), over
the course of a million exections.
Hardly worth the functionality and
standards compliance sacrifice, at
least in the scenarios I use these
functions for.

COALESCE vs ISNULL vs IS NULL OR

the best performer is IS NULL OR case,
while the difference between all 3 of
them is minor.

自找没趣 2024-11-17 02:39:31

您可能需要考虑向每个保存校验和值的表添加一个计算列。然后,在ID列和校验和值上创建索引,最后在连接中使用校验和值。像这样的事情:

Alter Table T1 Add CheckSumId As CHECKSUM(vchCol1, vchCol2, vchCol3)
Alter Table T2 Add CheckSumId As CHECKSUM(vchCol1, vchCol2, vchCol3)

Create NonClustered index idx_T1_Checksum On T1(id, CheckSumId)
Create NonClustered index idx_T2_Checksum On T2(ext_id, CheckSumId)

那么你的查询将变成...

select t1.id, t2.id 
from t1 Inner Join t2
       On t1.id = t2.ext_id
       And T1.CheckSumId = T2.CheckSumId
where  isnull(t1.vchCol1, 'Null') = isnull(t2.vchCol1, 'Null')
     and isnull(t1.vchCol2, 'Null') = isnull(t2.vchCol2, 'Null')

这当然会使用额外的索引空间,但它只是 2 个整数,应该非常有效。每次插入、更新和删除也会有性能损失,因为需要维护另一个索引。但是,我怀疑这会对性能产生很大影响。

You may want to consider adding a computed column to each table that holds a checksum value. Then, create an index on the ID column and the checksum value, and finally use the checksum value in the join. Something like this:

Alter Table T1 Add CheckSumId As CHECKSUM(vchCol1, vchCol2, vchCol3)
Alter Table T2 Add CheckSumId As CHECKSUM(vchCol1, vchCol2, vchCol3)

Create NonClustered index idx_T1_Checksum On T1(id, CheckSumId)
Create NonClustered index idx_T2_Checksum On T2(ext_id, CheckSumId)

Then your query would become...

select t1.id, t2.id 
from t1 Inner Join t2
       On t1.id = t2.ext_id
       And T1.CheckSumId = T2.CheckSumId
where  isnull(t1.vchCol1, 'Null') = isnull(t2.vchCol1, 'Null')
     and isnull(t1.vchCol2, 'Null') = isnull(t2.vchCol2, 'Null')

This will, of course, use extra index space, but it's simply 2 integers which should be very efficient. There will also be a performance penalty for each insert, update and delete because another index needs to be maintained. However, I suspect this will have a large impact on performance.

不念旧人 2024-11-17 02:39:31

我意识到这是一年后的事了,但是......

对于这种逐列比较,您可能会考虑使用 EXCEPT。另外,EXCEPT 将 NULL 视为另一个值,而不是“它可以是任何东西!”,我喜欢这样称呼它。

“当您比较行以确定不同值时,两个 NULL 值被视为相等。”
--来自 http://msdn.microsoft.com/en-us/library/ ms188055.aspx

I realize this is a year later, but...

For this kind of column-by-column comparison, you might consider using EXCEPT. Also, EXCEPT treats NULL like another value instead of "It could be anything!", as I like to call it.

"When you compare rows for determining distinct values, two NULL values are considered equal."
--from http://msdn.microsoft.com/en-us/library/ms188055.aspx

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文