Why is one query very slow, while the identical query on a similar table runs in the blink of an eye?

Posted 2024-11-29 00:07:13

I have this query, which runs extremely slowly (almost a minute):

select distinct main.PrimeId 
from PRIME main 
join   
( 
select distinct p.PrimeId   from PRIME p   
left  outer join ATTRGROUP a 
on p.PrimeId = a.PrimeId   or p.PrimeId = a.RelatedPrimeId    
where a.PrimeId is not null and a.RelatedPrimeId is not null  
) mem  
on main.PrimeId = mem.PrimeId

The PRIME table has 18k rows, and has PK on PrimeId.

The ATTRGROUP table has 24k rows, and has a composite PK on PrimeId, col2, then RelatedPrimeId, and then cols 4-7. There's also a separate index on RelatedPrimeId.

The query eventually returns 8.5k rows: the distinct values of PrimeId on the PRIME table that match either PrimeId or RelatedPrimeId on the ATTRGROUP table.

I have the identical query, using ATTRADDRESS instead of ATTRGROUP. ATTRADDRESS has the same key and index structure as ATTRGROUP. It has only 11k rows, which is smaller, admittedly, but in that case the query runs in about a second and returns 11k rows.

So my question is this:

How can the query be so much slower on one table than on the other, despite the structures being identical?

So far, I've tried this on SQL Server 2005 and (using the same database, upgraded) on SQL Server 2008 R2. Two of us have independently obtained the same results, restoring the same backup to two different computers.

Other details:

  • the bit inside the brackets runs in less than a second on its own, even in the slow query
  • there's a possible clue in the execution plan, which I don't understand. Here's part of it, with a suspicious 320,000,000 row operation:

[Two execution plan screenshots, not preserved]

However, the actual number of rows on that table is a little over 24k, not 320M!
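
One way to investigate a figure like this is to compare actual and estimated row counts per plan operator. The following is a minimal sketch, assuming the slow query shown above is run unchanged in the same session; `SET STATISTICS IO` and `SET STATISTICS PROFILE` are standard T-SQL session options.

```sql
-- Sketch: capture per-operator row counts and per-table reads for the slow query.
SET STATISTICS IO ON;       -- reports logical reads per table in the Messages tab
SET STATISTICS PROFILE ON;  -- adds a result set with Rows, Executes, EstimateRows, ...

select distinct main.PrimeId
from PRIME main
join (
    select distinct p.PrimeId
    from PRIME p
    left outer join ATTRGROUP a
        on p.PrimeId = a.PrimeId or p.PrimeId = a.RelatedPrimeId
    where a.PrimeId is not null and a.RelatedPrimeId is not null
) mem
    on main.PrimeId = mem.PrimeId;

SET STATISTICS PROFILE OFF;
SET STATISTICS IO OFF;
```

In the profile output, `Rows` is the total over all `Executes` of an operator, so an operator that reads a 24k-row table can report hundreds of millions of rows if it runs thousands of times, for example on the inner side of a nested loops join. Whether that is what happens here can only be confirmed from the actual plan.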

If I refactor the part of the query inside the brackets, so that it uses a UNION rather than an OR, thus:

select distinct main.PrimeId 
from PRIME main 
join   
( 
select distinct p.PrimeId   from PRIME p   
left  outer join ATTRGROUP a 
on p.PrimeId = a.PrimeId
where a.PrimeId is not null and a.RelatedPrimeId is not null  
UNION
select distinct p.PrimeId   from PRIME p   
left  outer join ATTRGROUP a 
on p.PrimeId = a.RelatedPrimeId    
where a.PrimeId is not null and a.RelatedPrimeId is not null  
) mem  
on main.PrimeId = mem.PrimeId

... then the slow query takes under a second.

I'd greatly appreciate any insight on this! Let me know if you need any more info and I'll update the question. Thanks!

By the way, I realise that in this example there's a redundant join. This can't easily be removed, since in production the whole thing is generated dynamically, and the bit in the brackets takes many different forms.


Edit:

I've rebuilt the indexes on ATTRGROUP; it makes no significant difference.

Edit 2:

If I use a temporary table, thus:

select distinct p.PrimeId into #temp
from PRIME p   
left  outer join ATTRGROUP a 
on p.PrimeId = a.PrimeId   or p.PrimeId = a.RelatedPrimeId    
where a.PrimeId is not null and a.RelatedPrimeId is not null  

select distinct main.PrimeId 
from Prime main join   
#temp mem  
on main.PrimeId = mem.PrimeId

... then again, even with an OR in the original OUTER JOIN, it runs in less than a second. I hate temp tables like this, since it always feels like an admission of defeat, so it isn't the refactor I'll be using, but I thought it was interesting that it makes such a difference.

Edit 3:

Updating the stats makes no difference either.

Thanks for all your suggestions so far.


Comments (4)

老子叫无熙 2024-12-06 00:07:13

In my experience it's better to use two left joins rather than an OR in the JOIN clause.
So instead of:

    left  outer join ATTRGROUP a 
    on p.PrimeId = a.PrimeId   or p.PrimeId = a.RelatedPrimeId

I would suggest:

    left  outer join ATTRGROUP a 
    on p.PrimeId = a.PrimeId
    left  outer join ATTRGROUP a2
    on p.PrimeId = a2.RelatedPrimeId    
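
Applied to the bracketed part of the question's query, the suggestion might look like the sketch below. The alias `a2` follows the fragment above; the rewritten WHERE clause is an assumption about how the original filter should carry over, keeping a row when either join finds a match.

```sql
-- Sketch: the inner query with the OR split into two left joins.
select distinct p.PrimeId
from PRIME p
left outer join ATTRGROUP a
    on p.PrimeId = a.PrimeId
left outer join ATTRGROUP a2
    on p.PrimeId = a2.RelatedPrimeId
-- keep a PRIME row when at least one of the two joins matched
where a.PrimeId is not null
   or a2.RelatedPrimeId is not null;
```

Note that the two joins multiply rows before DISTINCT collapses them: a PrimeId with m matches on `a` and n matches on `a2` produces m × n intermediate rows, so this form is worth checking against the UNION version on real data.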

小猫一只 2024-12-06 00:07:13

I notice that the main query isn't correlated with the subquery:

select distinct main.PrimeId 
from PRIME main 
join   
( 
select distinct p.PrimeId   from PRIME p   
left  outer join ATTRGROUP a 
on p.PrimeId = a.PrimeId
where *main.PrimeId = a.PrimeId*  
UNION
select distinct p.PrimeId   from PRIME p   
left  outer join ATTRGROUP a 
on p.PrimeId = a.RelatedPrimeId    
where *main.PrimeId = a.PrimeId*  
) mem  
on main.PrimeId = mem.PrimeId

In this construction you don't need the 'is not null' clause either (would you ever need it, since a primary key never holds a null value?).

I was taught to avoid OR constructions (as others have already advised), and also to avoid 'is not null' or 'in valuelist' constructions. These can mostly be replaced by a (NOT) EXISTS clause.
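
As an illustration of the EXISTS approach, here is a minimal sketch that should return the same PrimeId values as the question's query, assuming (as the question states) that PrimeId is the primary key of PRIME and that PrimeId and RelatedPrimeId are part of the primary key of ATTRGROUP and therefore never null.

```sql
-- Sketch: one probe per column instead of a join with OR.
select main.PrimeId
from PRIME main
where exists (select 1 from ATTRGROUP a where a.PrimeId = main.PrimeId)
   or exists (select 1 from ATTRGROUP a where a.RelatedPrimeId = main.PrimeId);
```

No DISTINCT is needed here because each PRIME row is returned at most once. Each EXISTS probe can use a different index (the composite PK for PrimeId, the separate index for RelatedPrimeId), though the plan the optimizer actually picks would need to be checked.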

梦中的蝴蝶 2024-12-06 00:07:13

This is not a direct answer, but if you have FK constraints from ATTRGROUP.PrimeId and ATTRGROUP.RelatedPrimeId to PRIME.PrimeId, then your query is equivalent to this much simpler one:

select PrimeId   from ATTRGROUP a 
union
select RelatedPrimeId from ATTRGROUP a 
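
If the FK constraints are not declared, the same assumption can be checked against the data. This is a sketch only: it lists any ATTRGROUP rows whose PrimeId or RelatedPrimeId has no matching row in PRIME.

```sql
-- Sketch: find ATTRGROUP rows that would break the equivalence.
select a.PrimeId, a.RelatedPrimeId
from ATTRGROUP a
where not exists (select 1 from PRIME p where p.PrimeId = a.PrimeId)
   or not exists (select 1 from PRIME p where p.PrimeId = a.RelatedPrimeId);
```

If this returns no rows, the simpler UNION over ATTRGROUP returns the same set of ids as the original query for the current data; without a constraint, nothing guarantees that it stays that way.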

冷夜 2024-12-06 00:07:13

One reason why a query could be much slower on one table than on another is that the statistics on that table are out of date, so the optimizer chooses the wrong query plan.

However, I support the refactoring that others have suggested, which gets rid of the OR clause, anyway.
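
To check this, the age of the statistics can be inspected and the statistics refreshed. A minimal sketch, assuming the table is reachable as `ATTRGROUP` in the current database and default schema:

```sql
-- When was each statistics object on ATTRGROUP last updated?
select s.name,
       STATS_DATE(s.object_id, s.stats_id) as last_updated
from sys.stats s
where s.object_id = OBJECT_ID('ATTRGROUP');

-- Refresh all statistics on the table by reading every row.
UPDATE STATISTICS ATTRGROUP WITH FULLSCAN;
```

The question reports that updating statistics made no difference in this case, so this is a diagnostic step rather than a likely fix.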
