并行性如何影响结果数量?
我有一个相当复杂的查询,如下所示:
create table Items(SomeOtherTableID int,SomeField int)
create table SomeOtherTable(Id int,GroupID int)
with cte1 as
(
select
SomeOtherTableID,COUNT(*) SubItemCount
from
Items t
where
t.SomeField is not null
group by
SomeOtherTableID
),cte2 as
(
select
tc.SomeOtherTableID,ROW_NUMBER() over (partition by a.GroupID order by tc.SubItemCount desc) SubItemRank
from
Items t
inner join SomeOtherTable a on a.Id=t.SomeOtherTableID
inner join cte1 tc on tc.SomeOtherTableID=t.SomeOtherTableID
where
t.SomeField is not null
),cte3 as
(
select
SomeOtherTableID
from
cte2
where
SubItemRank=1
)
select
*
from
cte3 t1
inner join cte3 t2 on t1.SomeOtherTableID<t2.SomeOtherTableID
option (maxdop 1)
该查询的 cte3 填充有 6222 不同 结果。 在最后的选择中,我对cte3与其自身执行交叉连接(这样我就可以将表中的每个值与表中的每个其他值进行比较)稍后一点)。 注意最后一行:
option (maxdop 1)
显然,这会关闭并行性。
因此,对于 cte3 中的 6222 结果行,我预计 (6222*6221)/2 或 19353531 结果会出现在后续的交叉连接选择中,并且随着最终的 maxdop 线就位,情况确实如此。
但是,当我删除 maxdop 行时,结果数会跳至 19380454。 我的开发盒上有 4 个核心。
搞什么? 谁能解释这是为什么吗? 我是否需要重新考虑以前以这种方式交叉连接的查询?
I have a fairly complex query that looks something like this:
create table Items(SomeOtherTableID int,SomeField int)
create table SomeOtherTable(Id int,GroupID int)
with cte1 as
(
select
SomeOtherTableID,COUNT(*) SubItemCount
from
Items t
where
t.SomeField is not null
group by
SomeOtherTableID
),cte2 as
(
select
tc.SomeOtherTableID,ROW_NUMBER() over (partition by a.GroupID order by tc.SubItemCount desc) SubItemRank
from
Items t
inner join SomeOtherTable a on a.Id=t.SomeOtherTableID
inner join cte1 tc on tc.SomeOtherTableID=t.SomeOtherTableID
where
t.SomeField is not null
),cte3 as
(
select
SomeOtherTableID
from
cte2
where
SubItemRank=1
)
select
*
from
cte3 t1
inner join cte3 t2 on t1.SomeOtherTableID<t2.SomeOtherTableID
option (maxdop 1)
The query is such that cte3 is filled with 6222 distinct results. In the final select, I am performing a cross join on cte3 with itself, (so that I can compare every value in the table with every other value in the table at a later point). Notice the final line :
option (maxdop 1)
Apparently, this switches off parallelism.
So, with 6222 results rows in cte3, I would expect (6222*6221)/2, or 19353531 results in the subsequent cross joining select, and with the final maxdop line in place, that is indeed the case.
However, when I remove the maxdop line, the number of results jumps to 19380454. I have 4 cores on my dev box.
WTF? Can anyone explain why this is? Do I need to reconsider previous queries that cross join in this way?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
除了错误之外,并行性不应该影响结果。
Aside from a bug, parallelism should not affect the result.
看起来 SCOPE_IDENTITY 也存在类似的并行错误
或者您是否使用快照隔离,这也是另一个错误? 还有一些博客演示了在某些情况下暂时关闭快照隔离的情况。
编辑:
回到快照隔离,这里的第 3 点: 您应该对并行性感到紧张的六个原因
Looks like there is a similar parallism bug with SCOPE_IDENTITY too
Or are you using snaphot isolation, another bug too? There are also some blogs that demonstrate snapshot isolation being turned off temporarily under certain conditions.
Edit:
Going back to Snapshot isolation, number 3 here: Six reasons you should be nervous about parallelism