SQL Server 连接与子查询性能问题
我发现,在某些情况下,在 SS2008R2 中完成类似查询的
select
usertable.userid,
(select top 1 name from nametable where userid = usertable.userid) as name
from usertable
where active = 1
的等效连接查询要长一个数量级
select
usertable.userid,
nametable.name
from usertable
left join nametable on nametable.userid = usertable.userid
where usertable.active = 1
时间比两个表都已建立索引且行数超过 100k 。有趣的是,在原始查询中插入一个 top 子句使其性能与连接查询相当:
select
top (select count(*) from usertable where active = 1) usertable.userid,
(select top 1 name from nametable where userid = usertable.userid) as name
from usertable
where active = 1
有谁知道为什么原始查询性能如此差?
I discovered that in some cases a query like
select
usertable.userid,
(select top 1 name from nametable where userid = usertable.userid) as name
from usertable
where active = 1
takes an order of magnitude longer to complete in SS2008R2 than the equivalent join query
select
usertable.userid,
nametable.name
from usertable
left join nametable on nametable.userid = usertable.userid
where usertable.active = 1
where both tables are indexed and have over 100k rows. Interestingly, inserting a top clause into the original query makes it perform on par with the join query:
select
top (select count(*) from usertable where active = 1) usertable.userid,
(select top 1 name from nametable where userid = usertable.userid) as name
from usertable
where active = 1
Does anyone have any idea why the original query performs so poorly?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
好吧,查询是不同的 - 除非 userid 列是主键或具有唯一性约束,否则第二个查询可能返回比第一个查询更多的行。
也就是说,假设 userid 是主键/唯一,请尝试删除第一个子查询的 TOP 1 部分:
Well, the queries are different - unless the
userid
column is a primary key or has a uniqueness constraint then the second query could return more rows than the first.That said, with the assumption that userid is a primary key / unique try removing the
TOP 1
part of the first subquery:它是一个相关子查询,这意味着它需要为外部查询的每个返回行执行一次,因为它引用外部查询中的字段。
JOIN
对整个结果集运行一次并进行合并。您的子查询运行外部查询,然后对于每个返回的行再次运行子查询。It's a correlated subquery, which means it needs to execute once per return row of the outer query since it references a field in the outer query.
A
JOIN
runs once for the entire result set and gets merged. Your subquery runs the outer query, then for each returned row it runs the subquery again.原始查询将执行子
select
与行数一样多的次数,因此性能较差。当您
JOIN
时,您会立即获得整个结果集。The original query will execute the sub
select
as many times as there are rows, thus the poor performance.When you
JOIN
you get the whole result set at once.