Why does the order of join clauses affect the query plan in SQL Server?
I am building a view in SQL Server 2000 (and 2005) and I've noticed that the order of the join statements greatly affects the execution plan and speed of the query.
select sr.WTSASessionRangeID,
-- bunch of other columns
from WTSAVW_UserSessionRange us
inner join WTSA_SessionRange sr on sr.WTSASessionRangeID = us.WTSASessionRangeID
left outer join WTSA_SessionRangeTutor srt on srt.WTSASessionRangeID = sr.WTSASessionRangeID
left outer join WTSA_SessionRangeClass src on src.WTSASessionRangeID = sr.WTSASessionRangeID
left outer join WTSA_SessionRangeStream srs on srs.WTSASessionRangeID = sr.WTSASessionRangeID
--left outer join MO_Stream ms on ms.MOStreamID = srs.MOStreamID
left outer join WTSA_SessionRangeEnrolmentPeriod srep on srep.WTSASessionRangeID = sr.WTSASessionRangeID
left outer join WTSA_SessionRangeStudent stsd on stsd.WTSASessionRangeID = sr.WTSASessionRangeID
left outer join WTSA_SessionSubrange ssr on ssr.WTSASessionRangeID = sr.WTSASessionRangeID
left outer join WTSA_SessionSubrangeRoom ssrr on ssrr.WTSASessionSubrangeID = ssr.WTSASessionSubrangeID
left outer join MO_Stream ms on ms.MOStreamID = srs.MOStreamID
On SQL Server 2000, the query above consistently generates a plan of cost 946. If I uncomment the MO_Stream join in the middle of the query and comment out the one at the bottom, the cost drops to 263, and execution time drops accordingly. I always thought that the query optimizer would interpret the query appropriately regardless of join order, but it seems that order matters.
So since order does seem to matter, is there a join strategy I should be following for writing faster queries?
(Incidentally, on SQL Server 2005, with almost identical data, the query plan costs were 0.675 and 0.631 respectively.)
Edit: On SQL Server 2000, here are the profiled stats:
946-cost query: 9094ms CPU, 5121 reads, 0 writes, 10123ms duration
263-cost query: 172ms CPU, 7477 reads, 0 writes, 170ms duration
Edit: Here is the logical structure of the tables.
SessionRange ---+--- SessionRangeTutor
                |--- SessionRangeClass
                |--- SessionRangeStream --- MO_Stream
                |--- SessionRangeEnrolmentPeriod
                |--- SessionRangeStudent
                +----SessionSubrange --- SessionSubrangeRoom
Edit: Thanks to Alex and gbn for pointing me in the right direction. I also found this question.
Here's the new query:
select sr.WTSASessionRangeID -- + lots of columns
from WTSAVW_UserSessionRange us
inner join WTSA_SessionRange sr on sr.WTSASessionRangeID = us.WTSASessionRangeID
left outer join WTSA_SessionRangeTutor srt on srt.WTSASessionRangeID = sr.WTSASessionRangeID
left outer join WTSA_SessionRangeClass src on src.WTSASessionRangeID = sr.WTSASessionRangeID
left outer join WTSA_SessionRangeEnrolmentPeriod srep on srep.WTSASessionRangeID = sr.WTSASessionRangeID
left outer join WTSA_SessionRangeStudent stsd on stsd.WTSASessionRangeID = sr.WTSASessionRangeID
-- SessionRangeStream is a many-to-many mapping table between SessionRange and MO_Stream
left outer join (
WTSA_SessionRangeStream srs
inner join MO_Stream ms on ms.MOStreamID = srs.MOStreamID
) on srs.WTSASessionRangeID = sr.WTSASessionRangeID
-- SessionRanges MAY have Subranges and Subranges MAY have Rooms
left outer join (
WTSA_SessionSubrange ssr
left outer join WTSA_SessionSubrangeRoom ssrr on ssrr.WTSASessionSubrangeID = ssr.WTSASessionSubrangeID
) on ssr.WTSASessionRangeID = sr.WTSASessionRangeID
SQL Server 2000 cost: 24.9
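To see why the parentheses matter once an inner join sits behind an outer join, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for SQL Server so it can be run anywhere, and the tables session_range, range_stream and stream are hypothetical, cut-down versions of the real ones. It only illustrates the join semantics and says nothing about SQL Server's plan costs.

```python
import sqlite3

# Hypothetical, simplified schema: range 1 has a stream, range 2 has none.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE session_range (range_id INTEGER PRIMARY KEY);
    CREATE TABLE stream (stream_id INTEGER PRIMARY KEY);
    CREATE TABLE range_stream (range_id INTEGER, stream_id INTEGER);
    INSERT INTO session_range VALUES (1), (2);
    INSERT INTO stream VALUES (10);
    INSERT INTO range_stream VALUES (1, 10);
""")

# Flat form: the inner join is applied to the result of the left join,
# so the NULL-extended row for range 2 is filtered out again.
flat = conn.execute("""
    SELECT sr.range_id, ms.stream_id
    FROM session_range sr
    LEFT JOIN range_stream srs ON srs.range_id = sr.range_id
    INNER JOIN stream ms ON ms.stream_id = srs.stream_id
    ORDER BY sr.range_id
""").fetchall()

# Nested form: mapping table and stream are inner-joined first, and that
# combined result is then left-joined, so range 2 survives with a NULL.
nested = conn.execute("""
    SELECT sr.range_id, ms.stream_id
    FROM session_range sr
    LEFT JOIN (
        range_stream srs
        INNER JOIN stream ms ON ms.stream_id = srs.stream_id
    ) ON srs.range_id = sr.range_id
    ORDER BY sr.range_id
""").fetchall()

print(flat)    # only range 1
print(nested)  # ranges 1 and 2
```

The nested form keeps every session range while still treating the mapping table and the stream table as one unit, which is the shape used in the new query above.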
Comments (7)
I have to disagree with all previous answers, and the reason is simple: if you change the order of your left joins, your queries are logically different and as such they can produce different result sets. See for yourself:
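A minimal sketch of one such case, assuming three hypothetical single-column tables t1, t2 and t3 and using Python's sqlite3 module in place of SQL Server (outer-join semantics are the same here). The two queries differ in which table is joined first and therefore in which table the later left join hangs off.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (a INTEGER);
    CREATE TABLE t2 (b INTEGER);
    CREATE TABLE t3 (c INTEGER);
    INSERT INTO t1 VALUES (1), (2), (3);
    INSERT INTO t2 VALUES (1), (2);
    INSERT INTO t3 VALUES (1), (3);
""")

# t2 is joined to t1 first; t3 then hangs off t2.
t2_first = conn.execute("""
    SELECT a, b, c
    FROM t1
    LEFT JOIN t2 ON t2.b = t1.a
    LEFT JOIN t3 ON t3.c = t2.b
    ORDER BY a
""").fetchall()

# t3 is joined to t1 first; t2 then hangs off t3.
t3_first = conn.execute("""
    SELECT a, b, c
    FROM t1
    LEFT JOIN t3 ON t3.c = t1.a
    LEFT JOIN t2 ON t2.b = t3.c
    ORDER BY a
""").fetchall()

print(t2_first)  # [(1, 1, 1), (2, 2, None), (3, None, None)]
print(t3_first)  # [(1, 1, 1), (2, None, None), (3, None, 3)]
```

Both queries return three rows, but the NULLs land in different columns, so the two are not interchangeable.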
The join order does make a difference to the resulting query. This is documented in BOL in the docs for FROM:
You can alter the join order using parentheses around the joins (BOL does show this in the syntax at the top of the docs, but it is easy to miss).
This is known as chiastic behaviour. You can also use the query hint OPTION (FORCE ORDER) to force a specific join order, but this can result in what are called "bushy plans", which may not be optimal for the query being executed.
Obviously, the SQL Server 2005 optimizer is a lot better than the SQL Server 2000 one.
However, there's a lot of truth in your question. Outer joins will cause execution to vary wildly based on order (inner joins tend to be optimized to the most efficient route, but again, order matters). If you think about it, as you build up left joins, you need to figure out what the heck is on the left. As such, each join must be calculated before the next one can be done. It becomes sequential, not parallel. Now, obviously, there are things you can do to combat this (such as indexes, views, etc.). But the point stands: the engine needs to know what's on the left before it can do a left outer join. And if you just keep adding joins, it gets harder and harder to say what exactly is on the left (especially if you use joined tables as the left table!).
With inner joins, however, you can parallelize those quite a bit, so there's less of a dramatic difference as far as order's concerned.
A general strategy for optimizing queries containing JOINs is to look at your data model and the data and try to determine which JOINs will most quickly reduce the number of records that must be considered. The fewer records that must be considered, the faster the query will run. The server will generally produce a better query plan too.
Along with the above optimization, make sure that any fields used in JOINs are indexed.
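As a rough illustration of the indexing advice, the sketch below builds two hypothetical tables (parent and child) in SQLite through Python's sqlite3 module and compares the query plan before and after indexing the join column. SQLite's planner is not SQL Server's, so treat this only as a demonstration of the idea; on SQL Server you would compare the execution plans instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parent (id INTEGER PRIMARY KEY);
    CREATE TABLE child (id INTEGER PRIMARY KEY, parent_id INTEGER);
""")
conn.executemany("INSERT INTO parent VALUES (?)", [(i,) for i in range(100)])
conn.executemany(
    "INSERT INTO child (parent_id) VALUES (?)",
    [(i % 100,) for i in range(1000)],
)

query = """
    SELECT p.id, c.id
    FROM parent p
    LEFT JOIN child c ON c.parent_id = p.id
"""

def plan(sql):
    # The fourth column of each EXPLAIN QUERY PLAN row is the readable detail.
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

plan_before = plan(query)

# Index the column used in the join condition (index name is hypothetical).
conn.execute("CREATE INDEX ix_child_parent_id ON child (parent_id)")
plan_after = plan(query)

print(plan_before)
print(plan_after)
```

After the index is created, the plan for the child table should mention ix_child_parent_id instead of a scan or a temporary automatic index.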
Your query is probably wrong anyway. Alex is correct. Eric may be correct too, but the query is wrong.
Let's take this subset:
You are joining WTSA_SessionSubrangeRoom onto WTSA_SessionSubrange. You may have no rows from WTSA_SessionSubrange.
The join should be this:
This is why the join order is affecting results: it's a different query, declaratively speaking.
You'd also need to change the MO_Stream and WTSA_SessionRangeStream join.
It depends on which of the join fields are indexed: if it has to table scan the first field but can use an index on the second, it's slow. If your first join field is indexed, it'll be quicker. My guess is that 2005 optimizes it better by determining the indexed fields and performing those joins first.
At DevConnections a few years ago, a session on SQL Server performance stated that (a) the order of outer joins DOES matter, and (b) when a query has a lot of joins, the optimizer will not look at all of them before settling on a plan. If you know you have joins that will help speed up a query, put them early in the FROM list (if you can).