Why does the order of join clauses affect the query plan in SQL Server?

Posted on 2024-07-28 00:06:44

I am building a view in SQL Server 2000 (and 2005) and I've noticed that the order of the join statements greatly affects the execution plan and speed of the query.

select      sr.WTSASessionRangeID,
            -- bunch of other columns
from        WTSAVW_UserSessionRange us
inner join  WTSA_SessionRange sr on sr.WTSASessionRangeID = us.WTSASessionRangeID
left outer join WTSA_SessionRangeTutor srt on srt.WTSASessionRangeID = sr.WTSASessionRangeID
left outer join WTSA_SessionRangeClass src on src.WTSASessionRangeID = sr.WTSASessionRangeID
left outer join WTSA_SessionRangeStream srs on srs.WTSASessionRangeID = sr.WTSASessionRangeID
--left outer join MO_Stream ms on ms.MOStreamID = srs.MOStreamID
left outer join WTSA_SessionRangeEnrolmentPeriod srep on srep.WTSASessionRangeID = sr.WTSASessionRangeID
left outer join WTSA_SessionRangeStudent stsd on stsd.WTSASessionRangeID = sr.WTSASessionRangeID
left outer join WTSA_SessionSubrange ssr on ssr.WTSASessionRangeID = sr.WTSASessionRangeID
left outer join WTSA_SessionSubrangeRoom ssrr on ssrr.WTSASessionSubrangeID = ssr.WTSASessionSubrangeID
left outer join MO_Stream ms on ms.MOStreamID = srs.MOStreamID

On SQL Server 2000, the query above consistently generates a plan of cost 946. If I uncomment the MO_Stream join in the middle of the query and comment out the one at the bottom, the cost drops to 263, and the execution time drops accordingly. I always thought that the query optimizer would arrive at the same plan regardless of the order in which the joins are written, but it seems that order matters.

So since order does seem to matter, is there a join strategy I should be following for writing faster queries?

(Incidentally, on SQL Server 2005, with almost identical data, the query plan costs were 0.675 and 0.631 respectively.)

Edit: On SQL Server 2000, here are the profiled stats:

  • 946-cost query: 9094ms CPU, 5121 reads, 0 writes, 10123ms duration
  • 263-cost query: 172ms CPU, 7477 reads, 0 writes, 170ms duration

Edit: Here is the logical structure of the tables.

SessionRange ---+--- SessionRangeTutor
                |--- SessionRangeClass
                |--- SessionRangeStream --- MO_Stream
                |--- SessionRangeEnrolmentPeriod
                |--- SessionRangeStudent
                +----SessionSubrange --- SessionSubrangeRoom

Edit: Thanks to Alex and gbn for pointing me in the right direction. I also found this question.

Here's the new query:

select sr.WTSASessionRangeID    -- + lots of columns

from WTSAVW_UserSessionRange us
inner join WTSA_SessionRange sr on sr.WTSASessionRangeID = us.WTSASessionRangeID
left outer join WTSA_SessionRangeTutor srt on srt.WTSASessionRangeID = sr.WTSASessionRangeID
left outer join WTSA_SessionRangeClass src on src.WTSASessionRangeID = sr.WTSASessionRangeID
left outer join WTSA_SessionRangeEnrolmentPeriod srep on srep.WTSASessionRangeID = sr.WTSASessionRangeID
left outer join WTSA_SessionRangeStudent stsd on stsd.WTSASessionRangeID = sr.WTSASessionRangeID

-- SessionRangeStream is a many-to-many mapping table between SessionRange and MO_Stream
left outer join (
    WTSA_SessionRangeStream srs
    inner join MO_Stream ms on ms.MOStreamID = srs.MOStreamID
) on srs.WTSASessionRangeID = sr.WTSASessionRangeID

-- SessionRanges MAY have Subranges and Subranges MAY have Rooms
left outer join (
    WTSA_SessionSubrange ssr
    left outer join WTSA_SessionSubrangeRoom ssrr on ssrr.WTSASessionSubrangeID = ssr.WTSASessionSubrangeID
) on ssr.WTSASessionRangeID = sr.WTSASessionRangeID

SQL Server 2000 cost: 24.9


Comments (7)

独﹏钓一江月 2024-08-04 00:06:44

I have to disagree with all previous answers, and the reason is simple: if you change the order of your left joins, your queries can become logically different, and as such they can produce different result sets. See for yourself:

SELECT 1 AS a INTO #t1
UNION ALL SELECT 2
UNION ALL SELECT 3
UNION ALL SELECT 4;

SELECT 1 AS b INTO #t2
UNION ALL SELECT 2;

SELECT 1 AS c INTO #t3
UNION ALL SELECT 3;

SELECT a, b, c 
FROM #t1 LEFT JOIN #t2 ON #t1.a=#t2.b
  LEFT JOIN #t3 ON #t2.b=#t3.c
ORDER BY a;

SELECT a, b, c 
FROM #t1 LEFT JOIN #t3 ON #t1.a=#t3.c
  LEFT JOIN #t2 ON #t3.c=#t2.b
ORDER BY a;

a           b           c
----------- ----------- -----------
1           1           1
2           2           NULL
3           NULL        NULL
4           NULL        NULL

(4 row(s) affected)

a           b           c
----------- ----------- -----------
1           1           1
2           NULL        NULL
3           NULL        3
4           NULL        NULL

童话里做英雄 2024-08-04 00:06:44

The join order does make a difference to the resulting query. This is documented in BOL in the docs for FROM:

<joined_table>

Is a result set that is the product of two or more tables. For multiple joins, use parentheses to change the natural order of the joins.

You can alter the join order using parentheses around the joins (BOL does show this in the syntax at the top of the docs, but it is easy to miss).

This is known as chiastic behaviour. You can also use the query hint OPTION (FORCE ORDER) to force a specific join order, but this can result in what are called "bushy plans", which may not be optimal for the query being executed.
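To make the effect of grouping concrete, here is a minimal sketch, not taken from the thread. It uses Python's built-in sqlite3 module as a stand-in for SQL Server, with hypothetical tables session_range, range_stream and stream loosely modelled on the question. A derived table plays the role of the parenthesized join, which is an assumption made for portability. The flat chain applies the inner join last and so drops ranges that have no stream, while the grouped form keeps them.

```python
import sqlite3

# Hypothetical miniature schema; the table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE session_range (range_id INTEGER PRIMARY KEY);
    CREATE TABLE range_stream  (range_id INTEGER, stream_id INTEGER);
    CREATE TABLE stream        (stream_id INTEGER PRIMARY KEY, name TEXT);

    INSERT INTO session_range VALUES (1), (2);
    INSERT INTO range_stream  VALUES (1, 10);   -- range 2 has no stream
    INSERT INTO stream        VALUES (10, 'A');
""")

# Flat chain: joins are applied left to right, so the trailing INNER JOIN
# filters out the NULL-extended row that the LEFT JOIN produced for range 2.
flat = conn.execute("""
    SELECT sr.range_id, ms.name
    FROM session_range sr
    LEFT JOIN range_stream rs ON rs.range_id = sr.range_id
    INNER JOIN stream ms ON ms.stream_id = rs.stream_id
    ORDER BY sr.range_id
""").fetchall()

# Grouped form: the INNER JOIN is evaluated inside the group first, and the
# group as a whole is then LEFT JOINed, so range 2 survives with a NULL name.
nested = conn.execute("""
    SELECT sr.range_id, x.name
    FROM session_range sr
    LEFT JOIN (
        SELECT rs.range_id, ms.name
        FROM range_stream rs
        INNER JOIN stream ms ON ms.stream_id = rs.stream_id
    ) x ON x.range_id = sr.range_id
    ORDER BY sr.range_id
""").fetchall()

print("flat:  ", flat)
print("nested:", nested)
```

The two statements differ only in where the inner join is grouped, yet they return different row counts, which is why the grouping has to be stated explicitly when inner and outer joins are mixed.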

初与友歌 2024-08-04 00:06:44

Obviously, the SQL Server 2005 optimizer is a lot better than the SQL Server 2000 one.

However, there's a lot of truth in your question. Outer joins will cause execution to vary wildly based on order (inner joins tend to be optimized to the most efficient route, but again, order matters). If you think about it, as you build up left joins, you need to figure out what the heck is on the left. As such, each join must be calculated before the next one can be done. It becomes sequential, not parallel. Now, obviously, there are things you can do to combat this (such as indexes, views, etc.). But the point stands: the table needs to know what's on the left before it can do a left outer join. And if you just keep adding joins, it gets harder and harder to say what, exactly, is on the left (especially if you use joined tables as the left table!).

With inner joins, however, you can parallelize those quite a bit, so there's less of a dramatic difference as far as order's concerned.

心奴独伤 2024-08-04 00:06:44

A general strategy for optimizing queries containing JOINs is to look at your data model and the data, and try to determine which JOINs will most quickly reduce the number of records that must be considered. The fewer records that must be considered, the faster the query will run. The server will generally produce a better query plan too.

Along with the above optimization, make sure that any fields used in JOINs are indexed.
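As a rough illustration of the indexing advice, the sketch below compares a query plan before and after indexing the join column. It is an assumption-laden stand-in: it uses SQLite through Python's sqlite3 module rather than SQL Server, the tables parent and child and the index ix_child_parent are hypothetical, and the exact plan text depends on the SQLite version. On SQL Server the equivalent check would be done by inspecting the execution plan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parent (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE child  (id INTEGER PRIMARY KEY, parent_id INTEGER, val TEXT);
""")

QUERY = """
    SELECT p.name, c.val
    FROM parent p
    LEFT JOIN child c ON c.parent_id = p.id
"""

def plan(connection, sql):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return [row[-1] for row in rows]  # the detail text is the last column

plan_before = plan(conn, QUERY)

# Index the column used in the join predicate, then ask for the plan again.
conn.execute("CREATE INDEX ix_child_parent ON child (parent_id)")
plan_after = plan(conn, QUERY)

print("before:", plan_before)
print("after: ", plan_after)
```

After the index exists, the plan for the inner side of the join should mention ix_child_parent instead of a scan or a temporary automatic index.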

层林尽染 2024-08-04 00:06:44

Your query is probably wrong anyway. Alex is correct. Eric may be correct too, but the query is wrong.

Let's take this subset:

WTSA_SessionRange sr
left outer join
WTSA_SessionSubrange ssr on ssr.WTSASessionRangeID = sr.WTSASessionRangeID
left outer join
WTSA_SessionSubrangeRoom ssrr on ssrr.WTSASessionSubrangeID = ssr.WTSASessionSubrangeID

You are joining WTSA_SessionSubrangeRoom onto WTSA_SessionSubrange. You may have no rows from WTSA_SessionSubrange.

The join should be this:

WTSA_SessionRange sr
left outer join
(SELECT ssr.WTSASessionRangeID -- , plus whichever other columns are needed
FROM
    WTSA_SessionSubrange ssr
    left outer join
    WTSA_SessionSubrangeRoom ssrr on ssrr.WTSASessionSubrangeID = ssr.WTSASessionSubrangeID
) foo on foo.WTSASessionRangeID = sr.WTSASessionRangeID

This is why the join order affects the results: declaratively speaking, it's a different query.

You'd also need to change the MO_Stream and WTSA_SessionRangeStream join.

伊面 2024-08-04 00:06:44

It depends on which of the join fields are indexed. If it has to table scan on the first join field but can use an index on the second, it's slow. If your first join field is indexed, it'll be quicker. My guess is that 2005 optimizes it better by determining the indexed fields and performing those joins first.

相权↑美人 2024-08-04 00:06:44

At DevConnections a few years ago, a session on SQL Server performance stated that (a) the order of outer joins DOES matter, and (b) when a query has a lot of joins, the optimizer will not look at all of them before settling on a plan. If you know you have joins that will help speed up a query, they should be early on in the FROM list (if you can).
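A back-of-the-envelope calculation suggests why an optimizer might stop short of examining every possibility. The sketch below is an illustration, not something from the session: it counts only left-deep join orders and assumes every table could be freely reordered, which holds for inner joins but not in general for outer joins. The helper left_deep_orders is a hypothetical name. The query in the question references ten tables and views.

```python
from math import factorial

def left_deep_orders(n_tables: int) -> int:
    """Number of left-deep join orders for n freely reorderable tables (n!)."""
    return factorial(n_tables)

# Show how quickly the search space grows with the number of joined tables.
for n in (4, 10, 15):
    print(f"{n} tables -> {left_deep_orders(n)} left-deep join orders")
```

Even under these simplifying assumptions, ten tables already give millions of candidate orders, so some form of pruning or early cutoff is plausible.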
