投影内相关子查询排序的影响
我注意到 SQL Server(本例中为 SQL Server 2008)处理 select 语句中相关子查询的方式有点出乎意料。我的假设是,查询计划不应仅受 select 语句的投影子句中写入子查询(或列)的顺序影响。然而,情况似乎并非如此。
考虑以下两个查询,除了 CTE 中子查询的顺序之外,它们是相同的:
--query 1: subquery for Color is second
WITH vw AS
(
SELECT p.[ID],
(SELECT TOP(1) [FirstName] FROM [Preference] WHERE p.ID = ID AND [FirstName] IS NOT NULL ORDER BY [LastModified] DESC) [FirstName],
(SELECT TOP(1) [Color] FROM [Preference] WHERE p.ID = ID AND [Color] IS NOT NULL ORDER BY [LastModified] DESC) [Color]
FROM Person p
)
SELECT ID, Color, FirstName
FROM vw
WHERE Color = 'Gray';
--query 2: subquery for Color is first
WITH vw AS
(
SELECT p.[ID],
(SELECT TOP(1) [Color] FROM [Preference] WHERE p.ID = ID AND [Color] IS NOT NULL ORDER BY [LastModified] DESC) [Color],
(SELECT TOP(1) [FirstName] FROM [Preference] WHERE p.ID = ID AND [FirstName] IS NOT NULL ORDER BY [LastModified] DESC) [FirstName]
FROM Person p
)
SELECT ID, Color, FirstName
FROM vw
WHERE Color = 'Gray';
如果查看这两个查询计划,您将看到每个子查询都使用了外部联接,并且联接的顺序为与子查询的写入顺序相同。有一个过滤器应用于颜色外连接的结果,以过滤掉颜色不是“灰色”的行。 (对我来说奇怪的是,SQL 会对颜色子查询使用外连接,因为我对颜色子查询的结果有非空约束,但是没关系。)
大多数行都被颜色过滤器删除。结果是查询 2 比查询 1 便宜得多,因为第二个连接涉及的行数较少。抛开构建这样一个声明的所有原因,这是预期的行为吗? SQL Server 是否应该选择在查询计划中尽早移动过滤器,而不管子查询的写入顺序如何?
编辑:只是为了澄清一下,我探索这种情况是有充分理由的。我可能需要创建一个涉及类似构造的子查询的视图,现在很明显,基于从视图投影的这些列的任何过滤都会因为列的顺序而导致性能变化!
I'm noticing something a bit unexpected with how SQL Server (SQL Server 2008 in this case) treats correlated subqueries within a select statement. My assumption was that a query plan should not be affected by the mere order in which subqueries (or columns, for that matter) are written within the projection clause of the select statement. However, this does not appear to be the case.
Consider the following two queries, which are identical except for the ordering of the subqueries within the CTE:
--query 1: subquery for Color is second
WITH vw AS
(
SELECT p.[ID],
(SELECT TOP(1) [FirstName] FROM [Preference] WHERE p.ID = ID AND [FirstName] IS NOT NULL ORDER BY [LastModified] DESC) [FirstName],
(SELECT TOP(1) [Color] FROM [Preference] WHERE p.ID = ID AND [Color] IS NOT NULL ORDER BY [LastModified] DESC) [Color]
FROM Person p
)
SELECT ID, Color, FirstName
FROM vw
WHERE Color = 'Gray';
--query 2: subquery for Color is first
WITH vw AS
(
SELECT p.[ID],
(SELECT TOP(1) [Color] FROM [Preference] WHERE p.ID = ID AND [Color] IS NOT NULL ORDER BY [LastModified] DESC) [Color],
(SELECT TOP(1) [FirstName] FROM [Preference] WHERE p.ID = ID AND [FirstName] IS NOT NULL ORDER BY [LastModified] DESC) [FirstName]
FROM Person p
)
SELECT ID, Color, FirstName
FROM vw
WHERE Color = 'Gray';
If you look at the two query plans, you'll see that an outer join is used for each subquery and that the order of the joins is the same as the order the subqueries are written. There is a filter applied to the result of the outer join for color, to filter out rows where the color is not 'Gray'. (It's odd to me that SQL would use an outer join for the color subquery since I have a non-null constraint on the result of the color subquery, but OK.)
Most of the rows are removed by the color filter. The result is that query 2 is significantly cheaper than query 1 because fewer rows are involved with the second join. All reasons for constructing such a statement aside, is this an expected behavior? Shouldn't SQL server opt to move the filter as early as possible in the query plan, regardless of the order the subqueries are written?
Edit: Just to clarify, there is a valid reason I'm exploring this scenario. I may need to create a view that involves similarly constructed subqueries, and it is now apparent that any filtering based on these columns projected from the view will vary in performance just because of the ordering of the columns!
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
随着 TOP 运算符在这里发挥作用,查询优化器对统计数据非常盲目,因此它将寻找有关如何最好地工作的其他线索,例如首先实例化 CTE 的相关部分。
它是一个外连接,因为如果没有返回任何内容,子查询将被用作 NULL,并且系统首先实例化它。如果您使用聚合而不是 TOP,您可能会得到一个略有不同但更一致的计划。
With the TOP operator coming into play here, the Query Optimizer is remarkably blind about the statistics, so it will look for other clues about how best to work it, such as instantiating relevant parts of the CTE first.
And it's an outer join because the subquery will be used as NULL if nothing is returned, and the system is instantiating it first. If you were using an aggregate instead of TOP, you'd probably get a slightly different but more consistent plan.
这是一个可能表现更好的替代版本:
另一个更简洁的解决方案,但可能会也可能不会表现得很好:
Here is an alternate version that might perform better:
Another solution which is more concise but may or may not perform as well: