投影内相关子查询排序的影响

发布于 2024-08-26 19:25:49 字数 1374 浏览 6 评论 0原文

我注意到 SQL Server(本例中为 SQL Server 2008)处理 select 语句中相关子查询的方式有点出乎意料。我的假设是,查询计划不应仅受 select 语句的投影子句中写入子查询(或列)的顺序影响。然而,情况似乎并非如此。

考虑以下两个查询,除了 CTE 中子查询的顺序之外,它们是相同的:

--query 1: subquery for Color is second
WITH vw AS
(
 SELECT p.[ID],
  (SELECT TOP(1) [FirstName] FROM [Preference] WHERE p.ID = ID AND [FirstName] IS NOT NULL ORDER BY [LastModified] DESC) [FirstName],
  (SELECT TOP(1) [Color] FROM [Preference] WHERE p.ID = ID AND [Color] IS NOT NULL ORDER BY [LastModified] DESC) [Color]
 FROM Person p
)
SELECT ID, Color, FirstName
FROM vw
WHERE Color = 'Gray';


--query 2: subquery for Color is first
WITH vw AS
(
 SELECT p.[ID],
  (SELECT TOP(1) [Color] FROM [Preference] WHERE p.ID = ID AND [Color] IS NOT NULL ORDER BY [LastModified] DESC) [Color],
  (SELECT TOP(1) [FirstName] FROM [Preference] WHERE p.ID = ID AND [FirstName] IS NOT NULL ORDER BY [LastModified] DESC) [FirstName]
 FROM Person p
)
SELECT ID, Color, FirstName
FROM vw
WHERE Color = 'Gray';

如果查看这两个查询计划,您将看到每个子查询都使用了外部联接,并且联接的顺序为与子查询的写入顺序相同。有一个过滤器应用于颜色外连接的结果,以过滤掉颜色不是“灰色”的行。 (对我来说奇怪的是,SQL 会对颜色子查询使用外连接,因为我对颜色子查询的结果有非空约束,但是没关系。)

大多数行都被颜色过滤器删除。结果是查询 2 比查询 1 便宜得多,因为第二个连接涉及的行数较少。抛开构建这样一个声明的所有原因,这是预期的行为吗? SQL Server 是否应该选择在查询计划中尽早移动过滤器,而不管子查询的写入顺序如何?

编辑:只是为了澄清一下,我探索这种情况是有充分理由的。我可能需要创建一个涉及类似构造的子查询的视图,现在很明显,基于从视图投影的这些列的任何过滤都会因为列的顺序而导致性能变化!

I'm noticing something a bit unexpected with how SQL Server (SQL Server 2008 in this case) treats correlated subqueries within a select statement. My assumption was that a query plan should not be affected by the mere order in which subqueries (or columns, for that matter) are written within the projection clause of the select statement. However, this does not appear to be the case.

Consider the following two queries, which are identical except for the ordering of the subqueries within the CTE:

--query 1: subquery for Color is second
WITH vw AS
(
 SELECT p.[ID],
  (SELECT TOP(1) [FirstName] FROM [Preference] WHERE p.ID = ID AND [FirstName] IS NOT NULL ORDER BY [LastModified] DESC) [FirstName],
  (SELECT TOP(1) [Color] FROM [Preference] WHERE p.ID = ID AND [Color] IS NOT NULL ORDER BY [LastModified] DESC) [Color]
 FROM Person p
)
SELECT ID, Color, FirstName
FROM vw
WHERE Color = 'Gray';


--query 2: subquery for Color is first
WITH vw AS
(
 SELECT p.[ID],
  (SELECT TOP(1) [Color] FROM [Preference] WHERE p.ID = ID AND [Color] IS NOT NULL ORDER BY [LastModified] DESC) [Color],
  (SELECT TOP(1) [FirstName] FROM [Preference] WHERE p.ID = ID AND [FirstName] IS NOT NULL ORDER BY [LastModified] DESC) [FirstName]
 FROM Person p
)
SELECT ID, Color, FirstName
FROM vw
WHERE Color = 'Gray';

If you look at the two query plans, you'll see that an outer join is used for each subquery and that the order of the joins is the same as the order the subqueries are written. There is a filter applied to the result of the outer join for color, to filter out rows where the color is not 'Gray'. (It's odd to me that SQL would use an outer join for the color subquery since I have a non-null constraint on the result of the color subquery, but OK.)

Most of the rows are removed by the color filter. The result is that query 2 is significantly cheaper than query 1 because fewer rows are involved with the second join. All reasons for constructing such a statement aside, is this an expected behavior? Shouldn't SQL server opt to move the filter as early as possible in the query plan, regardless of the order the subqueries are written?

Edit: Just to clarify, there is a valid reason I'm exploring this scenario. I may need to create a view that involves similarly constructed subqueries, and it is now apparent that any filtering based on these columns projected from the view will vary in performance just because of the ordering of the columns!

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

不忘初心 2024-09-02 19:25:49

随着 TOP 运算符在这里发挥作用,查询优化器对统计数据非常盲目,因此它将寻找有关如何最好地工作的其他线索,例如首先实例化 CTE 的相关部分。

它是一个外连接,因为如果没有返回任何内容,子查询将被用作 NULL,并且系统首先实例化它。如果您使用聚合而不是 TOP,您可能会得到一个略有不同但更一致的计划。

With the TOP operator coming into play here, the Query Optimizer is remarkably blind about the statistics, so it will look for other clues about how best to work it, such as instantiating relevant parts of the CTE first.

And it's an outer join because the subquery will be used as NULL if nothing is returned, and the system is instantiating it first. If you were using an aggregate instead of TOP, you'd probably get a slightly different but more consistent plan.

烙印 2024-09-02 19:25:49

这是一个可能表现更好的替代版本:

With Colors As
    (
    Select Id, [Color]
        , ROW_NUMBER() OVER ( PARTITION BY ID ORDER BY [LastModified] DESC ) As Num
    From Preference
    Where [Color] Is Not Null
    )
    , Names As
    (
    Select Id, [FirstName]
        , ROW_NUMBER() OVER ( PARTITION BY ID ORDER BY [LastModified] DESC ) As Num
    From Preference
    Where [FirstName] Is Not Null
    )
Select
From Person As P
    Join Colors As C
        On C.Id = P.Id
            And C.Num = 1
    Left Join Names As N
        On N.Id = P.Id
            And N.Num = 1
Where C.[Color]= 'Grey'

另一个更简洁的解决方案,但可能会也可能不会表现得很好:

With RankedItems
    (
    Select Id, [Color], [FirstName]
        , ROW_NUMBER() OVER ( PARTITION BY ID ORDER BY Case When [Color] Is Not Null 1 Else 0 End DESC, [LastModified] DESC ) As ColorRank
        , ROW_NUMBER() OVER ( PARTITION BY ID ORDER BY Case When [FirstName] Is Not Null 1 Else 0 End DESC, [LastModified] DESC ) As NameRank
    From Preference
    )
Select
From Person As P
    Join RankedItems As RI
        On RI.Id = P.Id
            And RI.ColorRank = 1
    Left Join RankedItems As RI2
        On RI2.Id = P.Id
            And RI2.NameRank = 1
Where RI.[Color]= 'Grey'

Here is an alternate version that might perform better:

With Colors As
    (
    Select Id, [Color]
        , ROW_NUMBER() OVER ( PARTITION BY ID ORDER BY [LastModified] DESC ) As Num
    From Preference
    Where [Color] Is Not Null
    )
    , Names As
    (
    Select Id, [FirstName]
        , ROW_NUMBER() OVER ( PARTITION BY ID ORDER BY [LastModified] DESC ) As Num
    From Preference
    Where [FirstName] Is Not Null
    )
Select
From Person As P
    Join Colors As C
        On C.Id = P.Id
            And C.Num = 1
    Left Join Names As N
        On N.Id = P.Id
            And N.Num = 1
Where C.[Color]= 'Grey'

Another solution which is more concise but may or may not perform as well:

With RankedItems
    (
    Select Id, [Color], [FirstName]
        , ROW_NUMBER() OVER ( PARTITION BY ID ORDER BY Case When [Color] Is Not Null 1 Else 0 End DESC, [LastModified] DESC ) As ColorRank
        , ROW_NUMBER() OVER ( PARTITION BY ID ORDER BY Case When [FirstName] Is Not Null 1 Else 0 End DESC, [LastModified] DESC ) As NameRank
    From Preference
    )
Select
From Person As P
    Join RankedItems As RI
        On RI.Id = P.Id
            And RI.ColorRank = 1
    Left Join RankedItems As RI2
        On RI2.Id = P.Id
            And RI2.NameRank = 1
Where RI.[Color]= 'Grey'
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文