为什么这个（不相关的）子查询会导致这样的问题？

发布于 2024-09-16 03:37:56 字数 1038 浏览 9 评论 0原文

我有一个大型查询，其中一个简单的子查询优化将其从 8 分钟缩短到 20 秒。我不确定我是否理解为什么优化会产生如此巨大的效果。

本质上，这是问题部分：

SELECT  (bunch of stuff)
FROM
  a LEFT OUTER JOIN b ON a.ID = b.a
  LEFT OUTER JOIN c ON b.ID = c.b
  ...
  ...
      INNER JOIN veryLargeTable 
      ON a.ID = veryLargeTable.a 
         AND veryLargeTable.PetID = 
             (SELECT id from Pets WHERE Pets.Name = 'Something')    /* BAD! */
  ...
  ...

总共有 16 个连接表。如果我将 veryLargeTable 连接的第二个谓词替换为包含 petID 的预填充变量（而不是使用子查询），则整个查询速度将显着提高：

AND veryLargeTable.PetID = @petID   /* Awesome! */

显然，每行都会执行 (SELECT id from Pets WHERE Name = 'Something') 。有两件事我不完全理解：

据我所知，这是一个不相关的子查询。 Pets 表根本不是外部查询的一部分。非相关子查询不是独立评估（并因此优化）的吗？为什么这里的情况不是这样？
执行计划截然不同。在失败的情况下（上面），整个子树处理估计 950k 行。在获胜的情况下（使用变量而不是子查询），估计只有大约 125k 行。这是怎么回事？如果存在该子查询，为什么会涉及这么多行？ Pets.Name 列肯定具有唯一的数据（但据我所知，没有唯一的约束）。

请注意，正如我所期望的那样，将谓词移至 WHERE 子句不会影响任何一种情况下的查询，因为它是 INNER JOIN。

见解赞赏！

原文

I've got a large query where a simple subquery optimization dropped it from 8 minutes down to 20 seconds. I'm not sure I understand why the optimization had such a drastic effect.

In essence, here's the problem part:

SELECT  (bunch of stuff)
FROM
  a LEFT OUTER JOIN b ON a.ID = b.a
  LEFT OUTER JOIN c ON b.ID = c.b
  ...
  ...
      INNER JOIN veryLargeTable 
      ON a.ID = veryLargeTable.a 
         AND veryLargeTable.PetID = 
             (SELECT id from Pets WHERE Pets.Name = 'Something')    /* BAD! */
  ...
  ...

In all, there are 16 joined tables. If I replace the second predicate of the veryLargeTable join with a pre-populated variable containing the petID (instead of using the subquery) the entire query speeds up dramatically:

AND veryLargeTable.PetID = @petID   /* Awesome! */

Obviously, (SELECT id from Pets WHERE Name = 'Something') is being executed for every row. There are two things I don't fully understand:

As far as I can tell, this is a non-correlated subquery. The Pets table is not part of the outer query at all. Aren't non-correlated subqueries independently evaluated (and hence optimized)? Why isn't this the case here?
The execution plans are dramatically different. In the failure case (above), the entire subtree deals with an estimated 950k rows. In the win case (using a variable instead of a subquery), there's only about 125k estimated rows. What's going on? Why are so many more rows involved if that subquery is there? The Pets.Name column definitely has unique data (but no unique constraint as far as I can tell).

Note that moving the predicate to the WHERE clause doesn't affect the query in either case, as I would expect, since it's an INNER JOIN.

Insights appreciated!

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

若水般的淡然安静女子 2024-09-23 03:37:56

根据我的经验，查询越复杂，SQL 优化器创建灵活计划的能力就越差。这里你有 16 个连接，一些或大部分是外部连接，你至少有一个子查询......扔进足够的索引，基数，视图，外部应用，谁知道还有什么，没有人，甚至微软都没有工程师*，可以找出统一、定期生成最佳计划的例程。

你所描述的，我已经经历过很多次了——在混乱的查询中改变一个简单的事情，一切都会快一个数量级（或者，咬牙切齿，慢一些）。我没有办法确定什么时候复杂就太复杂了，这更多的是一种感觉。我的一般经验法则是，如果它看起来太长或太复杂，请尽可能简化 - 例如您预先选择的单个嵌套值，或者打破查询的一部分总是运行快速使用小结果集，首先运行它并将结果存储在临时表中。

（*请注意，这是轻微的讽刺）

回复收藏 0 原文

可爱咩 2024-09-23 03:37:56

作为替代方案，我认为您可以使用以下方法消除子查询：

...
INNER JOIN veryLargeTable vLT
    ON a.ID = vLT.a 
INNER JOIN Pets p
    ON vLT.PetID = p.id
        and p.Name = 'Something'
...

As an alternative, I think you could eliminate the sub-query with:

...
INNER JOIN veryLargeTable vLT
    ON a.ID = vLT.a 
INNER JOIN Pets p
    ON vLT.PetID = p.id
        and p.Name = 'Something'
...

回复收藏 0 原文

不气馁 2024-09-23 03:37:56

我个人认为，如果 Pets.Name 上没有索引，结果并不奇怪。如果您在 Pets.Name 上创建唯一索引，您可能会看到更好的结果。从服务器的角度来看，如果没有索引，子查询可能会返回多行或 NULL。也许优化器可以做得更好；它经常需要帮助。

回复收藏 0 原文

病女 2024-09-23 03:37:56

原因是正如您所指出的，根据我的经验，即使是最简单的非相关子查询通常也会被 SQL Server 的查询优化器重新计算。

例如，您可以查看以下查询的执行计划，发现不相关的子查询被重新计算。

SELECT ID
FROM #table1
WHERE ID in (SELECT ID from #table1)
UNION ALL
SELECT ID
FROM #table1
WHERE ID in (SELECT ID from #table1)

属性上有或没有聚集索引，在本例中为“ID”。正如有人指出的，您可以重写此查询以使用联接而不是子查询。然而，在许多情况下，如果子查询返回聚合标量，

where ID = (select MAX(ID) from #table1)

那么连接重写可能不会那么容易进行。

The reason is as you pointed out, and from my experience, that often even the simplest non-correlated subqueries are often recomputed by SQL Server's query optimizer.

For example, you can look at the execution plan of the following query and see that the non-correlated subquery is recomputed.

SELECT ID
FROM #table1
WHERE ID in (SELECT ID from #table1)
UNION ALL
SELECT ID
FROM #table1
WHERE ID in (SELECT ID from #table1)

This is with or without a clustered index on the attribute, "ID" in this case. As someone pointed out, you could rewrite this query to use a join instead of the subquery. In many cases that can be done, however, if the subquery returns instead an aggregate scalar e.g.

where ID = (select MAX(ID) from #table1)

then a join rewrite might not work so easily.

回复收藏 0 原文

~没有更多了~

关于作者

心的位置

暂无简介

0 文章

0 评论

23 人气

关注发私信

友情链接

文江博客

为什么这个（不相关的）子查询会导致这样的问题？

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

评论（4）

关于作者

相关话题

热门标签

推荐作者

亚希

cyp

北漠

11223456

坠似风落

游魂

友情链接

为什么这个（不相关的）子查询会导致这样的问题？

如果你对这篇内容有疑问，欢迎到本站社区发帖提问 参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

评论（4）

关于作者

相关话题

热门标签

推荐作者

亚希

cyp

北漠

11223456

坠似风落

游魂

友情链接

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。