自连接中提到的 SQL Server CTE 速度慢

发布于 2024-09-06 13:27:07 字数 1351 浏览 7 评论 0原文

我编写了一个表值 UDF,它以 CTE 开头,返回大型表中的行子集。 CTE 中有多个联接。几个内部连接和一个左连接到其他表,这些表不包含很多行。 CTE 有一个 where 子句,用于返回日期范围内的行,以便仅返回所需的行。

然后,我在 4 个自左连接中引用此 CTE,以便使用不同的标准构建小计。

查询相当复杂,但这里是它的简化伪版本

WITH DataCTE as
(
     SELECT [columns] FROM table
                      INNER JOIN table2
                      ON [...]

                      INNER JOIN table3
                      ON [...]

                      LEFT JOIN table3
                      ON [...]
)
SELECT [aggregates_columns of each subset] FROM DataCTE Main
LEFT JOIN DataCTE BananasSubset
               ON [...] 
             AND Product = 'Bananas'
             AND Quality = 100
LEFT JOIN DataCTE DamagedBananasSubset
               ON [...]
             AND Product = 'Bananas'
             AND Quality < 20
LEFT JOIN DataCTE MangosSubset
               ON [...]
GROUP BY [

我感觉 SQL Server 感到困惑并为每个自连接调用 CTE,这似乎通过查看执行计划得到证实,尽管我承认不是专家在阅读这些内容时。

我假设 SQL Server 足够聪明,只从 CTE 执行一次数据检索,而不是执行多次。

我尝试了相同的方法,但不是使用 CTE 来获取数据子集,而是使用与 CTE 中相同的选择查询,但将其输出到临时表。

引用 CTE 版本的版本需要 40 秒。引用临时表的版本需要 1 到 2 秒的时间。

为什么 SQL Server 不够智能,无法将 CTE 结果保留在内存中?

我喜欢 CTE,尤其是在这种情况下,因为我的 UDF 是表值 UDF,因此它允许我将所有内容保留在单个语句中。

要使用临时表,我需要编写一个多语句表值 UDF,我发现这是一个稍微不太优雅的解决方案。

你们中的一些人是否遇到过此类 CTE 性能问题?如果有,您是如何解决这些问题的?

谢谢,

卡洛斯

I have written a table-valued UDF that starts by a CTE to return a subset of the rows from a large table.
There are several joins in the CTE. A couple of inner and one left join to other tables, which don't contain a lot of rows.
The CTE has a where clause that returns the rows within a date range, in order to return only the rows needed.

I'm then referencing this CTE in 4 self left joins, in order to build subtotals using different criterias.

The query is quite complex but here is a simplified pseudo-version of it

WITH DataCTE as
(
     SELECT [columns] FROM table
                      INNER JOIN table2
                      ON [...]

                      INNER JOIN table3
                      ON [...]

                      LEFT JOIN table3
                      ON [...]
)
SELECT [aggregates_columns of each subset] FROM DataCTE Main
LEFT JOIN DataCTE BananasSubset
               ON [...] 
             AND Product = 'Bananas'
             AND Quality = 100
LEFT JOIN DataCTE DamagedBananasSubset
               ON [...]
             AND Product = 'Bananas'
             AND Quality < 20
LEFT JOIN DataCTE MangosSubset
               ON [...]
GROUP BY [

I have the feeling that SQL Server gets confused and calls the CTE for each self join, which seems confirmed by looking at the execution plan, although I confess not being an expert at reading those.

I would have assumed SQL Server to be smart enough to only perform the data retrieval from the CTE only once, rather than do it several times.

I have tried the same approach but rather than using a CTE to get the subset of the data, I used the same select query as in the CTE, but made it output to a temp table instead.

The version referring the CTE version takes 40 seconds. The version referring the temp table takes between 1 and 2 seconds.

Why isn't SQL Server smart enough to keep the CTE results in memory?

I like CTEs, especially in this case as my UDF is a table-valued one, so it allowed me to keep everything in a single statement.

To use a temp table, I would need to write a multi-statement table valued UDF, which I find a slightly less elegant solution.

Did some of you had this kind of performance issues with CTE, and if so, how did you get them sorted?

Thanks,

Kharlos

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

浅唱々樱花落 2024-09-13 13:27:07

我相信每次都会检索 CTE 结果。使用临时表,结果将被存储直到被删除。这似乎可以解释当您切换到临时表时所看到的性能提升。

另一个好处是您可以在临时表上创建索引,而对 cte 则无法这样做。不确定对您的情况是否有好处,但很高兴知道。

相关阅读:

引用最后一个链接:

CTE 的基础查询将是
每次被引用时都会调用
紧随其后的查询。

我会说使用临时表。不幸的是,优雅并不总是最好的解决方案。

更新:

嗯,这让事情变得更加困难。如果不看看你们的整个环境,我很难说。

一些想法:

  • 您可以使用存储过程代替 UDF(而不是从内部)吗?
  • 这可能是不可能的,但如果您可以从 CTE 中删除左连接,则可以将其移动到索引视图中。如果您能够做到这一点,您甚至可能会看到比临时表更高的性能。

I believe that CTE results are retrieved every time. With a temp table the results are stored until it is dropped. This would seem to explain the performance gains you saw when you switched to a temp table.

Another benefit is that you can create indexes on a temporary table which you can't do to a cte. Not sure if there would be a benefit in your situation but it's good to know.

Related reading:

Quote from the last link:

The CTE's underlying query will be
called each time it is referenced in
the immediately following query.

I'd say go with the temp table. Unfortunately elegant isn't always the best solution.

UPDATE:

Hmmm that makes things more difficult. It's hard for me to say with out looking at your whole environment.

Some thoughts:

  • can you use a stored procedure instead of a UDF (instead, not from within)?
  • This may not be possible but if you can remove the left join from you CTE you could move that into an indexed view. If you are able to do this you may see performance gains over even the temp table.
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文