SQL Server CTE referenced in self joins is slow

Posted on 2024-09-06 13:27:07

I have written a table-valued UDF that starts with a CTE returning a subset of the rows from a large table.
The CTE contains several joins: a couple of inner joins and one left join to other tables, none of which contain many rows.
The CTE has a WHERE clause that restricts the rows to a date range, so that only the rows needed are returned.

I then reference this CTE in 4 self left joins, in order to build subtotals using different criteria.

The query is quite complex, but here is a simplified pseudo-version of it:

WITH DataCTE AS
(
     SELECT [columns] FROM table
                      INNER JOIN table2
                      ON [...]

                      INNER JOIN table3
                      ON [...]

                      LEFT JOIN table3
                      ON [...]

                      WHERE [date range filter]
)
SELECT [aggregates_columns of each subset] FROM DataCTE Main
LEFT JOIN DataCTE BananasSubset
               ON [...]
             AND Product = 'Bananas'
             AND Quality = 100
LEFT JOIN DataCTE DamagedBananasSubset
               ON [...]
             AND Product = 'Bananas'
             AND Quality < 20
LEFT JOIN DataCTE MangosSubset
               ON [...]
GROUP BY [...]

I have the feeling that SQL Server gets confused and evaluates the CTE once for each self join. The execution plan seems to confirm this, although I confess I am no expert at reading those.

I would have assumed SQL Server to be smart enough to perform the data retrieval from the CTE only once, rather than several times.

I tried the same approach, but rather than using a CTE to get the subset of the data, I ran the same SELECT query as in the CTE and wrote its output to a temp table instead.

The version referencing the CTE takes 40 seconds. The version referencing the temp table takes between 1 and 2 seconds.
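For anyone wanting to check this without reading the plan in detail, here is a minimal sketch using `SET STATISTICS IO`. As far as I know, SQL Server expands a non-recursive CTE at each place it is referenced, much like an inline view, and does not promise to store its result. The table `#SalesDemo` and its columns are made up for illustration and are not part of the real query.

```sql
-- Hypothetical demo data; #SalesDemo stands in for the large table.
CREATE TABLE #SalesDemo
(
    SaleId  int         PRIMARY KEY,
    Product varchar(50) NOT NULL,
    Quality int         NOT NULL
);

INSERT INTO #SalesDemo (SaleId, Product, Quality)
VALUES (1, 'Bananas', 100),
       (2, 'Bananas',  10),
       (3, 'Mangos',   80);

-- Report per-table scan counts and logical reads in the Messages output.
SET STATISTICS IO ON;

WITH DataCTE AS
(
    SELECT SaleId, Product, Quality
    FROM   #SalesDemo
)
SELECT Main.SaleId,
       Bananas.SaleId AS BananasId,
       Damaged.SaleId AS DamagedBananasId
FROM   DataCTE AS Main
LEFT JOIN DataCTE AS Bananas
       ON Bananas.SaleId  = Main.SaleId
      AND Bananas.Product = 'Bananas'
      AND Bananas.Quality = 100
LEFT JOIN DataCTE AS Damaged
       ON Damaged.SaleId  = Main.SaleId
      AND Damaged.Product = 'Bananas'
      AND Damaged.Quality < 20;

SET STATISTICS IO OFF;

DROP TABLE #SalesDemo;
```

Comparing the reads reported for `#SalesDemo` here with those of a single plain `SELECT` from the same table gives a rough idea of how many times the underlying data is accessed.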
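To make the temp table variant concrete, here is a minimal sketch of the pattern. The table `dbo.Sales`, its columns, the join key and the date range are all assumptions made up for illustration, since the real schema is not shown.

```sql
-- Hypothetical schema for illustration only; the original tables are not shown.
CREATE TABLE dbo.Sales
(
    SaleId   int IDENTITY(1,1) PRIMARY KEY,
    SaleDate date          NOT NULL,
    StoreId  int           NOT NULL,
    Product  varchar(50)   NOT NULL,
    Quality  int           NOT NULL,
    Amount   decimal(10,2) NOT NULL
);

INSERT INTO dbo.Sales (SaleDate, StoreId, Product, Quality, Amount)
VALUES ('2024-01-05', 1, 'Bananas', 100, 10.00),
       ('2024-01-06', 1, 'Bananas',  10,  4.00),
       ('2024-01-07', 2, 'Mangos',   80,  7.50);

-- Step 1: run the former CTE body once and store the rows.
SELECT SaleId, StoreId, Product, Quality, Amount
INTO   #Data
FROM   dbo.Sales
WHERE  SaleDate >= '2024-01-01'
  AND  SaleDate <  '2024-02-01';

-- Step 2: self-join the stored rows instead of the CTE.
SELECT Main.StoreId,
       SUM(Bananas.Amount) AS BananasTotal,
       SUM(Damaged.Amount) AS DamagedBananasTotal
FROM   #Data AS Main
LEFT JOIN #Data AS Bananas
       ON Bananas.SaleId  = Main.SaleId
      AND Bananas.Product = 'Bananas'
      AND Bananas.Quality = 100
LEFT JOIN #Data AS Damaged
       ON Damaged.SaleId  = Main.SaleId
      AND Damaged.Product = 'Bananas'
      AND Damaged.Quality < 20
GROUP BY Main.StoreId;

-- Clean up the demo objects.
DROP TABLE #Data;
DROP TABLE dbo.Sales;
```

The point of the sketch is only that the joins and the date filter run once, when `#Data` is filled, and every self join afterwards reads the stored rows.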

Why isn't SQL Server smart enough to keep the CTE results in memory?

I like CTEs, especially in this case: my UDF is a table-valued one, so a CTE allows me to keep everything in a single statement.

To use a temp table, I would need to write a multi-statement table-valued UDF, which I find a slightly less elegant solution.

Have any of you had this kind of performance issue with CTEs, and if so, how did you resolve it?
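For reference, a multi-statement version might look roughly like the sketch below. One caveat: SQL Server does not allow `#temp` tables inside a function, so the sketch uses a table variable instead, and table variables carry no statistics, so the timings may not match the temp table test. The names `dbo.Sales`, `dbo.GetSubtotals` and all columns are hypothetical. `GO` is the batch separator used by SSMS and sqlcmd.

```sql
-- Hypothetical source table; the real schema is not shown in the post.
CREATE TABLE dbo.Sales
(
    SaleId   int IDENTITY(1,1) PRIMARY KEY,
    SaleDate date          NOT NULL,
    StoreId  int           NOT NULL,
    Product  varchar(50)   NOT NULL,
    Quality  int           NOT NULL,
    Amount   decimal(10,2) NOT NULL
);
GO

-- Hypothetical multi-statement table-valued function.
CREATE FUNCTION dbo.GetSubtotals (@FromDate date, @ToDate date)
RETURNS @Result TABLE
(
    StoreId             int,
    BananasTotal        decimal(18,2),
    DamagedBananasTotal decimal(18,2)
)
AS
BEGIN
    -- Table variable holding the rows the CTE used to return.
    DECLARE @Data TABLE
    (
        SaleId  int           PRIMARY KEY,
        StoreId int           NOT NULL,
        Product varchar(50)   NOT NULL,
        Quality int           NOT NULL,
        Amount  decimal(10,2) NOT NULL
    );

    -- Fill it once, applying the date filter.
    INSERT INTO @Data (SaleId, StoreId, Product, Quality, Amount)
    SELECT SaleId, StoreId, Product, Quality, Amount
    FROM   dbo.Sales
    WHERE  SaleDate >= @FromDate
      AND  SaleDate <  @ToDate;

    -- Self-join the stored rows to build the subtotals.
    INSERT INTO @Result (StoreId, BananasTotal, DamagedBananasTotal)
    SELECT Main.StoreId,
           SUM(Bananas.Amount),
           SUM(Damaged.Amount)
    FROM   @Data AS Main
    LEFT JOIN @Data AS Bananas
           ON Bananas.SaleId  = Main.SaleId
          AND Bananas.Product = 'Bananas'
          AND Bananas.Quality = 100
    LEFT JOIN @Data AS Damaged
           ON Damaged.SaleId  = Main.SaleId
          AND Damaged.Product = 'Bananas'
          AND Damaged.Quality < 20
    GROUP BY Main.StoreId;

    RETURN;
END;
GO

-- Example call with an assumed date range.
SELECT * FROM dbo.GetSubtotals('2024-01-01', '2024-02-01');
```

The single-statement shape is lost, but callers still use the function in a `FROM` clause exactly as before.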

Thanks,

Kharlos
