当前位置：文江博客话题详情

使用 CTE 有哪些优点/缺点？

发布于 2024-07-05 03:43:37 字数 82 浏览 7 评论 0原文

我正在考虑提高某些 SQL 的性能，目前 CTE 在脚本中被多次使用和引用。使用表变量可以得到改进吗？（不能使用临时表，因为代码位于函数内）。

原文

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

兔姬 2024-07-12 03:43:37

可能不会。 CTE 特别擅长查询数据的树结构。

回复收藏 0 原文

云胡 2024-07-12 03:43:37

这些信息和引用来自 mssqltips.com 上的以下文章“Choose Between SQL Server Subquery T-SQL Code”，作者为 Eric Blinn。 https ://www.mssqltips.com/sqlservertip/6618/sql-server-query-performance-cte-view-subquery-temp-table-table-variable/

SQL Server 2019 CTE、子查询和视图

SQL Server [2019] 引擎会优化为其提供的每个查询。什么时候
它遇到 CTE、传统子查询或视图，它会看到它们全部
以相同的方式并以相同的方式优化它们。这涉及到寻找
在基础表中，考虑其统计数据，并选择
继续的最佳方式。在大多数情况下，他们会返回相同的计划
因此执行完全相同。

临时数据库表

对于将行插入临时表的查询，
优化器查看表统计信息并选择最佳方式
向前。它实际上为临时表创建了新的统计信息
表，然后用它们来运行第二个。这带来了非常
类似的性能。

表变量

在文章中给出的示例中，表变量性能较差，由于缺少表统计信息。

...表变量没有生成任何表统计信息
就像 TempDB 表一样。这意味着优化器必须做出
疯狂猜测如何进行。在这个例子中，它做了一个非常非常
糟糕的决定。
这并不是注销表变量。他们肯定有他们的
放置的位置将在提示中稍后讨论。

临时表与表变量

临时表将存储在磁盘上并具有统计信息
对其进行计算，而表变量则不会。因为这
当预期行数达到时，差异临时表是最好的
>100 并且表变量用于较小的预期行数，其中缺少统计信息不太可能导致错误的查询计划。

回复收藏 0 原文

心头的小情儿 2024-07-12 03:43:37

CTE 的优点

CTE 可以称为“临时视图”，在某些情况下用作视图的良好替代方案。
相对于视图的主要优点是内存的使用。由于 CTE 的范围仅限于其批次，因此一旦其批次交叉，为其分配的内存就会被刷新。但是一旦创建了视图，它就会被存储直到用户删除它。如果视图在创建后没有被使用，那么这只是浪费内存。
与 View 相比，CTE 执行的 CPU 成本更低。
与 View 一样，CTE 不存储其定义的任何元数据，并提供更好的可读性。
一个 CTE 可以在查询中被多次引用。
由于范围仅限于批次，因此多个 CTE 可以具有相同的名称，而视图不能具有相同的名称。
它可以递归。

CTE 的缺点

虽然使用 CTE 具有优势，但它确实有一些限制需要记住，
我们知道它是视图的替代品，但 CTE 不能嵌套，而视图可以嵌套。
声明后的视图可以多次使用，但不能使用 CTE。每次你想使用它时都应该声明它。对于这种场景，不建议使用CTE，因为用户一次又一次地声明批次是一件很累人的工作。
锚成员之间应该有 UNION、UNION ALL 或 EXCEPT 等运算符。
在递归 CTE 中，您可以定义许多锚定成员和递归成员，但所有锚定成员必须在第一个递归成员之前定义。您不能在两个递归成员之间定义锚定成员。
锚点成员和递归成员中使用的列数、数据类型应该相同。
在 Recursive Member 中，不允许使用 TOP 等聚合函数、DISTINCT 等运算符、HAVING 和 GROUP BY 等子句、子查询、Left Outer、Right Outer 或 Full Outer 等连接。关于连接，Recursive Member 中只允许使用 Inner Join。
递归限制为32767，超过这个限制会导致服务器因无限循环而崩溃。

回复收藏 0 原文

我是男神闪亮亮 2024-07-12 03:43:37

CTE 只不过是语法糖。
它增强可读性并避免重复。

只需将其视为 WITH() 子句中指定的实际语句的占位符即可。引擎将使用此语句替换查询中出现的任何 CTE 名称（与视图非常相似）。这就是内联的含义。

与之前填写的表相比（声明或创建）您会发现优点：

可用于临时查询（函数、视图）
没有意外的副作用（范围最窄）

...以及缺点：

您不能在不同的语句中使用 CTE 的结果
您不能使用索引、统计信息来优化您的 CTE 集（尽管它会隐式使用现有索引以及目标对象的统计数据 - 如果适用）。

在性能方面，持久集（声明或创建的表）在某些情况下可能（好得多！）更好，但它迫使您使用过程代码。您将不得不与您的马赛跑来找出哪一个更好...

示例：执行相同操作的各种方法

以下简单（相当无用）的示例描述了一组用户表及其列。我使用各种不同的方法来告诉 SQL-Server 我想要什么：

尝试使用“包括实际执行计划”

USE master; --in my case the master database has just 5 "user tables", you can use any other DB of course
GO
    
--simple join, first the small set joining to the large set
SELECT o.name AS TableName
      ,c.name AS ColumnName
FROM sys.objects o
INNER JOIN sys.columns c ON c.object_id=o.object_id 
WHERE o.type='U';
GO

--simple join "the other way round" with the filter as part of the ON-clause
SELECT o.name AS TableName
      ,c.name AS ColumnName
FROM sys.columns c
INNER JOIN sys.objects o ON c.object_id=o.object_id AND o.type='U';
GO

--join from the large set with a sub-query to the small set
SELECT o.name AS TableName
      ,c.name AS ColumnName
FROM sys.columns c
INNER JOIN (
    SELECT o.* 
    FROM sys.objects o
    WHERE o.type='U' --user tables 
) o ON c.object_id=o.object_id;
GO

--join for large to small with a row-wise APPLY
SELECT o.name AS TableName
      ,c.name AS ColumnName
FROM sys.columns c
CROSS APPLY (
    SELECT o.* 
    FROM sys.objects o
    WHERE o.type='U' --user tables 
    AND o.object_id=c.object_id
) o;
GO

--use a CTE to "pre-filter" the small set
WITH cte AS
(
    SELECT o.* 
    FROM sys.objects o
    WHERE o.type='U' --user tables 
)
SELECT cte.name AS TableName
      ,c.name AS ColumnName
FROM sys.columns c
INNER JOIN cte ON c.object_id=cte.object_id;
GO

现在查看结果和执行计划：

所有查询都返回相同的结果。
所有查询都会产生相同的执行计划

重要提示：这可能在您的计算机上有所不同！

这是为什么？

T-SQL 是一种声明性语言。您的声明是对您想要检索的内容的描述。您的工作不是告诉引擎这是如何完成的。

SQL-Server 极其智能的引擎将找到获得您所要求的集合的最佳方法。在上述情况下，所有结果描述都指向同一目标。引擎可以从各种语句中推断这一点，并为所有语句找到相同的计划。

嗯，这只是一个品味问题吗？

在某种程度上......

有一些重要的事情需要记住：

引擎没有理由在之前< /em> 其余部分（尽管该声明可能看起来是这样）。因此，将 CTE 描述为类似于临时表的东西是错误...
换句话说：语句的可见顺序不会< strong>不能预测实际的执行顺序！
智能引擎将在复杂性和嵌套级别上达到其极限。想象一下各种 VIEW，都使用 CTE 并互相调用......
在某些情况下，引擎真的很糟糕。我记得有一个案例，CTE 与 TRY_CAST 没什么区别。这个想法是为了确保下面的查询中的值有效。但引擎想：“哦，只是一个 CAST，不贵！” 并将实际 CAST 包含到更高位置的执行计划中。我记得另一个例子，引擎对数百万行执行了昂贵的操作（不必要地，最终结果被过滤到一个很小的集合），只是因为实际的执行顺序不符合预期。

好吧...那么我什么时候应该使用 CTE？

以下几点是使用 CTE 的充分理由：

CTE 可以帮助您避免重复的子查询。
CTE 可以在语句中多次使用，例如在 JOIN 中，其动态行为取决于实际行数。
您可以在一条语句中使用多个 CTE，并且可以在后续 CTE 中使用一个 CTE 的结果。
有递归（或更好的迭代）CTE。
有时我使用单行-CTE 来定义/预计算稍后在查询中使用的变量。您将在过程 T-SQL 中使用声明的变量执行的操作。您可以使用CROSS JOIN轻松地将它们放入您的查询中。
而且也非常好：可更新的 CTE 允许非常易于阅读的语句，同样适用用于删除。
如上所述：没有 CTE，没有什么是做不到的，但它读起来要好得多（我真的很喜欢说出名字）。

最终提示

嗯，在某些情况下，丑陋的代码表现更好:-)

拥有干净且可读的代码总是好的。 CTE 将帮助您解决此问题。所以试试吧。如果性能不好，请深入研究执行计划，并尝试找到引擎可能做出错误决定的原因。

在大多数情况下，试图通过诸如FORCE ORDER之类的提示来智胜引擎是一个坏主意（但可以提供帮助）

UPDATE

我被要求具体指出优点和缺点：

嗯，从技术上来说，没有真正的优点或缺点。抛开递归 CTE 不谈，没有 CTE 就没有解决不了的问题。

优点
主要优点是可读性和可维护性。
有时一个 CTE 可以节省数百行代码。可以只使用名称作为变量，而不是重复巨大的子查询。对子查询的更正只需在一处即可解决。
CTE 可以提供临时查询服务，让您的生活更轻松。

缺点
一个可能的缺点是，即使对于经验丰富的开发人员来说，也很容易将 CTE 误认为临时表，假设可见的步骤顺序与实际的执行顺序相同，并陷入困境意外结果甚至错误。
而且 - 当然:-) - 当您在另一个语句后编写 CTE 而没有分隔 ; 时，您会看到奇怪的 错误语法 错误。这就是为什么很多人倾向于使用 ;WITH。

A CTE is not much more than syntactic sugar.
It enhances the readability and allows to avoid repetition.

Just think of it as a placeholder for the actual statement specified in the WITH()-clause. The engine will replace any occurance of the CTE's name in your query with this statement (quite similar to a view). This is the meaning of inline.

Compared to a previously filled table (delared or created) You'll find advantages:

useable in ad-hoc-queries (functions, views)
no unexpected side effects (most narrow scope)

...and disadvantages:

You cannot use the CTE's result in different statements
You cannot use indexes, statistics to optimize your CTE's set (although it will implicitly use existing indexes and statistics of the targeted objects - if appropriate).

In terms of performance a persisted set (declared or created table) can be (much!) better in some cases, but it forces you into procedural code. You will have to race your horses to find out which is better...

Example: Various approaches to do the same

The following simple (rather useless) example describes a set of user tables together with their columns. I use various different approaches to tell SQL-Server what I want:

Try this with "include actual execution plan"

USE master; --in my case the master database has just 5 "user tables", you can use any other DB of course
GO
    
--simple join, first the small set joining to the large set
SELECT o.name AS TableName
      ,c.name AS ColumnName
FROM sys.objects o
INNER JOIN sys.columns c ON c.object_id=o.object_id 
WHERE o.type='U';
GO

--simple join "the other way round" with the filter as part of the ON-clause
SELECT o.name AS TableName
      ,c.name AS ColumnName
FROM sys.columns c
INNER JOIN sys.objects o ON c.object_id=o.object_id AND o.type='U';
GO

--join from the large set with a sub-query to the small set
SELECT o.name AS TableName
      ,c.name AS ColumnName
FROM sys.columns c
INNER JOIN (
    SELECT o.* 
    FROM sys.objects o
    WHERE o.type='U' --user tables 
) o ON c.object_id=o.object_id;
GO

--join for large to small with a row-wise APPLY
SELECT o.name AS TableName
      ,c.name AS ColumnName
FROM sys.columns c
CROSS APPLY (
    SELECT o.* 
    FROM sys.objects o
    WHERE o.type='U' --user tables 
    AND o.object_id=c.object_id
) o;
GO

--use a CTE to "pre-filter" the small set
WITH cte AS
(
    SELECT o.* 
    FROM sys.objects o
    WHERE o.type='U' --user tables 
)
SELECT cte.name AS TableName
      ,c.name AS ColumnName
FROM sys.columns c
INNER JOIN cte ON c.object_id=cte.object_id;
GO

Now look at the result and at the execution plans:

All queries return the same result.
All queries produce the same execution plan

Important hint: This might differ on your machine!

Why is this?

T-SQL is a declarative language. Your statement is a description of WHAT you want to retrieve. It is not your job to tell the engine HOW this is done.

SQL-Server's extremely smart engine will find the best way to get the set you asked for. In the case above all result descriptions point to the same goal. The engine can deduce this from various statements and finds the same plan for all of them.

Well, is it just a matter of taste?

In a way...

There are some important things to keep in mind:

There is no reason for the engine to compute the CTE's result before the rest (although the statement might look so). Therefore it is wrong to describe a CTE as something like a temp table...
In other words: The visible order of your statement does not predict the actual order of execution!
The smart engine will reach its limits with complexity and nest level. Imagine various VIEWs, all using CTEs and calling each-other...
There are cases where the engine really f**s up. I remember a case where a CTE did not much more than a TRY_CAST. The idea was to ensure valid values in the query below. But the engine thought "Oh, just a CAST, not expensiv!" and included the acutal CAST to the execution plan on a higher position. I remember another case where the engine performed an expensive operation against millions of rows (unnecessarily, the final result was filtered to a tiny set), just because the actual order of execution was not as expected.

Okay... So when should I use a CTE?

The following points are good reasons to use a CTE:

A CTE can help you to avoid repeated sub queries.
A CTE can be used multiple times within your statement, e.g. within a JOIN with a dynamic behavior depending on the actual row-count.
You can use multiple CTEs within one statement and you can use the result of one CTE within a later CTE.
There are recursive (or better iterative) CTEs.
Sometimes I used single-row-CTEs to define / pre-compute variables later used in the query. Things you would do with declared variables in procedural T-SQL. You can use A CROSS JOIN to get them into your query easily.
and also very nice: the updatable CTE allows for very easy-to-read statements, same applies for DELETE.
As above: Nothing one could not do without the CTE, but it is far better to read (I really like speaking names).

Final hints

Well, there are cases, where ugly code performs better :-)

It is always good to have clean and readable code. A CTE will help you with this. So give it a try. If the performance is bad, get into depth, look at the execution plans and try to find a reason where the engine might decide wrong.

In most cases it is a bad idea trying to outsmart the engine with hints such as FORCE ORDER (but in can help)

UPDATE

I was asked to point to advantages and disadvantages specifically:

Uhm, technically there are no real advantages or disadvantages. Disregarding recursive CTEs there's nothing one couldn't solve without a CTE.

Advantages
The main advantage is readability and maintainabilty.
Sometimes a CTE can save hundreds of lines of code. Instead of a repeating a huge sub-query one can use just a name as a variable. Corrections to the sub-query can be solved just in one place.
The CTE can serve in ad-hoc queries and make your life easier.

Disadvantages
One possible disadvantage is that it's very easy, even for experienced developers, to mistake a CTE as a temp table, assume that the visible order of steps will be the same as the acutal order of execution and stumble into unexpected results or even errors.
And - of course :-) - the strange wrong syntax error you'll see when you write a CTE after another statement without a separating ;. That's why many people tend to use ;WITH.

回复收藏 0 原文