使用 CTE 有哪些优点/缺点?

发布于 2024-07-05 03:43:37 字数 82 浏览 7 评论 0原文

我正在考虑提高某些 SQL 的性能,目前 CTE 在脚本中被多次使用和引用。 使用表变量可以得到改进吗? (不能使用临时表,因为代码位于函数内)。

I'm looking at improving the performance of some SQL, currently CTEs are being used and referenced multiple times in the script. Would I get improvements using a table variable instead? (Can't use a temporary table as the code is within functions).

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(5

兔姬 2024-07-12 03:43:37

可能不会。 CTE 特别擅长查询数据的树结构。

Probably not. CTE's are especially good at querying data for tree structures.

云胡 2024-07-12 03:43:37

这些信息和引用来自 mssqltips.com 上的以下文章“Choose Between SQL Server Subquery T-​​SQL Code”,作者为 Eric Blinn。 https ://www.mssqltips.com/sqlservertip/6618/sql-server-query-performance-cte-view-subquery-temp-table-table-variable/

SQL Server 2019 CTE、子查询和视图

SQL Server [2019] 引擎会优化为其提供的每个查询。 什么时候
它遇到 CTE、传统子查询或视图,它会看到它们全部
以相同的方式并以相同的方式优化它们。 这涉及到寻找
在基础表中,考虑其统计数据,并选择
继续的最佳方式。 在大多数情况下,他们会返回相同的计划
因此执行完全相同

临时数据库表

对于将行插入临时表的查询,
优化器查看表统计信息并选择最佳方式
向前。 它实际上为临时表创建了新的统计信息
表,然后用它们来运行第二个。 这带来了非常
类似的性能。

表变量

在文章中给出的示例中,表变量性能较差由于缺少表统计信息

...表变量没有生成任何表统计信息
就像 TempDB 表一样。 这意味着优化器必须做出
疯狂猜测如何进行。 在这个例子中,它做了一个非常非常
糟糕的决定。

这并不是注销表变量。 他们肯定有他们的
放置的位置将在提示中稍后讨论。

临时表与表变量

临时表将存储在磁盘上并具有统计信息
对其进行计算,而表变量则不会。 因为这
当预期行数达到时,差异临时表是​​最好的
>100 并且表变量用于较小的预期行数,其中缺少统计信息不太可能导致错误的查询计划。

The information and quotes are from the following article on mssqltips.com "Choose Between SQL Server Subquery T-SQL Code" by Eric Blinn. https://www.mssqltips.com/sqlservertip/6618/sql-server-query-performance-cte-view-subquery-temp-table-table-variable/

SQL Server 2019 CTEs, subqueries, and views

The SQL Server [2019] engine optimizes every query that is given to it. When
it encounters a CTE, traditional subquery, or view, it sees them all
the same way and optimizes them the same way. This involves looking
at the underlying tables, considering their statistics, and choosing
the best way to proceed. In most cases they will return the same plan
and therefore perform exactly the same.

TempDB table

For the query that inserted rows into the temporary table, the
optimizer looked at the table statistics and chose the best way
forward. It actually made new table statistics for the temporary
table and then used them to run the second. This brings about very
similar performance
.

Table variable

The table variable has poor performance in the example given in the article due to lack of table statistics.

...the table variable does not have any table statistics generated for
it like the TempDB table did. This means the optimizer has to make a
wild guess as to how to proceed. In this example it made a very, very
poor decision.

This is not to write off table variables. They surely have their
place as will be discussed later in the tip.

Temp table vs Table variable

A temporary table will be stored on disk and have statistics
calculated on it and a table variable will not. Because of this
difference temporary tables are best when the expected row count
is >100 and the table variable for smaller expected row counts where the lack of statistics will be less likely to lead to a bad query plan.

心头的小情儿 2024-07-12 03:43:37

CTE 的优点

CTE 可以称为“临时视图”,在某些情况下用作视图的良好替代方案。
相对于视图的主要优点是内存的使用。 由于 CTE 的范围仅限于其批次,因此一旦其批次交叉,为其分配的内存就会被刷新。 但是一旦创建了视图,它就会被存储直到用户删除它。 如果视图在创建后没有被使用,那么这只是浪费内存。
与 View 相比,CTE 执行的 CPU 成本更低。
与 View 一样,CTE 不存储其定义的任何元数据,并提供更好的可读性。
一个 CTE 可以在查询中被多次引用。
由于范围仅限于批次,因此多个 CTE 可以具有相同的名称,而视图不能具有相同的名称。
它可以递归。

CTE 的缺点

虽然使用 CTE 具有优势,但它确实有一些限制需要记住,
我们知道它是视图的替代品,但 CTE 不能嵌套,而视图可以嵌套。
声明后的视图可以多次使用,但不能使用 CTE。 每次你想使用它时都应该声明它。 对于这种场景,不建议使用CTE,因为用户一次又一次地声明批次是一件很累人的工作。
锚成员之间应该有 UNION、UNION ALL 或 EXCEPT 等运算符。
在递归 CTE 中,您可以定义许多锚定成员和递归成员,但所有锚定成员必须在第一个递归成员之前定义。 您不能在两个递归成员之间定义锚定成员。
锚点成员和递归成员中使用的列数、数据类型应该相同。
在 Recursive Member 中,不允许使用 TOP 等聚合函数、DISTINCT 等运算符、HAVING 和 GROUP BY 等子句、子查询、Left Outer、Right Outer 或 Full Outer 等连接。 关于连接,Recursive Member 中只允许使用 Inner Join。
递归限制为32767,超过这个限制会导致服务器因无限循环而崩溃。

Advantages of CTE

CTE can be termed as 'Temporary View' used as a good alternative for a View in some cases.
The main advantage over a view is usage of memory. As CTE's scope is limited only to its batch, the memory allocated for it is flushed as soon as its batch is crossed. But once a view is created, it is stored until user drops it. If the view is not used after creation then it's a mere waste of memory.
CPU cost for CTE execution is lesser when compared to that of View.
Like View, CTE doesn't store any metadata of its definition and provides better readability.
A CTE can be referred for multiple times in a query.
As the scope is limited to the batch, multiple CTEs can have the same name which a view cannot have.
It can be made recursive.

Disadvantages of CTE

Though using CTE is advantageous, it does have some limitations to be kept in mind,
We knew that it is a substitute for a view but a CTE cannot be nested while Views can be nested.
View once declared can be used for any number of times but CTE cannot be used. It should be declared every time you want to use it. For this scenario, CTE is not recommended to use as it is a tiring job for user to declare the batches again and again.
Between the anchor members there should be operators like UNION, UNION ALL or EXCEPT etc.
In Recursive CTEs, you can define many Anchor Members and Recursive Members but all the Anchor Members must be defined before the first Recursive Member. You cannot define an Anchor Member between two Recursive Member.
The number of columns, the data types used in Anchor and Recursive Members should be same.
In Recursive Member, aggregate functions like TOP, operator like DISTINCT, clause like HAVING and GROUP BY, Sub-queries, joins like Left Outer or Right Outer or Full Outer are not allowed. Regarding Joins, only Inner Join is allowed in Recursive Member.
Recursion Limit is 32767, crossing which results in the crash of server due to infinite loop.

我是男神闪亮亮 2024-07-12 03:43:37

CTE 只不过是语法糖
增强可读性避免重复

只需将其视为 WITH() 子句中指定的实际语句的占位符即可。 引擎将使用此语句替换查询中出现的任何 CTE 名称(与视图非常相似)。 这就是内联的含义。

与之前填写的表相比(声明或创建)您会发现优点

  • 可用于临时查询(函数、视图)
  • 没有意外的副作用(范围最窄)

...以及缺点

  • 您不能在不同的语句中使用 CTE 的结果
  • 您不能使用索引、统计信息来优化您的 CTE 集(尽管它会隐式使用现有索引以及目标对象的统计数据 - 如果适用)。

性能方面,持久集(声明或创建的表)在某些情况下可能(好得多!)更好,但它迫使您使用过程代码。 您将不得不与您的马赛跑来找出哪一个更好...

示例:执行相同操作的各种方法

以下简单(相当无用)的示例描述了一组用户表及其列。 我使用各种不同的方法来告诉 SQL-Server 我想要什么:

尝试使用“包括实际执行计划”

USE master; --in my case the master database has just 5 "user tables", you can use any other DB of course
GO
    
--simple join, first the small set joining to the large set
SELECT o.name AS TableName
      ,c.name AS ColumnName
FROM sys.objects o
INNER JOIN sys.columns c ON c.object_id=o.object_id 
WHERE o.type='U';
GO

--simple join "the other way round" with the filter as part of the ON-clause
SELECT o.name AS TableName
      ,c.name AS ColumnName
FROM sys.columns c
INNER JOIN sys.objects o ON c.object_id=o.object_id AND o.type='U';
GO

--join from the large set with a sub-query to the small set
SELECT o.name AS TableName
      ,c.name AS ColumnName
FROM sys.columns c
INNER JOIN (
    SELECT o.* 
    FROM sys.objects o
    WHERE o.type='U' --user tables 
) o ON c.object_id=o.object_id;
GO

--join for large to small with a row-wise APPLY
SELECT o.name AS TableName
      ,c.name AS ColumnName
FROM sys.columns c
CROSS APPLY (
    SELECT o.* 
    FROM sys.objects o
    WHERE o.type='U' --user tables 
    AND o.object_id=c.object_id
) o;
GO

--use a CTE to "pre-filter" the small set
WITH cte AS
(
    SELECT o.* 
    FROM sys.objects o
    WHERE o.type='U' --user tables 
)
SELECT cte.name AS TableName
      ,c.name AS ColumnName
FROM sys.columns c
INNER JOIN cte ON c.object_id=cte.object_id;
GO

现在查看结果和执行计划:

  • 所有查询都返回相同的结果。
  • 所有查询都会产生相同的执行计划

重要提示:这可能在您的计算机上有所不同!

这是为什么?

T-SQL 是一种声明性语言。 您的声明是对您想要检索的内容的描述。 您的工作不是告诉引擎这是如何完成的。

SQL-Server 极其智能的引擎将找到获得您所要求的集合的最佳方法。 在上述情况下,所有结果描述都指向同一目标。 引擎可以从各种语句中推断这一点,并为所有语句找到相同的计划。

嗯,这只是一个品味问题吗?

在某种程度上......

有一些重要的事情需要记住:

  • 引擎没有理由在之前< /em> 其余部分(尽管该声明可能看起来是这样)。 因此,将 CTE 描述为类似于临时表的东西是错误...
  • 换句话说:语句的可见顺序不会< strong>不能预测实际的执行顺序!
  • 智能引擎将在复杂性和嵌套级别上达到其极限。 想象一下各种 VIEW,都使用 CTE 并互相调用......
  • 在某些情况下,引擎真的很糟糕。 我记得有一个案例,CTE 与 TRY_CAST 没什么区别。 这个想法是为了确保下面的查询中的值有效。 但引擎想:“哦,只是一个 CAST,不贵!” 并将实际 CAST 包含到更高位置的执行计划中。 我记得另一个例子,引擎对数百万行执行了昂贵的操作(不必要地,最终结果被过滤到一个很小的集合),只是因为实际的执行顺序不符合预期。

好吧...那么我什么时候应该使用 CTE?

以下几点是使用 CTE 的充分理由:

  • CTE 可以帮助您避免重复的子查询。
  • CTE 可以在语句中多次使用,例如在 JOIN 中,其动态行为取决于实际行数。
  • 您可以在一条语句中使用多个 CTE,并且可以在后续 CTE 中使用一个 CTE 的结果。
  • 有递归(或更好的迭代)CTE。
  • 有时我使用单行-CTE 来定义/预计算稍后在查询中使用的变量。 您将在过程 T-SQL 中使用声明的变量执行的操作。 您可以使用CROSS JOIN轻松地将它们放入您的查询中。
  • 而且也非常好:可更新的 CTE 允许非常易于阅读的语句,同样适用用于删除
    如上所述:没有 CTE,没有什么是做不到的,但它读起来要好得多(我真的很喜欢说出名字)。

最终提示

嗯,在某些情况下,丑陋的代码表现更好:-)

拥有干净且可读的代码总是好的。 CTE 将帮助您解决此问题。 所以试试吧。 如果性能不好,请深入研究执行计划,并尝试找到引擎可能做出错误决定的原因。

在大多数情况下,试图通过诸如FORCE ORDER之类的提示来智胜引擎是一个坏主意(但可以提供帮助)

UPDATE

我被要求具体指出优点和缺点:

嗯,从技术上来说,没有真正的优点或缺点。 抛开递归 CTE 不谈,没有 CTE 就没有解决不了的问题。

优点
主要优点可读性和可维护性
有时一个 CTE 可以节省数百行代码。 可以只使用名称作为变量,而不是重复巨大的子查询。 对子查询的更正只需在一处即可解决。
CTE 可以提供临时查询服务,让您的生活更轻松。

缺点
一个可能的缺点是,即使对于经验丰富的开发人员来说,也很容易将 CTE 误认为临时表,假设可见的步骤顺序与实际的执行顺序相同,并陷入困境意外结果甚至错误
而且 - 当然:-) - 当您在另一个语句后编写 CTE 而没有分隔 ; 时,您会看到奇怪的 错误语法 错误。 这就是为什么很多人倾向于使用 ;WITH

A CTE is not much more than syntactic sugar.
It enhances the readability and allows to avoid repetition.

Just think of it as a placeholder for the actual statement specified in the WITH()-clause. The engine will replace any occurance of the CTE's name in your query with this statement (quite similar to a view). This is the meaning of inline.

Compared to a previously filled table (delared or created) You'll find advantages:

  • useable in ad-hoc-queries (functions, views)
  • no unexpected side effects (most narrow scope)

...and disadvantages:

  • You cannot use the CTE's result in different statements
  • You cannot use indexes, statistics to optimize your CTE's set (although it will implicitly use existing indexes and statistics of the targeted objects - if appropriate).

In terms of performance a persisted set (declared or created table) can be (much!) better in some cases, but it forces you into procedural code. You will have to race your horses to find out which is better...

Example: Various approaches to do the same

The following simple (rather useless) example describes a set of user tables together with their columns. I use various different approaches to tell SQL-Server what I want:

Try this with "include actual execution plan"

USE master; --in my case the master database has just 5 "user tables", you can use any other DB of course
GO
    
--simple join, first the small set joining to the large set
SELECT o.name AS TableName
      ,c.name AS ColumnName
FROM sys.objects o
INNER JOIN sys.columns c ON c.object_id=o.object_id 
WHERE o.type='U';
GO

--simple join "the other way round" with the filter as part of the ON-clause
SELECT o.name AS TableName
      ,c.name AS ColumnName
FROM sys.columns c
INNER JOIN sys.objects o ON c.object_id=o.object_id AND o.type='U';
GO

--join from the large set with a sub-query to the small set
SELECT o.name AS TableName
      ,c.name AS ColumnName
FROM sys.columns c
INNER JOIN (
    SELECT o.* 
    FROM sys.objects o
    WHERE o.type='U' --user tables 
) o ON c.object_id=o.object_id;
GO

--join for large to small with a row-wise APPLY
SELECT o.name AS TableName
      ,c.name AS ColumnName
FROM sys.columns c
CROSS APPLY (
    SELECT o.* 
    FROM sys.objects o
    WHERE o.type='U' --user tables 
    AND o.object_id=c.object_id
) o;
GO

--use a CTE to "pre-filter" the small set
WITH cte AS
(
    SELECT o.* 
    FROM sys.objects o
    WHERE o.type='U' --user tables 
)
SELECT cte.name AS TableName
      ,c.name AS ColumnName
FROM sys.columns c
INNER JOIN cte ON c.object_id=cte.object_id;
GO

Now look at the result and at the execution plans:

  • All queries return the same result.
  • All queries produce the same execution plan

Important hint: This might differ on your machine!

Why is this?

T-SQL is a declarative language. Your statement is a description of WHAT you want to retrieve. It is not your job to tell the engine HOW this is done.

SQL-Server's extremely smart engine will find the best way to get the set you asked for. In the case above all result descriptions point to the same goal. The engine can deduce this from various statements and finds the same plan for all of them.

Well, is it just a matter of taste?

In a way...

There are some important things to keep in mind:

  • There is no reason for the engine to compute the CTE's result before the rest (although the statement might look so). Therefore it is wrong to describe a CTE as something like a temp table...
  • In other words: The visible order of your statement does not predict the actual order of execution!
  • The smart engine will reach its limits with complexity and nest level. Imagine various VIEWs, all using CTEs and calling each-other...
  • There are cases where the engine really f**s up. I remember a case where a CTE did not much more than a TRY_CAST. The idea was to ensure valid values in the query below. But the engine thought "Oh, just a CAST, not expensiv!" and included the acutal CAST to the execution plan on a higher position. I remember another case where the engine performed an expensive operation against millions of rows (unnecessarily, the final result was filtered to a tiny set), just because the actual order of execution was not as expected.

Okay... So when should I use a CTE?

The following points are good reasons to use a CTE:

  • A CTE can help you to avoid repeated sub queries.
  • A CTE can be used multiple times within your statement, e.g. within a JOIN with a dynamic behavior depending on the actual row-count.
  • You can use multiple CTEs within one statement and you can use the result of one CTE within a later CTE.
  • There are recursive (or better iterative) CTEs.
  • Sometimes I used single-row-CTEs to define / pre-compute variables later used in the query. Things you would do with declared variables in procedural T-SQL. You can use A CROSS JOIN to get them into your query easily.
  • and also very nice: the updatable CTE allows for very easy-to-read statements, same applies for DELETE.
    As above: Nothing one could not do without the CTE, but it is far better to read (I really like speaking names).

Final hints

Well, there are cases, where ugly code performs better :-)

It is always good to have clean and readable code. A CTE will help you with this. So give it a try. If the performance is bad, get into depth, look at the execution plans and try to find a reason where the engine might decide wrong.

In most cases it is a bad idea trying to outsmart the engine with hints such as FORCE ORDER (but in can help)

UPDATE

I was asked to point to advantages and disadvantages specifically:

Uhm, technically there are no real advantages or disadvantages. Disregarding recursive CTEs there's nothing one couldn't solve without a CTE.

Advantages
The main advantage is readability and maintainabilty.
Sometimes a CTE can save hundreds of lines of code. Instead of a repeating a huge sub-query one can use just a name as a variable. Corrections to the sub-query can be solved just in one place.
The CTE can serve in ad-hoc queries and make your life easier.

Disadvantages
One possible disadvantage is that it's very easy, even for experienced developers, to mistake a CTE as a temp table, assume that the visible order of steps will be the same as the acutal order of execution and stumble into unexpected results or even errors.
And - of course :-) - the strange wrong syntax error you'll see when you write a CTE after another statement without a separating ;. That's why many people tend to use ;WITH.

記柔刀 2024-07-12 03:43:37

您确实必须进行性能测试 - 没有是/否的答案。 根据上面 Andy Living 的帖子链接,CTE 只是查询或子查询的简写。

如果您在同一个函数中调用它两次或多次,那么如果您填充一个表变量然后连接到该变量或从中选择,您可能会获得更好的性能。 但是,由于表变量在某处占用空间,并且没有索引/统计信息(表变量上任何声明的主键除外),因此无法说哪个会更快。

他们都有成本和节省,哪一个是最好的方法取决于他们提取的数据以及他们如何处理这些数据。 我遇到过你的情况,并且在测试了各种条件下的速度之后 - 有些函数使用 CTE,而其他函数使用表变量。

You'll really have to performance test - There is no Yes/No answer. As per Andy Living's post above links to, a CTE is just shorthand for a query or subquery.

If you are calling it twice or more in the same function, you might get better performance if you fill a table variable and then join to/select from that. However, as table variables take up space somewhere, and don't have indexes/statistics (With the exception of any declared primary key on the table variable) there's no way of saying which will be faster.

They both have costs and savings, and which is the best way depends on the data they pull in and what they do with it. I've been in your situation, and after testing for speed under various conditions - Some functions used CTEs, and others used table variables.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文