CTE 和临时表哪个性能更高?
CTE
和 临时表
哪个性能更高?
Which are more performant, CTE
or Temporary Tables
?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
CTE
和 临时表
哪个性能更高?
Which are more performant, CTE
or Temporary Tables
?
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
接受
或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
发布评论
评论(12)
这取决于。
首先
什么是通用表表达式?
(非递归)CTE 的处理方式与其他也可用作 SQL Server 中内联表表达式的构造非常相似。 派生表、视图和内联表值函数。 请注意,虽然 BOL 说 CTE“可以被认为是临时结果集”,但这纯粹是逻辑描述。 通常,它本身并没有具体化。
什么是临时表?
这是存储在 tempdb 数据页上的行的集合。 数据页可以部分或全部驻留在存储器中。 另外,临时表可以被索引并具有列统计信息。
测试数据
示例1
注意上面的计划没有提到CTE1。 相同,
它只是直接访问基表,并被视为与通过将 CTE 物化到中间临时表中的重写
这会产生巨大的反作用。 实现 Would 的 CTE 定义
涉及将大约 8GB 的数据复制到临时表中,然后仍然存在从中进行选择的开销。
示例 2
上面的示例在我的机器上大约需要 4 分钟。
1,000,000 个随机生成的值中只有 15 行与谓词匹配,但昂贵的表扫描发生了 16 次才能找到这些值。
这将是实现中间结果的良好候选。 等效的临时表重写花费了 25 秒。
将部分查询的中间具体化到临时表中有时会很有用,即使只评估一次 - 当它允许利用具体化结果的统计信息重新编译查询的其余部分。 SQL Cat 文章中提供了此方法的示例 何时分解复杂查询。
在某些情况下,SQL Server 将使用假脱机来缓存中间结果(例如 CTE 的结果),并避免必须重新评估该子树。 这在(迁移的)连接项 提供强制 CTE 或派生表的中间具体化的提示。 然而,没有对此创建统计信息,即使假脱机行数与估计值有很大不同,正在进行的执行计划也不可能动态适应响应(至少在当前版本中)。自适应查询计划可能在未来)。
It depends.
First of all
What is a Common Table Expression?
A (non recursive) CTE is treated very similarly to other constructs that can also be used as inline table expressions in SQL Server. Derived tables, Views, and inline table valued functions. Note that whilst BOL says that a CTE "can be thought of as temporary result set" this is a purely logical description. More often than not it is not materlialized in its own right.
What is a temporary table?
This is a collection of rows stored on data pages in tempdb. The data pages may reside partially or entirely in memory. Additionally the temporary table may be indexed and have column statistics.
Test Data
Example 1
Notice in the plan above there is no mention of CTE1. It just accesses the base tables directly and is treated the same as
Rewriting by materializing the CTE into an intermediate temporary table here would be massively counter productive.
Materializing the CTE definition of
Would involve copying about 8GB of data into a temporary table then there is still the overhead of selecting from it too.
Example 2
The above example takes about 4 minutes on my machine.
Only 15 rows of the 1,000,000 randomly generated values match the predicate but the expensive table scan happens 16 times to locate these.
This would be a good candidate for materializing the intermediate result. The equivalent temp table rewrite took 25 seconds.
Intermediate materialisation of part of a query into a temporary table can sometimes be useful even if it is only evaluated once - when it allows the rest of the query to be recompiled taking advantage of statistics on the materialized result. An example of this approach is in the SQL Cat article When To Break Down Complex Queries.
In some circumstances SQL Server will use a spool to cache an intermediate result, e.g. of a CTE, and avoid having to re-evaluate that sub tree. This is discussed in the (migrated) Connect item Provide a hint to force intermediate materialization of CTEs or derived tables. However no statistics are created on this and even if the number of spooled rows was to be hugely different from estimated is not possible for the in progress execution plan to dynamically adapt in response (at least in current versions. Adaptive Query Plans may become possible in the future).
我想说它们是不同的概念,但并没有太大的不同,不能说“粉笔和奶酪”。
临时表适合重用或对一组数据执行多次处理。
临时表适合重用
CTE 可用于递归或简单地提高可读性。
而且,与视图或内联表值函数一样,也可以将其视为要在主查询中扩展的宏
临时表是另一个具有一些范围规则的表
我已经存储了我同时使用的过程(以及表变量)
I'd say they are different concepts but not too different to say "chalk and cheese".
A temp table is good for re-use or to perform multiple processing passes on a set of data.
A CTE can be used either to recurse or to simply improved readability.
And, like a view or inline table valued function can also be treated like a macro to be expanded in the main query
A temp table is another table with some rules around scope
I have stored procs where I use both (and table variables too)
CTE 有它的用处——当 CTE 中的数据很小并且可读性有很大提高时,就像递归表中的情况一样。 然而,它的性能肯定不比表变量好,当处理非常大的表时,临时表的性能明显优于 CTE。 这是因为您无法在 CTE 上定义索引,并且当您有大量数据需要与另一个表连接时(CTE 就像一个宏)。 如果您要连接多个表,每个表中包含数百万行记录,则 CTE 的性能将比临时表差得多。
CTE has its uses - when data in the CTE is small and there is strong readability improvement as with the case in recursive tables. However, its performance is certainly no better than table variables and when one is dealing with very large tables, temporary tables significantly outperform CTE. This is because you cannot define indices on a CTE and when you have large amount of data that requires joining with another table (CTE is simply like a macro). If you are joining multiple tables with millions of rows of records in each, CTE will perform significantly worse than temporary tables.
临时表始终位于磁盘上 - 因此只要您的 CTE 可以保存在内存中,它很可能会更快(也像表变量一样)。
但话又说回来,如果 CTE(或临时表变量)的数据负载变得太大,它也会存储在磁盘上,所以没有太大的好处。
一般来说,我更喜欢 CTE 而不是临时表,因为它在我使用后就消失了。 我不需要考虑明确地放弃它或任何事情。
所以,最终没有明确的答案,但就我个人而言,我更喜欢 CTE 而不是临时表。
Temp tables are always on disk - so as long as your CTE can be held in memory, it would most likely be faster (like a table variable, too).
But then again, if the data load of your CTE (or temp table variable) gets too big, it'll be stored on disk, too, so there's no big benefit.
In general, I prefer a CTE over a temp table since it's gone after I used it. I don't need to think about dropping it explicitly or anything.
So, no clear answer in the end, but personally, I would prefer CTE over temp tables.
因此,我被分配优化的查询是用 SQL Server 中的两个 CTE 编写的。 花了28秒。
我花了两分钟将它们转换为临时表,查询花了 3 秒,
我在临时表要连接的字段上添加了一个索引,并将其减少到 2 秒
三分钟的工作,现在它的运行速度快了 12 倍消除 CTE。 我个人不会使用 CTE,因为它们也更难调试。
疯狂的是,CTE 都只使用了一次,但仍然在其上添加索引,结果证明速度快了 50%。
So the query I was assigned to optimize was written with two CTEs in SQL server. It was taking 28sec.
I spent two minutes converting them to temp tables and the query took 3 seconds
I added an index to the temp table on the field it was being joined on and got it down to 2 seconds
Three minutes of work and now its running 12x faster all by removing CTE. I personally will not use CTEs ever they are tougher to debug as well.
The crazy thing is the CTEs were both only used once and still putting an index on them proved to be 50% faster.
我已经使用了这两种方法,但在大量复杂的过程中总是发现临时表更适合使用并且更有条理。 CTE 有其用途,但通常用于小数据。
例如,我创建了存储过程,该存储过程在 15 秒内返回大型计算的结果,但将此代码转换为在 CTE 中运行,并发现它运行了超过 8 分钟才能达到相同的结果。
I've used both but in massive complex procedures have always found temp tables better to work with and more methodical. CTEs have their uses but generally with small data.
For example I've created sprocs that come back with results of large calculations in 15 seconds yet convert this code to run in a CTE and have seen it run in excess of 8 minutes to achieve the same results.
CTE 不会占用任何物理空间。 它只是一个我们可以使用连接的结果集。
临时表是临时的。 我们可以像普通表一样创建索引和约束,因为我们需要定义所有变量。
临时表的范围仅在会话内。
前任:
打开两个 SQL 查询窗口
在第一个窗口中运行此查询
然后在第二个窗口中运行以下查询,您可以找到差异。
CTE won't take any physical space. It is just a result set we can use join.
Temp tables are temporary. We can create indexes, constrains as like normal tables for that we need to define all variables.
Temp table's scope only within the session.
EX:
Open two SQL query window
Run this query in first window
then run the below query in second window you can find the difference.
聚会迟到了,但是......
我工作的环境受到高度限制,支持一些供应商产品并提供报告等“增值”服务。 由于政策和合同的限制,我通常不允许拥有单独的表/数据空间和/或创建永久代码的能力[它会变得更好一点,具体取决于应用程序]。
IOW,我通常无法开发存储过程或 UDF 或临时表等。我几乎必须通过我的应用程序界面完成所有操作(Crystal Reports - 添加/链接表,设置来自的 where 子句) w/in CR 等)。 一个小小的优点是 Crystal 允许我使用命令(以及 SQL 表达式)。 一些通过常规添加/链接表功能效率不高的事情可以通过定义 SQL 命令来完成。 我通过这种方式使用 CTE,并“远程”获得了非常好的结果。 CTE 还有助于报告维护,不需要开发代码,交给 DBA 进行编译、加密、传输、安装,然后需要进行多级测试。 我可以通过本地界面进行 CTE。
使用带有 CR 的 CTE 的缺点是,每个报告都是单独的。 必须为每个报告维护每个 CTE。 在我可以做 SP 和 UDF 的地方,我可以开发一些可供多个报告使用的东西,只需要链接到 SP 并传递参数,就像您在处理常规表一样。 CR 并不擅长将参数处理到 SQL 命令中,因此 CR/CTE 方面可能会有所欠缺。 在这些情况下,我通常尝试定义 CTE 以返回足够的数据(但不是所有数据),然后使用 CR 中的记录选择功能对其进行切片和切块。
所以...我投票支持 CTE(直到我获得数据空间)。
Late to the party, but...
The environment I work in is highly constrained, supporting some vendor products and providing "value-added" services like reporting. Due to policy and contract limitations, I am not usually allowed the luxury of separate table/data space and/or the ability to create permanent code [it gets a little better, depending upon the application].
IOW, I can't usually develop a stored procedure or UDFs or temp tables, etc. I pretty much have to do everything through MY application interface (Crystal Reports - add/link tables, set where clauses from w/in CR, etc.). One SMALL saving grace is that Crystal allows me to use COMMANDS (as well as SQL Expressions). Some things that aren't efficient through the regular add/link tables capability can be done by defining a SQL Command. I use CTEs through that and have gotten very good results "remotely". CTEs also help w/ report maintenance, not requiring that code be developed, handed to a DBA to compile, encrypt, transfer, install, and then require multiple-level testing. I can do CTEs through the local interface.
The down side of using CTEs w/ CR is, each report is separate. Each CTE must be maintained for each report. Where I can do SPs and UDFs, I can develop something that can be used by multiple reports, requiring only linking to the SP and passing parameters as if you were working on a regular table. CR is not really good at handling parameters into SQL Commands, so that aspect of the CR/CTE aspect can be lacking. In those cases, I usually try to define the CTE to return enough data (but not ALL data), and then use the record selection capabilities in CR to slice and dice that.
So... my vote is for CTEs (until I get my data space).
我发现 CTE 具有出色性能的一个用途是,我需要将一个相对复杂的查询连接到几个表,每个表有几百万行。
我使用 CTE 首先选择基于索引列的子集,首先将这些表削减为每个表数千个相关行,然后将 CTE 连接到我的主查询。 这极大地减少了我的查询的运行时间。
虽然 CTE 的结果没有缓存,并且表变量可能是更好的选择,但我真的只是想尝试一下并发现它们适合上述场景。
One use where I found CTE's excelled performance wise was where I needed to join a relatively complex Query on to a few tables which had a few million rows each.
I used the CTE to first select the subset based of the indexed columns to first cut these tables down to a few thousand relevant rows each and then joined the CTE to my main query. This exponentially reduced the runtime of my query.
Whilst results for the CTE are not cached and table variables might have been a better choice I really just wanted to try them out and found the fit the above scenario.
我刚刚测试了这一点 - CTE 和非 CTE(其中为每个联合实例键入查询)都花费了约 31 秒。 不过,CTE 使代码更具可读性 - 将其从 241 行减少到 130 行,这非常好。 另一方面,临时表将其减少到 132 行,并且运行时间为 5 秒。 不是开玩笑。 所有这些测试都被缓存 - 查询之前都运行了多次。
I just tested this- both CTE and non-CTE (where the query was typed out for every union instance) both took ~31 seconds. CTE made the code much more readable though- cut it down from 241 to 130 lines which is very nice. Temp table on the other hand cut it down to 132 lines, and took FIVE SECONDS to run. No joke. all of this testing was cached- the queries were all run multiple times before.
这是一个真正开放式的问题,这完全取决于它的使用方式和临时表的类型(表变量或传统表)。
传统的临时表将数据存储在临时数据库中,这确实会降低临时表的速度; 然而表变量却没有。
This is a really open ended question, and it all depends on how its being used and the type of temp table (Table variable or traditional table).
A traditional temp table stores the data in the temp DB, which does slow down the temp tables; however table variables do not.
根据我在 SQL Server 中的经验,我发现 CTE 优于临时表的场景之一是
我需要在存储过程中使用来自复杂查询的数据集(~100000)。
临时表在我的过程所在的 SQL 上造成了开销
执行缓慢(因为临时表是真正的物化表,
存在于 tempdb 中并在我当前过程的生命周期内持续存在)
另一方面,对于 CTE,CTE 仅持续到以下情况
查询已运行。 因此,CTE 是一种方便的内存结构,具有有限的
范围。 CTE 默认情况下不使用 tempdb。
在这种情况下,CTE 确实可以帮助简化代码并超越临时表。
我使用了 2 个 CTE,例如
From my experience in SQL Server,I found one of the scenarios where CTE outperformed Temp table
I needed to use a DataSet(~100000) from a complex Query just ONCE in my stored Procedure.
Temp table was causing an overhead on SQL where my Procedure was
performing slowly(as Temp Tables are real materialized tables that
exist in tempdb and Persist for the life of my current procedure)
On the other hand, with CTE, CTE Persist only until the following
query is run. So, CTE is a handy in-memory structure with limited
Scope. CTEs don't use tempdb by default.
This is one scenario where CTEs can really help simplify your code and Outperform Temp Table.
I had Used 2 CTEs, something like