FAST_FORWARD 游标何时会有工作表(这是要避免的事情)?
背景
我在尝试运行总查询时注意到,有时估计的计划只显示“Fetch Query”
和实际计划 从聚集索引扫描中重复获取
显示在其他情况下(例如,添加 TOP
时) 到查询)估计计划显示“填充查询”阶段,该阶段填充工作表
实际计划显示聚集索引扫描以填充工作表,然后对该工作表重复查找。
问题
- SQL Server 在选择一种方法而不是另一种方法时使用什么标准?
- 我是否认为第一种方法(没有额外的工作表填充步骤)更有效?
(额外问题:如果有人可以解释为什么第一个查询中的每次扫描都算作 2 个逻辑读取,这可能也很有启发)
其他信息
我发现的 这篇文章解释了这一点FAST_FORWARD
游标可以使用动态计划或静态计划。本例中的第一个查询似乎使用动态计划,第二个查询使用静态计划。
我还发现,如果我尝试,
SET @C2 = CURSOR DYNAMIC TYPE_WARNING FOR SELECT TOP ...
游标会隐式转换为 keyset
游标,因此很明显,动态游标不支持 TOP
构造,也许对于鲁本回答中的原因 - 仍在寻找对此的明确解释。
不过,我也了解到动态游标往往比静态游标慢(来源 1, 来源 2)这似乎令人惊讶对我来说,静态变量必须读取源数据,复制它,然后读取副本,而不仅仅是读取源数据。 我之前引用的文章提到动态游标使用标记
。谁能解释一下这些是什么?它只是一个 RID 或 CI 密钥,还是其他什么?
脚本
SET STATISTICS IO OFF
CREATE TABLE #T ( ord INT IDENTITY PRIMARY KEY, total INT, Filler char(8000))
INSERT INTO #T (total) VALUES (37),(80),(55),(31),(53)
DECLARE @running_total INT,
@ord INT,
@total INT
SET @running_total = 0
SET STATISTICS IO ON
DECLARE @C1 AS CURSOR;
SET @C1 = CURSOR FAST_FORWARD FOR SELECT ord, total FROM #T ORDER BY ord;
OPEN @C1;
PRINT 'Initial FETCH C1'
FETCH NEXT FROM @C1 INTO @ord, @total ;
WHILE @@FETCH_STATUS = 0
BEGIN
SET @running_total = @running_total + @total
PRINT 'FETCH C1'
FETCH NEXT FROM @C1 INTO @ord, @total ;
END
SET @running_total = 0
SET STATISTICS IO ON
DECLARE @C2 AS CURSOR;
SET @C2 = CURSOR FAST_FORWARD FOR SELECT TOP 5 ord, total FROM #T ORDER BY ord;
OPEN @C2;
PRINT 'Initial FETCH C2'
FETCH NEXT FROM @C2 INTO @ord, @total ;
WHILE @@FETCH_STATUS = 0
BEGIN
SET @running_total = @running_total + @total
PRINT 'FETCH C2'
FETCH NEXT FROM @C2 INTO @ord, @total ;
END
PRINT 'End C2'
DROP TABLE #T
Background
I noticed whilst experimenting with running total queries that sometimes the estimated plan just shows a "Fetch Query"
and the actual plan shows repeated Fetches from the Clustered Index Scan
on other occasions (e.g when adding a TOP
to the query) the estimated plan shows a "Population Query" stage that populates a work table
With the actual plan showing a clustered index scan to populate the work table then repeated seeks against that work table.
Question
- What criteria does SQL Server use in choosing one approach over the other?
- Would I be right in thinking that the first method (without the additional work table population step) is more efficient?
(Bonus question: If anyone could explain why each scan in the first query counts as 2 logical reads that might be quite enlightening too)
Additional Information
I have found this article here which explains that FAST_FORWARD
cursors can either use a dynamic plan or a static plan. The first query in this case appears to be using a dynamic plan and the second one a static plan.
I've also found that if I try
SET @C2 = CURSOR DYNAMIC TYPE_WARNING FOR SELECT TOP ...
The cursor gets implicitly converted to a keyset
cursor so it is clear that the TOP
construct is not supported for dynamic cursors, perhaps for the reasons in Ruben's answer - Still looking for a definitive explanation of this.
However I have also read that dynamic cursors tend to be slower than their static counterparts (source 1, source 2) which seems surprising to me given that the static variety have to read the source data, copy it, then read the copy rather than just read the source data. The article I referenced earlier mentions that dynamic cursors use markers
. Can anyone explain what these are? Is it just a RID or the CI key, or something different?
Script
SET STATISTICS IO OFF
CREATE TABLE #T ( ord INT IDENTITY PRIMARY KEY, total INT, Filler char(8000))
INSERT INTO #T (total) VALUES (37),(80),(55),(31),(53)
DECLARE @running_total INT,
@ord INT,
@total INT
SET @running_total = 0
SET STATISTICS IO ON
DECLARE @C1 AS CURSOR;
SET @C1 = CURSOR FAST_FORWARD FOR SELECT ord, total FROM #T ORDER BY ord;
OPEN @C1;
PRINT 'Initial FETCH C1'
FETCH NEXT FROM @C1 INTO @ord, @total ;
WHILE @@FETCH_STATUS = 0
BEGIN
SET @running_total = @running_total + @total
PRINT 'FETCH C1'
FETCH NEXT FROM @C1 INTO @ord, @total ;
END
SET @running_total = 0
SET STATISTICS IO ON
DECLARE @C2 AS CURSOR;
SET @C2 = CURSOR FAST_FORWARD FOR SELECT TOP 5 ord, total FROM #T ORDER BY ord;
OPEN @C2;
PRINT 'Initial FETCH C2'
FETCH NEXT FROM @C2 INTO @ord, @total ;
WHILE @@FETCH_STATUS = 0
BEGIN
SET @running_total = @running_total + @total
PRINT 'FETCH C2'
FETCH NEXT FROM @C2 INTO @ord, @total ;
END
PRINT 'End C2'
DROP TABLE #T
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
只是预感,但通常 TOP-ORDER BY 要求 SQL Server 以某种方式缓冲结果(索引扫描的结果或临时结构中的整个结果,或两者之间的任何结果)。
有人可能会争辩说,对于游标来说,即使按主键排序(如您的示例中所示),这也是必要的,因为当相应的 SELECT 确实返回 5 行(或更糟糕的是:游标返回超过 5 行)。
理论上,当游标的索引扫描范围已经确定之后,表上有删除或插入操作,并且插入/删除操作在该范围内时,理论上可能会发生这种奇怪的情况。索引扫描的范围,但您尚未完成获取。为了防止这种情况发生,他们可能会在安全方面犯错误。 (而且他们只是没有针对 #temp 表进行优化。)
但有一个问题:SQL Server 是否允许在没有
ORDER BY
子句的情况下使用FETCH FROM SELECT TOP n
? (这里还没有运行 SQL Server 实例。)了解这会导致什么计划可能会很有趣。Just a hunch, but normally a TOP-ORDER BY requires SQL Server to buffer the result in some way (either the index scan's result or indeed the entire result in a temp structure, or anything in between).
One could argue that for cursors this is also necessary even when ordering by the primary key (as in your example), as you cannot allow a TOP 5 cursor to unexpectedly return less than 5 rows when the corresponding SELECT does return exactly 5 rows (or worse: the cursor returns more than 5 rows).
This weird situation could theoretically happen when there are deletes or inserts on the table after the index scan's range has already been determined for the cursor, and the inserts/deletes fall within the index scan's range, but you're not yet done fetching. To prevent this from happening, they might err on the safe side here. (And they just didn't optimize for #temp tables.)
A question though: does SQL Server allow a
FETCH FROM SELECT TOP n
without anORDER BY
clause? (Haven't got a SQL Server instance running here.) Might be interesting to know what plan that causes.SQL Server 在选择一种方法而不是另一种方法时使用什么标准?
这主要是基于成本的决策。引用您链接到的文章,“在动态计划看起来很有希望的情况下,可能会启发式地跳过成本比较。这主要发生在极其便宜的查询中,尽管细节很深奥。”
我是否认为第一种方法(没有额外的工作表填充步骤)更有效?
这取决于。动态和静态游标计划具有不同的优点和缺点。如果最终会触及所有行,则静态计划可能会执行得更好。稍后会详细介绍这一点。
很明显,动态游标不支持
TOP
构造这是事实。动态游标计划中的所有迭代器必须能够保存和恢复状态、向前和向后扫描、为每个输出行处理一个输入行,并且是非阻塞的。一般来说,Top 并不能满足所有这些要求;类
CQScanTopNew
未实现必要的Set/Get/Goto/Marker()
和ReverseDirection()
方法(等等)。我还了解到动态游标往往比静态游标慢。
对于 Transact-SQL 游标来说通常是这样,因为大多数或全部游标集都会被触及。保存和恢复动态查询计划的状态会产生相关成本。在每次调用时处理单行并且最终触及所有行的情况下,此保存/恢复开销最大化。
静态游标具有制作集合副本的开销(这可能是大型集合的主导因素),但每行的检索成本非常小。键集的每行检索开销比静态键集更高,因为它们必须外连接回源表才能检索非键列。
当访问集合中相对较小的部分和/或检索不是一次一行时,动态游标是最佳选择。这是许多常见游标场景中的典型访问模式,只是不是博客文章倾向于测试的模式:)
如果有人可以解释为什么第一个查询中的每次扫描算作 2 个逻辑读取,这可能也很有启发性
这取决于扫描保存状态的方式以及读取计数的方式。
我之前引用的文章提到动态游标使用标记。谁能解释一下这些是什么?它只是一个 RID 或 CI 密钥,还是其他东西?
动态游标计划中的每个迭代器都存在标记,而不仅仅是访问方法。 “标记”是在停止点重新启动计划迭代器所需的所有状态信息。对于访问方法,RID 或索引键(如果需要的话还可以使用唯一标识符)是其中的重要组成部分,但无论如何都不是全部。
What criteria does SQL Server use in choosing one approach over the other?
It is primarily a cost-based decision. Quoting from the article you linked to, "In situations where the dynamic plan looks promising, the cost comparison may be heuristically skipped. This occurs mainly for extremely cheap queries, though the details are esoteric."
Would I be right in thinking that the first method (without the additional work table population step) is more efficient?
It depends. Dynamic and static cursor plans have different strengths and weaknesses. If all rows will eventually be touched, the static plan is likely to perform better. More on this in a moment.
It is clear that the
TOP
construct is not supported for dynamic cursorsThis is true. All iterators in a dynamic cursor plan must be able to save and restore state, scan forward and backward, process one input row for each output row, and be non-blocking. Top, in general, does not satisfy all these requirements; the class
CQScanTopNew
does not implement the necessarySet/Get/Goto/Marker()
, andReverseDirection()
methods (among others).I have also read that dynamic cursors tend to be slower than their static counterparts.
This is often true, for Transact-SQL cursors, where most or all of the cursor set is touched. There is a cost associated with saving and restoring the state of a dynamic query plan. Where a single row is processed on each call, and all rows eventually touched, this save/restore overhead is maximized.
Static cursors have the overhead of making a copy of the set (which may be the dominant factor for a large set), but the per-row cost of retrieval is quite small. Keysets have a higher per-row retrieval overhead than static because they must outer join back to the source tables to retrieve non-key columns.
Dynamic cursors are optimal when a relatively small fraction of the set is accessed, and/or retrieval is not row-at-a-time. This is a typical access pattern in many common cursor scenarios, just not the ones blog posts tend to test :)
If anyone could explain why each scan in the first query counts as 2 logical reads that might be quite enlightening too
This is down to the way state is saved for the scan, and the way reads are counted.
The article I referenced earlier mentions that dynamic cursors use markers. Can anyone explain what these are? Is it just a RID or the CI key, or something different?
Markers exist for each iterator in a dynamic cursor plan, not just access methods. The 'marker' is all the state information necessary to restart the plan iterator at the point it left off. For an access method, an RID or index key (with uniquifier if necessary) is a big part of this, but not the whole story by any means.