Which is faster: a subquery, or a LEFT JOIN with GROUP BY?

Published 2024-12-04 03:14:48

I have to show a running total alongside the total column in my application, so I used the following queries to compute the running total, and I found that both work as I need. In one I used a LEFT JOIN with GROUP BY, and in the other I used a subquery.

Now my question is: which one is faster as my data grows by thousands of rows daily? And if the data stays within 1,000 or 2,000 rows, which one is better? Is there any other method faster than these two?

declare @tmp table(ind int identity(1,1), col1 int)
insert into @tmp
select 2
union
select 4
union
select 7
union
select 5
union
select 8
union
select 10



SELECT t1.col1, SUM(t2.col1)
FROM @tmp AS t1 LEFT JOIN @tmp AS t2 ON t1.ind >= t2.ind
GROUP BY t1.ind, t1.col1


SELECT t1.col1, (SELECT SUM(col1) FROM @tmp AS t2 WHERE t2.ind <= t1.ind)
FROM @tmp AS t1
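For reference, the running total both queries are meant to produce over the sample values can be sketched in Python with `itertools.accumulate`. This assumes the rows receive `ind` values in the listed order; since the insert uses `UNION`, the actual assignment order is not guaranteed.

```python
from itertools import accumulate

# Sample col1 values, assuming ind is assigned in listed order
values = [2, 4, 7, 5, 8, 10]

# Running total: each element is the sum of itself and everything before it
running = list(accumulate(values))
print(running)  # [2, 6, 13, 18, 26, 36]
```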



Comments (3)

两相知 2024-12-11 03:14:48

A great resource on calculating running totals in SQL Server is this document by Itzik Ben Gan, submitted to the SQL Server team as part of his campaign to have the OVER clause extended beyond its initial SQL Server 2005 implementation. In it he shows how, once you get into tens of thousands of rows, cursors outperform set-based solutions. SQL Server 2012 did indeed extend the OVER clause, making this sort of query much easier:

SELECT col1,
       SUM(col1) OVER (ORDER BY ind ROWS UNBOUNDED PRECEDING)
FROM   @tmp 

As you are on SQL Server 2005, however, this is not available to you.

Adam Machanic shows here how the CLR can be used to improve on the performance of standard T-SQL cursors.

For this table definition,

CREATE TABLE RunningTotals
(
ind int identity(1,1) primary key,
col1 int
)

I created tables with both 2,000 and 10,000 rows, in one database with ALLOW_SNAPSHOT_ISOLATION ON and in one with this setting off (my initial results were in a database with the setting on, which led to a puzzling aspect of the results).

The clustered indexes for all tables had just one root page. The number of leaf pages for each is shown below.

+-------------------------------+-----------+------------+
|                               | 2,000 row | 10,000 row |
+-------------------------------+-----------+------------+
| ALLOW_SNAPSHOT_ISOLATION OFF  |         5 |         22 |
| ALLOW_SNAPSHOT_ISOLATION ON   |         8 |         39 |
+-------------------------------+-----------+------------+

I tested the following cases (the links show execution plans):

  1. Left Join and Group By
  2. Correlated subquery (2,000-row plan, 10,000-row plan)
  3. CTE from Mikael's (updated) answer
  4. CTE below (plan: https://i.sstatic.net/dv2XE.jpg)

The additional CTE option was included to provide a CTE solution that would still work if the ind column were not guaranteed to be sequential.

SET STATISTICS IO ON;
SET STATISTICS TIME ON;
DECLARE @col1 int, @sumcol1 bigint;

WITH    RecursiveCTE
AS      (
        SELECT TOP 1 ind, col1, CAST(col1 AS BIGINT) AS Total
        FROM RunningTotals
        ORDER BY ind
        UNION   ALL
        SELECT  R.ind, R.col1, R.Total
        FROM    (
                SELECT  T.*,
                        T.col1 + Total AS Total,
                        rn = ROW_NUMBER() OVER (ORDER BY T.ind)
                FROM    RunningTotals T
                JOIN    RecursiveCTE R
                        ON  R.ind < T.ind
                ) R
        WHERE   R.rn = 1
        )
SELECT  @col1 =col1, @sumcol1=Total
FROM    RecursiveCTE
OPTION  (MAXRECURSION 0);

CAST(col1 AS BIGINT) was added to all of the queries to avoid overflow errors at runtime. Additionally, for all of them I assigned the results to variables, as above, so that the time spent sending results back is excluded from consideration.

Results

+------------------+----------+--------+------------+---------------+------------+---------------+-------+---------+
|                  |          |        |          Base Table        |         Work Table         |     Time        |
+------------------+----------+--------+------------+---------------+------------+---------------+-------+---------+
|                  | Snapshot | Rows   | Scan count | logical reads | Scan count | logical reads | cpu   | elapsed |
| Group By         | On       | 2,000  | 2001       | 12709         |            |               | 1469  | 1250    |
|                  | On       | 10,000 | 10001      | 216678        |            |               | 30906 | 30963   |
|                  | Off      | 2,000  | 2001       | 9251          |            |               | 1140  | 1160    |
|                  | Off      | 10,000 | 10001      | 130089        |            |               | 29906 | 28306   |
+------------------+----------+--------+------------+---------------+------------+---------------+-------+---------+
| Sub Query        | On       | 2,000  | 2001       | 12709         |            |               | 844   | 823     |
|                  | On       | 10,000 | 2          | 82            | 10000      | 165025        | 24672 | 24535   |
|                  | Off      | 2,000  | 2001       | 9251          |            |               | 766   | 999     |
|                  | Off      | 10,000 | 2          | 48            | 10000      | 165025        | 25188 | 23880   |
+------------------+----------+--------+------------+---------------+------------+---------------+-------+---------+
| CTE No Gaps      | On       | 2,000  | 0          | 4002          | 2          | 12001         | 78    | 101     |
|                  | On       | 10,000 | 0          | 20002         | 2          | 60001         | 344   | 342     |
|                  | Off      | 2,000  | 0          | 4002          | 2          | 12001         | 62    | 253     |
|                  | Off      | 10,000 | 0          | 20002         | 2          | 60001         | 281   | 326     |
+------------------+----------+--------+------------+---------------+------------+---------------+-------+---------+
| CTE Allows Gaps  | On       | 2,000  | 2001       | 4009          | 2          | 12001         | 47    | 75      |
|                  | On       | 10,000 | 10001      | 20040         | 2          | 60001         | 312   | 413     |
|                  | Off      | 2,000  | 2001       | 4006          | 2          | 12001         | 94    | 90      |
|                  | Off      | 10,000 | 10001      | 20023         | 2          | 60001         | 313   | 349     |
+------------------+----------+--------+------------+---------------+------------+---------------+-------+---------+

Both the correlated subquery and the GROUP BY version use "triangular" nested loop joins driven by a clustered index scan of the RunningTotals table (T1), and, for each row returned by that scan, seek back into the table (T2), self-joining on T2.ind <= T1.ind.

This means the same rows get processed repeatedly. When the T1.ind=1000 row is processed, the self-join retrieves and sums all rows with ind <= 1000; then for the next row, where T1.ind=1001, the same 1,000 rows are retrieved again and summed along with one additional row, and so on.

The total number of such operations is 2,001,000 for a 2,000-row table and 50,005,000 for 10,000 rows, or more generally (n² + n) / 2, which clearly grows quadratically.
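The (n² + n) / 2 figure is just the triangular-number sum 1 + 2 + … + n, and can be checked directly:

```python
def triangular_ops(n):
    # Row i forces the self-join to revisit rows 1..i, so the total
    # number of row operations is 1 + 2 + ... + n = (n**2 + n) / 2
    return (n * n + n) // 2

print(triangular_ops(2_000))   # 2001000
print(triangular_ops(10_000))  # 50005000
```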

In the 2,000-row case, the main difference between the GROUP BY and subquery versions is that the former has its stream aggregate after the join, so three columns feed into it (T1.ind, T2.col1, T2.col1) with a GROUP BY property of T1.ind, whereas the latter is calculated as a scalar aggregate, with the stream aggregate before the join, only T2.col1 feeding into it, and no GROUP BY property set at all. This simpler arrangement can be seen to have a measurable benefit in terms of reduced CPU time.

For the 10,000-row case there is an additional difference in the subquery plan: it adds an eager spool that copies all the ind, CAST(col1 AS BIGINT) values into tempdb. With snapshot isolation on, this works out more compact than the clustered index structure, and the net effect is to reduce the number of reads by about 25% (as the base table preserves quite a lot of empty space for versioning information); when this option is off it works out less compact (presumably due to the bigint vs. int difference) and more reads result. This narrows the gap between the subquery and GROUP BY versions, but the subquery still wins.

The clear winner, however, was the recursive CTE. For the "no gaps" version, logical reads from the base table are now 2 × (n + 1), reflecting n seeks into the two-level index to retrieve all of the rows, plus one additional seek at the end that returns nothing and terminates the recursion. That still meant 20,002 reads to process a 22-page table, however!

Logical worktable reads for the recursive CTE version are very high, working out at about six worktable reads per source row. These come from the index spool that stores the previous row's output, which is then read again in the next iteration (a good explanation of this by Umachandar Jayachandran is here). Despite the high number, this is still the best performer.
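The gap between the approaches can be modelled roughly: the triangular join touches each prefix of the table once per row, while the recursive CTE carries the running total forward and touches each row once (plus the final probe that ends the recursion). This is a sketch of the asymptotic shape, not the exact read counts from the table above.

```python
def triangular_row_touches(n):
    # "Triangular" nested loops: processing row i re-reads rows 1..i
    return sum(i for i in range(1, n + 1))

def recursive_cte_row_touches(n):
    # Recursive CTE: each row touched once, plus one terminating probe
    return n + 1

n = 10_000
print(triangular_row_touches(n))     # 50005000
print(recursive_cte_row_touches(n))  # 10001
```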


别在捏我脸啦 2024-12-11 03:14:48

I think you will find the recursive CTE a bit faster.

;with C as
(
  select t.ind,
         t.col1,
         t.col1 as Total
  from @tmp as t
  where t.ind = 1
  union all
  select t.ind,
         t.col1,
         C.Total + t.col1 as Total
  from @tmp as t
    inner join C
      on C.ind + 1 = t.ind
)
select C.col1,
       C.Total
from C

any other method that is faster

Yes, there is. If you are looking for outstanding performance, you should just pull the data with a simple SELECT and do the running total calculation on the client when you do the presentation.
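As a minimal sketch of that client-side approach: fetch the rows with a plain `SELECT col1 FROM ... ORDER BY ind` and accumulate while iterating. The function name here is illustrative, not from any particular client library.

```python
def running_totals(rows):
    """Yield (col1, running_total) pairs over rows fetched in ind order."""
    total = 0
    for col1 in rows:
        total += col1
        yield col1, total

# e.g. with rows fetched from the sample table
pairs = list(running_totals([2, 4, 7, 5, 8, 10]))
print(pairs)  # [(2, 2), (4, 6), (7, 13), (5, 18), (8, 26), (10, 36)]
```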


爱给你人给你 2024-12-11 03:14:48

Your question isn't very precise, so here are a few general rules that should answer it.

  • Add an index. In your oversimplified sample, it would be on col1.
  • Use EXPLAIN to compare the queries. This gives you hints about what will happen with larger data.
  • Test on (real) data and tune your server. Query time depends on many parameters. For instance, does your data fit in the server's memory? Are your buffers configured large enough?
  • Use caches to divert queries from the DB server. Memcached is the most widely used in-memory application-level cache, but caches exist at every level.
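The caching idea in the last point can be sketched without Memcached itself: any memoizing layer in front of the database call serves repeated identical queries from memory. Here `functools.lru_cache` stands in for an application-level cache, and `expensive_db_call` is a hypothetical stub, not a real driver API.

```python
from functools import lru_cache

calls = {"db": 0}

def expensive_db_call(query_key):
    calls["db"] += 1  # pretend this is a slow database round trip
    return f"result for {query_key}"

@lru_cache(maxsize=128)  # stand-in for a Memcached-style cache
def fetch_report(query_key):
    return expensive_db_call(query_key)

fetch_report("totals")
fetch_report("totals")  # served from cache; no second DB hit
print(calls["db"])      # 1
```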
