在 SQL Server 中计算运行总计
想象一下下表(称为 TestTable
):
id somedate somevalue
-- -------- ---------
45 01/Jan/09 3
23 08/Jan/09 5
12 02/Feb/09 0
77 14/Feb/09 7
39 20/Feb/09 34
33 02/Mar/09 6
我想要一个按日期顺序返回运行总计的查询,例如:
id somedate somevalue runningtotal
-- -------- --------- ------------
45 01/Jan/09 3 3
23 08/Jan/09 5 8
12 02/Feb/09 0 8
77 14/Feb/09 7 15
39 20/Feb/09 34 49
33 02/Mar/09 6 55
我知道有 在 SQL Server 2000 / 2005 / 2008 中执行此操作的各种方法。
我对这种使用聚合设置语句技巧的方法特别感兴趣:
INSERT INTO @AnotherTbl(id, somedate, somevalue, runningtotal)
SELECT id, somedate, somevalue, null
FROM TestTable
ORDER BY somedate
DECLARE @RunningTotal int
SET @RunningTotal = 0
UPDATE @AnotherTbl
SET @RunningTotal = runningtotal = @RunningTotal + somevalue
FROM @AnotherTbl
...这非常有效,但我有听说这方面存在问题,因为您不一定保证 UPDATE 语句将以正确的顺序处理行。 也许我们可以得到关于这个问题的一些明确的答案。
但也许人们还可以建议其他方法?
编辑:现在使用 SqlFiddle 以及上面的设置和“更新技巧”示例
Imagine the following table (called TestTable
):
id somedate somevalue
-- -------- ---------
45 01/Jan/09 3
23 08/Jan/09 5
12 02/Feb/09 0
77 14/Feb/09 7
39 20/Feb/09 34
33 02/Mar/09 6
I would like a query that returns a running total in date order, like:
id somedate somevalue runningtotal
-- -------- --------- ------------
45 01/Jan/09 3 3
23 08/Jan/09 5 8
12 02/Feb/09 0 8
77 14/Feb/09 7 15
39 20/Feb/09 34 49
33 02/Mar/09 6 55
I know there are various ways of doing this in SQL Server 2000 / 2005 / 2008.
I am particularly interested in this sort of method that uses the aggregating-set-statement trick:
INSERT INTO @AnotherTbl(id, somedate, somevalue, runningtotal)
SELECT id, somedate, somevalue, null
FROM TestTable
ORDER BY somedate
DECLARE @RunningTotal int
SET @RunningTotal = 0
UPDATE @AnotherTbl
SET @RunningTotal = runningtotal = @RunningTotal + somevalue
FROM @AnotherTbl
... this is very efficient but I have heard there are issues around this because you can't necessarily guarantee that the UPDATE
statement will process the rows in the correct order. Maybe we can get some definitive answers about that issue.
But maybe there are other ways that people can suggest?
edit: Now with a SqlFiddle with the setup and the 'update trick' example above
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(15)
更新,如果您运行的是 SQL Server 2012,请参阅:https://stackoverflow.com/a/10309947
问题在于 Over 子句的 SQL Server 实现是 有些有限。
Oracle(和 ANSI-SQL)允许您执行以下操作:
SQL Server 没有为您提供此问题的干净解决方案。 我的直觉告诉我,这是光标最快的罕见情况之一,尽管我必须对大结果进行一些基准测试。
更新技巧很方便,但我觉得它相当脆弱。 看来,如果您要更新完整的表,那么它将按照主键的顺序进行。 因此,如果您将日期设置为主键升序,您
可能
会是安全的。 但是您依赖于未记录的 SQL Server 实现细节(另外,如果查询最终由两个进程执行,我想知道会发生什么,请参阅:MAXDOP):完整的工作示例:
您要求提供基准,这就是真相。
最快的安全方法是游标,它比交叉连接的相关子查询快一个数量级。
绝对最快的方法是更新技巧。 我唯一担心的是,我不确定在所有情况下更新都会以线性方式进行。 查询中没有明确说明这一点。
最重要的是,对于生产代码,我会使用光标。
测试数据:
测试1:
测试2:
测试3:
测试4:
Update, if you are running SQL Server 2012 see: https://stackoverflow.com/a/10309947
The problem is that the SQL Server implementation of the Over clause is somewhat limited.
Oracle (and ANSI-SQL) allow you to do things like:
SQL Server gives you no clean solution to this problem. My gut is telling me that this is one of those rare cases where a cursor is the fastest, though I will have to do some benchmarking on big results.
The update trick is handy but I feel its fairly fragile. It seems that if you are updating a full table then it will proceed in the order of the primary key. So if you set your date as a primary key ascending you will
probably
be safe. But you are relying on an undocumented SQL Server implementation detail (also if the query ends up being performed by two procs I wonder what will happen, see: MAXDOP):Full working sample:
You asked for a benchmark this is the lowdown.
The fastest SAFE way of doing this would be the Cursor, it is an order of magnitude faster than the correlated sub-query of cross-join.
The absolute fastest way is the UPDATE trick. My only concern with it is that I am not certain that under all circumstances the update will proceed in a linear way. There is nothing in the query that explicitly says so.
Bottom line, for production code I would go with the cursor.
Test data:
Test 1:
Test 2:
Test 3:
Test 4:
在 SQL Server 2012 中,您可以将 SUM() 与OVER() 子句。
SQL Fiddle
In SQL Server 2012 you can use SUM() with the OVER() clause.
SQL Fiddle
尽管 Sam Saffron 在这方面做了出色的工作,但他仍然没有为这个问题提供递归公用表表达式代码。 对于我们使用 SQL Server 2008 R2 而不是 Denali 的人来说,它仍然是获得运行总计的最快方法,对于 100000 行,它比我工作计算机上的光标快大约 10 倍,而且它也是内联查询。
所以,就是这样(我假设表中有一个
ord
列,它是没有间隙的连续数字,为了快速处理,这个数字也应该有唯一的约束):< kbd>sql fiddle 演示
更新< /强>
我也对这个变量更新或古怪更新感到好奇。 所以通常情况下它都能正常工作,但是我们如何才能确保它每次都能正常工作呢? 好吧,这里有一个小技巧(在这里找到它 - http://www .sqlservercentral.com/Forums/Topic802558-203-21.aspx#bm981258) - 您只需检查当前和以前的
ord
并在中使用1/0
赋值如果它们与您期望的不同:据我所知,如果您的表上有适当的聚集索引/主键(在我们的例子中它将是按
ord_id
建立索引),更新将在一直是线性方式(从未遇到被零除的情况)。 也就是说,由您决定是否要在生产代码中使用它:)更新 2 我链接了这个答案,因为它包含一些有关奇怪更新不可靠性的有用信息 - < a href="https://stackoverflow.com/a/15163136/1744834">nvarchar 连接/索引/nvarchar(max) 令人费解的行为。
While Sam Saffron did great work on it, he still didn't provide recursive common table expression code for this problem. And for us who working with SQL Server 2008 R2 and not Denali, it's still fastest way to get running total, it's about 10 times faster than cursor on my work computer for 100000 rows, and it's also inline query.
So, here it is (I'm supposing that there's an
ord
column in the table and it's sequential number without gaps, for fast processing there also should be unique constraint on this number):sql fiddle demo
update
I also was curious about this update with variable or quirky update. So usually it works ok, but how we can be sure that it works every time? well, here's a little trick (found it here - http://www.sqlservercentral.com/Forums/Topic802558-203-21.aspx#bm981258) - you just check current and previous
ord
and use1/0
assignment in case they are different from what you expecting:From what I've seen if you have proper clustered index/primary key on your table (in our case it would be index by
ord_id
) update will proceed in a linear way all the time (never encountered divide by zero). That said, it's up to you to decide if you want to use it in production code :)update 2 I'm linking this answer, cause it includes some useful info about unreliability of the quirky update - nvarchar concatenation / index / nvarchar(max) inexplicable behavior.
SQL 2005 及更高版本中的 APPLY 运算符可以实现此目的:
The APPLY operator in SQL 2005 and higher works for this:
您还可以使用 ROW_NUMBER() 函数和临时表来创建任意列以在内部 SELECT 语句的比较中使用。
You can also use the ROW_NUMBER() function and a temp table to create an arbitrary column to use in the comparison on the inner SELECT statement.
使用相关子查询。 非常简单,就这样:
代码可能不完全正确,但我确信这个想法是正确的。
GROUP BY 是为了防止某个日期多次出现,您只想在结果集中看到它一次。
如果您不介意看到重复的日期,或者您想查看原始值和 id,那么以下就是您想要的:
Use a correlated sub-query. Very simple, here you go:
The code might not be exactly correct, but I'm sure that the idea is.
The GROUP BY is in case a date appears more than once, you would only want to see it once in the result set.
If you don't mind seeing repeating dates, or you want to see the original value and id, then the following is what you want:
您还可以反规范化 - 将运行总计存储在同一个表中:
http://sqlblog.com/blogs/alexander_kuznetsov/archive/2009/01/23/denormalizing-to-enforce-business-rules-running-totals.aspx
选择工作速度比任何其他解决方案快得多,但修改可能会更慢
You can also denormalize - store running totals in the same table:
http://sqlblog.com/blogs/alexander_kuznetsov/archive/2009/01/23/denormalizing-to-enforce-business-rules-running-totals.aspx
Selects work much faster than any other solutions, but modifications may be slower
如果您使用的是Sql server 2008 R2以上版本。 那么,这将是最短的方法;
LAG 用于获取上一个行值。 你可以谷歌一下以获取更多信息。
[1]:
If you are using Sql server 2008 R2 above. Then, It would be shortest way to do;
LAG is use to get previous row value. You can do google for more info.
[1]:
假设窗口在 SQL Server 2008 上工作就像在其他地方(我已经尝试过)一样,请尝试一下:
MSDN 说它在 SQL Server 2008 中可用(也许 2005 也可用?),但我没有实例可以尝试它。
编辑:嗯,显然 SQL Server 不允许窗口规范(“OVER(...)”)而不指定“PARTITION BY”(将结果分成组,但不以 GROUP BY 的方式聚合)。 烦人——MSDN 语法参考表明它是可选的,但我目前只有 SqlServer 2000 实例。
我给出的查询适用于 Oracle 10.2.0.3.0 和 PostgreSQL 8.4-beta。 所以告诉 MS 赶上;)
Assuming that windowing works on SQL Server 2008 like it does elsewhere (that I've tried), give this a go:
MSDN says it's available in SQL Server 2008 (and maybe 2005 as well?) but I don't have an instance to hand to try it.
EDIT: well, apparently SQL Server doesn't allow a window specification ("OVER(...)") without specifying "PARTITION BY" (dividing the result up into groups but not aggregating in quite the way GROUP BY does). Annoying-- the MSDN syntax reference suggests that its optional, but I only have SqlServer 2000 instances around at the moment.
The query I gave works in both Oracle 10.2.0.3.0 and PostgreSQL 8.4-beta. So tell MS to catch up ;)
虽然最好的方法是使用窗口函数来完成它,但也可以使用简单的相关子查询来完成。
Though best way is to get it done will be using a window function, it can also be done using a simple correlated sub-query.
这里有 2 种计算运行总计的简单方法:
方法 1:如果您的 DBMS 支持分析函数,可以这样写
方法 2:如果满足以下条件,您可以使用 OUTER APPLY:您的数据库版本/DBMS 本身不支持分析函数
注意:- 如果您必须分别计算不同分区的运行总计,可以按照此处发布的方式完成:计算跨行运行总计并按 ID 分组
Here are 2 simple ways to calculate running total:
Approach 1: It can be written this way if your DBMS supports Analytical Functions
Approach 2: You can make use of OUTER APPLY if your database version / DBMS itself does not support Analytical Functions
Note:- If you have to calculate the running total for different partitions separately, it can be done as posted here: Calculating Running totals across rows and grouping by ID
我相信使用下面简单的 INNER JOIN 操作可以实现运行总计。
I believe a running total can be achieved using the simple INNER JOIN operation below.
以下将产生所需的结果。
在 SomeDate 上拥有聚集索引将大大提高性能。
The following will produce the required results.
Having a clustered index on SomeDate will greatly improve the performance.
使用连接
另一种变体是使用连接。 现在查询可能如下所示:
有关更多信息,您可以访问此链接
http://askme.indianyouth.info/details /计算-简单-运行-总计-in-sql-server-12
Using join
Another variation is to use join. Now the query could look like:
for more you can visite this link
http://askme.indianyouth.info/details/calculating-simple-running-totals-in-sql-server-12