有没有更好的方法来识别系列模式的时间间隔边界?
我有一个付款表,其中包含正值和负值(即捕获和贷项)。我需要确定自上次净正金额以来我们收到净正金额的点。例如,如果客户进行这些付款并收到这些积分:
01/01 $100 <-
02/01 -$100
03/01 -$100
04/01 $100
05/01 $100
06/01 $100 <-
...那么积分将为 01/01 和 06/01:从 02/01 到 04/01,他们有负余额,并且截至 05 /01 他们的余额为零。
我当前的方法首先根据捕获的所有日期构建结束日期列表,然后计算每个日期的开始日期,最后计算这些时间段的净捕获:
Start End NetCaptures
1900/01/01 2011/01/01 $100
2011/01/02 2011/04/01 -$100
2011/04/02 2011/05/01 $100
2011/05/02 2011/06/01 $100
然后我丢弃 NetCaptures 为 0 美元或更少的记录,重新计算开始日期,重新计算净捕获量,然后重复,直到没有要删除的记录为止。
Start End NetCaptures
1900/01/01 2011/01/01 $100
2011/01/02 2011/06/01 $100
有更好的方法吗?一些分析表达式的巧妙运用?这已经接近 RBAR 了。实际上,它的运行速度是可以接受的(500K 记录需要 10 分钟,而在我开始以这种方式计算学分之前需要 1.5 分钟)。
* 结果 *
虽然 Microsoft 确实支持优雅的滚动总计函数,但使用这种想法我最终得到了类似的代码这:计算所有捕获,计算每个捕获的运行总计,并丢弃那些具有相同或更大运行总计的早期记录。
CREATE TABLE #Sequences
(
OrderID INT NOT NULL,
Sequence INT NOT NULL,
PRIMARY KEY (OrderID, Sequence),
StartDate DATE NOT NULL DEFAULT '1900-01-01',
EndDate DATE NOT NULL,
CapturesThisPeriod DECIMAL(18, 2) NOT NULL DEFAULT 0.00,
)
INSERT INTO #Sequences (OrderID, Sequence, EndDate)
SELECT OrderID, ROW_NUMBER() OVER (PARTITION BY OrderID ORDER BY DateReceived), DateReceived
FROM Receipts
WHERE Amount > 0.00
/* Calculate the start date for each period */
UPDATE S
SET StartDate = DATEADD(D, 1, Prev.EndDate)
FROM
#Sequences AS S
INNER JOIN #Sequences AS Prev ON S.OrderID = Prev.OrderID AND Prev.Sequence = S.Sequence - 1
/* Calculate the cumulative total for each period */
UPDATE M
SET CumulativeReceipts = R.Receipts
FROM
#Sequences AS M
INNER JOIN
(
SELECT
M.OrderID, M.Sequence, SUM(R.Amount) AS Receipts
FROM
#Sequences AS M
INNER JOIN Receipts AS R ON M.OrderID = R.OrderID AND R.DateReceived <= M.EndDate
GROUP BY
M.OrderID, M.Sequence
) AS R ON M.OrderID = R.OrderID AND M.Sequence = R.Sequence
/* Delete sequences with do not represent net positive receipts */
DELETE FROM M
FROM #Sequences AS M
WHERE EXISTS (SELECT * FROM #Sequences AS Prev WHERE M.OrderID = Prev.OrderID AND Prev.Sequence < M.Sequence AND Prev.CumulativeReceipts >= M.CumulativeReceipts)
/* Recalculate sequence numbers and dates */
UPDATE S SET Sequence = NewSequence FROM (SELECT Sequence, ROW_NUMBER() OVER (PARTITION BY OrderID ORDER BY Sequence) AS NewSequence FROM #Sequences) AS S
UPDATE S
SET StartDate = DATEADD(D, 1, Prev.EndDate)
FROM
#Sequences AS S
INNER JOIN #Sequences AS Prev ON S.OrderID = Prev.OrderID AND Prev.Sequence = S.Sequence - 1
END
/* Calculate net captures per period, and continue with analysis */
I have a table of payments, with positive and negative values (i.e., captures and credits). I need to identify the points where we've received a net positive amount, since the last net positive amount. For example, if the customer makes these payments and receives these credits:
01/01 $100 <-
02/01 -$100
03/01 -$100
04/01 $100
05/01 $100
06/01 $100 <-
...then the points would be 01/01 and 06/01: as of 02/01 through 04/01, they have a negative balance, and as of 05/01 they have a zero balance.
My current approach starts by building a list of end dates from all dates with a capture, then calculates a start date for each of these, and finally calculates the net captures for these periods:
Start End NetCaptures
1900/01/01 2011/01/01 $100
2011/01/02 2011/04/01 -$100
2011/04/02 2011/05/01 $100
2011/05/02 2011/06/01 $100
I then discard records with a NetCaptures of $0 or less, recalculate start dates, recalculate net captures, and repeat until there are no records to delete, leaving this.
Start End NetCaptures
1900/01/01 2011/01/01 $100
2011/01/02 2011/06/01 $100
Is there a better way to do this? Some clever use of analysis expressions? This is getting close to RBAR. In practice, it runs acceptably quickly (ten minutes for 500K records, versus 1.5 before I started accounting for credits in this way).
* RESULT *
While Microsoft does support a graceful rolling total function, using that thought I ended up with code like this: calculate all captures, calculate the running total up to each one, and discard those where there is an earlier record that has an equal or greater running total.
CREATE TABLE #Sequences
(
OrderID INT NOT NULL,
Sequence INT NOT NULL,
PRIMARY KEY (OrderID, Sequence),
StartDate DATE NOT NULL DEFAULT '1900-01-01',
EndDate DATE NOT NULL,
CapturesThisPeriod DECIMAL(18, 2) NOT NULL DEFAULT 0.00,
)
INSERT INTO #Sequences (OrderID, Sequence, EndDate)
SELECT OrderID, ROW_NUMBER() OVER (PARTITION BY OrderID ORDER BY DateReceived), DateReceived
FROM Receipts
WHERE Amount > 0.00
/* Calculate the start date for each period */
UPDATE S
SET StartDate = DATEADD(D, 1, Prev.EndDate)
FROM
#Sequences AS S
INNER JOIN #Sequences AS Prev ON S.OrderID = Prev.OrderID AND Prev.Sequence = S.Sequence - 1
/* Calculate the cumulative total for each period */
UPDATE M
SET CumulativeReceipts = R.Receipts
FROM
#Sequences AS M
INNER JOIN
(
SELECT
M.OrderID, M.Sequence, SUM(R.Amount) AS Receipts
FROM
#Sequences AS M
INNER JOIN Receipts AS R ON M.OrderID = R.OrderID AND R.DateReceived <= M.EndDate
GROUP BY
M.OrderID, M.Sequence
) AS R ON M.OrderID = R.OrderID AND M.Sequence = R.Sequence
/* Delete sequences with do not represent net positive receipts */
DELETE FROM M
FROM #Sequences AS M
WHERE EXISTS (SELECT * FROM #Sequences AS Prev WHERE M.OrderID = Prev.OrderID AND Prev.Sequence < M.Sequence AND Prev.CumulativeReceipts >= M.CumulativeReceipts)
/* Recalculate sequence numbers and dates */
UPDATE S SET Sequence = NewSequence FROM (SELECT Sequence, ROW_NUMBER() OVER (PARTITION BY OrderID ORDER BY Sequence) AS NewSequence FROM #Sequences) AS S
UPDATE S
SET StartDate = DATEADD(D, 1, Prev.EndDate)
FROM
#Sequences AS S
INNER JOIN #Sequences AS Prev ON S.OrderID = Prev.OrderID AND Prev.Sequence = S.Sequence - 1
END
/* Calculate net captures per period, and continue with analysis */
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
搜索“运行总和”;例如, http://explainextend.com/2010/01/22/ sql-server-running-totals
Search around for "running sum"; for example, http://explainextended.com/2010/01/22/sql-server-running-totals