R: reset cumsum to zero at the start of each year
I have a data frame with a bunch of donation data. I take the data and arrange it in time order from the oldest to the most recent gifts. Next I add a column containing a cumulative sum of the gifts over time. The data spans multiple years, and I was looking for a good way to reset the cumsum to 0 at the start of each year (the year starts and ends on July 1st for fiscal purposes).
This is how it currently is:
id date giftamt cumsum()
005 01-05-2001 20.00 20.00
007 06-05-2001 25.00 45.00
009 12-05-2001 20.00 65.00
012 02-05-2002 30.00 95.00
015 08-05-2002 50.00 145.00
025 12-05-2002 25.00 170.00
... ... ... ...
This is how I would like it to look:
id date giftamt cumsum()
005 01-05-2001 20.00 20.00
007 06-05-2001 25.00 45.00
009 12-05-2001 20.00 20.00
012 02-05-2002 30.00 50.00
015 08-05-2002 50.00 50.00
025 12-05-2002 25.00 75.00
... ... ... ...
Any suggestions?
UPDATE:
Here's the code that finally worked, courtesy of Seb:
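# Tweak for changing the calendar year to the fiscal year,
# which is labelled by the calendar year in which it ends.
df$year <- as.numeric(format(as.Date(df$giftdate), format="%Y"))
df$month <- as.numeric(format(as.Date(df$giftdate), format="%m"))
# January through June stay in the current year; July onward moves to the next one.
df$year <- ifelse(df$month<=6, df$year, df$year+1)
# Cumulative sum within each fiscal year (giftamt is converted in case it is a factor).
library(plyr)
finalDf <- ddply(df, .(year), summarize, cumsum(as.numeric(as.character(giftamt))))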
I would try it this way (df being the data frame):
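A minimal sketch of this approach, modelled on the code the asker credits to Seb rather than taken from the answer itself. The sample rows come from the question. The month-day-year date format, the column name cumgift, and the use of transform (so that the id and date columns are kept) are assumptions made for illustration.

```r
library(plyr)

# Sample rows from the question; dates are assumed to be month-day-year.
df <- data.frame(
  id = c("005", "007", "009", "012", "015", "025"),
  giftdate = c("01-05-2001", "06-05-2001", "12-05-2001",
               "02-05-2002", "08-05-2002", "12-05-2002"),
  giftamt = c(20, 25, 20, 30, 50, 25),
  stringsAsFactors = FALSE
)

# Parse the dates and sort from the oldest gift to the newest.
df$giftdate <- as.Date(df$giftdate, format = "%m-%d-%Y")
df <- df[order(df$giftdate), ]

# Fiscal year: gifts from July through December count toward the next year.
gift_month <- as.numeric(format(df$giftdate, "%m"))
df$year <- as.numeric(format(df$giftdate, "%Y")) + (gift_month >= 7)

# Split by fiscal year, add a running total within each piece, recombine.
# cumgift is a hypothetical column name chosen for this example.
finalDf <- ddply(df, .(year), transform, cumgift = cumsum(giftamt))
print(finalDf)
```

On the six sample rows the running total restarts with the December 2001 gift and again with the August 2002 gift, which matches the table the asker wants.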
There are two tasks: create a column in the data frame representing each year, then split the data, apply the cumsum, and recombine. R has lots of ways of doing both parts.
Probably the most readable way of doing the first task is with year from the lubridate package. Note that R has lots of date formats, so it's worth checking to see whether you are currently using POSIXct or Date or chron or zoo or xts or one of the other formats.
Seb's choice of ddply for the second task is the one I'd recommend. For completeness, you can also use tapply or aggregate.
With the new info that you want years to change on 1st July, update the year column so that gifts dated in July or later count toward the following year.
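One way the two tasks might look with lubridate, as a sketch rather than the answer's original code. It assumes giftdate is already a Date column. The column name cumgift is hypothetical. The base R function ave is used here in place of tapply or aggregate because it returns a vector already aligned with the rows of the data frame.

```r
library(lubridate)

# Sample gifts from the question, with giftdate already stored as Date.
df <- data.frame(
  giftdate = as.Date(c("2001-01-05", "2001-06-05", "2001-12-05",
                       "2002-02-05", "2002-08-05", "2002-12-05")),
  giftamt = c(20, 25, 20, 30, 50, 25)
)

# Task 1: fiscal year, labelled by the calendar year in which it ends.
# The logical (month >= 7) adds 1 for gifts made from July onward.
df$year <- year(df$giftdate) + (month(df$giftdate) >= 7)

# Task 2: running total within each fiscal year.
# ave() applies cumsum per group and keeps the original row order.
df$cumgift <- ave(df$giftamt, df$year, FUN = cumsum)
print(df)
```

Because the grouping column is an ordinary column of the data frame, the same df$year can be passed to ddply, tapply, or aggregate without further changes.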