Rolling sums for unbalanced time series

Posted on 2024-12-28 15:00:45

I have a series of annual incident counts per category, with no rows for years in which the category did not see an incident. I would like to add a column that shows, for each year, how many incidents occurred in the previous three years.

One way to handle this is to add empty rows for all years with zero incidents, then use rollapply() with a left-aligned four year window, but that would expand my data set more than I want to. Surely there's a way to use ddply() and transform for this?

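The following code builds a dummy data set, then executes a simple plyr sum by category:

library(plyr)  # provides ddply() and .()

# 18 rows: three categories, six unevenly spaced years each, random counts
dat <- data.frame(
   category=c(rep('A',6), rep('B',6), rep('C',6)),
   year=rep(c(2000,2001,2004,2005,2009, 2010),3),
   incidents=rpois(18, 3)
   )

# adds the per-category total to every row of that category
ddply(dat, .(category) , transform, i_per_c=sum(incidents) )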

That works, but it only shows a per-category total.

I want a total that's year-dependent.

So I try to expand the ddply() call with the function() syntax, like so:

ddply(dat, .(category) , transform, 
      function(x) i_per_c=sum(ifelse(x$year >= year - 4 & x$year < year,  x$incidents, 0) )
      )

This just returns the original data frame, unmodified.

I must be missing something in the plyr syntax, but I don't know what it is.

Thanks,
Matt
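
For what it's worth, `transform()` builds new columns from named `name = value` arguments, so the unnamed function in the call above is most likely just dropped. `ddply()` can also take a function of each group's data frame directly. Below is a minimal sketch of that form, not the asker's or the answerer's code. It assumes "previous three years" means the years in `[year - 3, year)`; the column name `i_prev`, the `window` variable, and the small deterministic data set are made up for illustration.

```r
library(plyr)

# Small deterministic stand-in for the dummy data (values are made up).
dat <- data.frame(
  category  = rep(c("A", "B"), each = 4),
  year      = rep(c(2000, 2001, 2004, 2005), 2),
  incidents = c(1, 2, 3, 4, 10, 20, 30, 40)
)

# Assumption: the window covers the years in [year - window, year).
window <- 3

res <- ddply(dat, .(category), function(g) {
  # g holds the rows of one category; compute one sum per row of g.
  g$i_prev <- sapply(g$year, function(y) {
    sum(g$incidents[g$year >= y - window & g$year < y])
  })
  g
})

print(res)
```

Because the comparison is done on the `year` values themselves, missing years need no padding rows. Changing `window`, or the two comparison operators, changes which years count.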

兮颜 2025-01-04 15:00:45

This is sorta ugly, but it works. Nested ply calls:

ddply(dat, .(category), 
    function(datc) adply(datc, 1, 
         function(x) data.frame(run_incidents =
                                sum(subset(datc, year>(x$year-2) & year<=x$year)$incidents))))

There might be a slightly cleaner way to do it, and there are definitely ways that execute much faster.
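
One possible plyr-free variant is sketched below. Note that the condition in the answer's code, `year>(x$year-2) & year<=x$year`, covers the current year and the one before it. The sketch instead assumes the window from the question, the years in `[year - 3, year)`. The helper `prev_window_sum()`, the column `i_prev`, and the sample values are hypothetical names and data, not from the thread. It avoids building a data frame per row, but it allocates an n-by-n matrix per group, and whether it is actually faster on real data is untested.

```r
# Hypothetical helper (not from the original posts): for each element of
# `year`, sum the `incidents` whose year lies in [year - window, year).
prev_window_sum <- function(year, incidents, window = 3) {
  # in_window[i, j] is TRUE when year[j] falls in the window of year[i]
  in_window <- outer(year, year, function(yi, yj) yj >= yi - window & yj < yi)
  # multiply by 1 to get a numeric 0/1 matrix, then take row-wise weighted sums
  as.vector((in_window * 1) %*% incidents)
}

# Small deterministic data set (values are made up).
dat <- data.frame(
  category  = rep(c("A", "B"), each = 4),
  year      = rep(c(2000, 2001, 2002, 2005), 2),
  incidents = c(1, 2, 3, 4, 10, 20, 30, 40)
)

# Apply the helper within each category and put the results back in row order.
f <- factor(dat$category)
per_group <- lapply(split(dat, f), function(g) prev_window_sum(g$year, g$incidents))
dat$i_prev <- unsplit(per_group, f)

print(dat)
```

For category A this gives 0, 1, 3, 3: the row for 2002 picks up 2000 and 2001, and the row for 2005 picks up only 2002.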
