Calculating growth rates by applying log-differences

Posted 2024-08-29 02:47:41

I am trying to transform my data.frame by calculating the log-differences of each column
within each row id. In other words, I would like to calculate the growth rate of each variable for each id.
Here is a random df with an id column, a time period column p, and three variable columns:

df <- data.frame (id = c("a","a","a","c","c","d","d","d","d","d"),
                  p = c(1,2,3,1,2,1,2,3,4,5),
                  var1 = rnorm(10, 5),
                  var2 = rnorm(10, 5),
                  var3 = rnorm(10, 5)
                  )
df
     id p     var1     var2     var3
1     a 1 5.375797 4.110324 5.773473
2     a 2 4.574700 6.541862 6.116153
3     a 3 3.029428 4.931924 5.631847
4     c 1 5.375855 4.181034 5.756510
5     c 2 5.067131 6.053009 6.746442
6     d 1 3.846438 4.515268 6.920389
7     d 2 4.910792 5.525340 4.625942
8     d 3 6.410238 5.138040 7.404533
9     d 4 4.637469 3.522542 3.661668
10    d 5 5.519138 4.599829 5.566892

Now I have written a function that does exactly what I want, but I had to take a detour that is probably unnecessary and could be removed. However, I have not been able to find
the shortcut.
Here is the function and its output for the posted data frame:

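library(plyr)

fct.logDiff <- function(df) {
  # Split by "id" (the sample data has no "code" column) and take logs of the variable columns
  df.log <- dlply(df, "id", function(x) data.frame(p = x$p, log(x[, -c(1, 2)])))
  # Difference within each id, padding the first row with NA
  list.nalog <- llply(df.log, function(x) data.frame(p = x$p, rbind(NA, sapply(x[, -1], diff))))
  # Recombine the list into one data.frame
  ldply(list.nalog, data.frame)
}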

 fct.logDiff(df)
     id p        var1        var2        var3
1     a 1          NA          NA          NA
2     a 2 -0.16136569  0.46472004  0.05765945
3     a 3 -0.41216720 -0.28249264 -0.08249587
4     c 1          NA          NA          NA
5     c 2 -0.05914281  0.36999681  0.15868378
6     d 1          NA          NA          NA
7     d 2  0.24428771  0.20188025 -0.40279188
8     d 3  0.26646102 -0.07267311  0.47041227
9     d 4 -0.32372771 -0.37748866 -0.70417351
10    d 5  0.17405309  0.26683625  0.41891802

The trouble comes from the added NA rows. I don't want the frame to shrink, which is what diff() does automatically. My original frame has 10 rows, and I want to keep the same number of rows after the transformation. To keep the same length I had to add some NAs. I took a detour by transforming the data.frame into a list, adding the NAs to each id's first row, and then transforming the list back into a data.frame. That looks tedious.
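
The length issue described above can be seen on a single vector. This is only a minimal sketch: the vector `x` holds made-up values and the helper name `log_diff_padded` is hypothetical, neither comes from the original post.

```r
# Hypothetical values for a single id; not taken from the data above.
x <- c(5.0, 4.5, 3.0)

# diff() returns one element fewer than its input.
d <- diff(log(x))
length(d)

# Prepending NA restores the original length, so the result
# lines up row-for-row with the input.
log_diff_padded <- function(v) c(NA, diff(log(v)))
padded <- log_diff_padded(x)
length(padded)
```

Applying such a helper within each id is what keeps the output at 10 rows instead of 7.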

Any ideas on how to avoid the data.frame-list-data.frame conversion and simplify the function?

Comments (1)

情定在深秋 2024-09-05 02:47:41

How about this?

library(plyr)
nadiff <- function(x, ...) c(NA, diff(x, ...))
# Group by "id"; the sample data has no "code" column
ddply(df, "id", colwise(nadiff, c("var1", "var2", "var3")))
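
The snippet above differences the raw values and returns only the grouping column plus the three variables. If the log step and the `p` column are both wanted, one possible variant uses base R's `ave()`. This is a sketch under assumptions, not the answerer's method: the helper name `nalogdiff`, the `vars` vector, and the seed are introduced here for illustration, and the rows are assumed to be sorted by `p` within each `id`.

```r
# Same ids and periods as in the question; the var values are regenerated
# randomly, so the numbers will differ from the printed output.
set.seed(1)  # assumption: seed chosen only to make the sketch reproducible
df <- data.frame(id = c("a","a","a","c","c","d","d","d","d","d"),
                 p = c(1,2,3,1,2,1,2,3,4,5),
                 var1 = rnorm(10, 5),
                 var2 = rnorm(10, 5),
                 var3 = rnorm(10, 5))

# Log-difference padded with a leading NA (hypothetical helper name).
nalogdiff <- function(x) c(NA, diff(log(x)))

vars <- c("var1", "var2", "var3")
res <- df
# ave() applies the function within each id and returns a vector in the
# original row order, so the id and p columns stay untouched.
res[vars] <- lapply(df[vars], function(col) ave(col, df$id, FUN = nalogdiff))
res
```

Because `ave()` works on one column at a time and preserves row order, no conversion to a list of data.frames and back is needed.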