Aggregate and weighted mean in R

Posted 2024-09-12 05:42:05

I'm trying to calculate asset-weighted returns by asset class. For the life of me, I can't figure out how to do it using the aggregate command.

My data frame looks like this:

dat <- data.frame(company, fundname, assetclass, return, assets)

I'm trying to do something like (don't copy this, it's wrong):

aggregate(dat, list(dat$assetclass), weighted.mean, w=(dat$return, dat$assets))

Answers (4)

丢了幸福的猪 2024-09-19 05:42:06

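This is also easily done with aggregate(). It helps to remember an alternative formula for a weighted mean: sum(return * assets) / sum(assets), i.e. the sum of the weighted returns divided by the sum of the weights. Both sums can be computed per asset class with aggregate():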

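dat$rw <- dat$return * dat$assets                        # weighted return per row, stored in dat
dat1 <- aggregate(rw ~ assetclass, data = dat, sum)      # sum(return * assets) per class
datw <- aggregate(assets ~ assetclass, data = dat, sum)  # sum(assets) per class
dat1$weighted.return <- dat1$rw / datw$assets            # both results are sorted by assetclass, so rows line up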
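A quick way to convince yourself that the two sums really give the weighted mean is to compare them with weighted.mean() on a tiny data set. The data below are hypothetical, chosen so the result is easy to check by hand: class A is (0.10 × 100 + 0.20 × 300) / 400 = 0.175 and class B is 0.05.

```r
# Hypothetical toy data: class A has two funds, class B has one
dat <- data.frame(assetclass = c("A", "A", "B"),
                  return     = c(0.10, 0.20, 0.05),
                  assets     = c(100, 300, 50))

# Sum-based version, as in the answer: sum(return * assets) / sum(assets)
dat$rw <- dat$return * dat$assets
dat1 <- aggregate(rw ~ assetclass, data = dat, FUN = sum)
datw <- aggregate(assets ~ assetclass, data = dat, FUN = sum)
dat1$weighted.return <- dat1$rw / datw$assets

# Reference values from weighted.mean(), computed one class at a time
ref <- sapply(c("A", "B"), function(k)
  weighted.mean(dat$return[dat$assetclass == k], dat$assets[dat$assetclass == k]))

print(dat1)  # weighted.return is 0.175 for A and 0.05 for B
print(ref)
```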
究竟谁懂我的在乎 2024-09-19 05:42:06

The recently released collapse package provides a fast solution to this and similar problems (weighted median, mode, etc.) through a full set of Fast Statistical Functions that perform grouped and weighted computations internally in C++:

library(collapse)
dat <- data.frame(assetclass = sample(LETTERS[1:5], 20, replace = TRUE), 
                  return = rnorm(20), assets = 1e7+1e7*runif(20))

# collap() with fmean, which supports weights. By default collap() also aggregates the weight column (using the sum); keep.w = FALSE drops it.
collap(dat, return ~ assetclass, fmean, w = ~ assets, keep.w = FALSE)
##   assetclass     return
## 1          A -0.4667822
## 2          B  0.5417719
## 3          C -0.8810705
## 4          D  0.6301396
## 5          E  0.3101673

# A dplyr-like workflow also works (add keep.w = FALSE to omit sum.assets):
library(magrittr)
dat %>% fgroup_by(assetclass) %>% fmean(assets)
##   assetclass sum.assets     return
## 1          A   80683025 -0.4667822
## 2          B   27411156  0.5417719
## 3          C   22627377 -0.8810705
## 4          D  146355734  0.6301396
## 5          E   25463042  0.3101673

# Or simply a direct computation yielding a vector:
dat %$% fmean(return, assetclass, assets)
##          A          B          C          D          E 
## -0.4667822  0.5417719 -0.8810705  0.6301396  0.3101673 
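For readers without collapse, here is a base-R sketch that returns the same kind of named vector as fmean(return, assetclass, assets): split the row indices by asset class and call weighted.mean() on each subset. The simulated data and the seed are assumptions for illustration, so the numbers will not match the output above.

```r
# Simulated data in the same shape as above (seed chosen arbitrarily)
set.seed(1)
dat <- data.frame(assetclass = sample(LETTERS[1:5], 20, replace = TRUE),
                  return = rnorm(20), assets = 1e7 + 1e7 * runif(20))

# split() groups the row indices by asset class; vapply() then computes one
# weighted mean per group and returns a named numeric vector (names = classes)
wret <- vapply(split(seq_len(nrow(dat)), dat$assetclass),
               function(i) weighted.mean(dat$return[i], dat$assets[i]),
               numeric(1))
print(wret)
```

This is slower than collapse on large data because weighted.mean() is called once per group from R, but it needs no extra packages.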
鲜血染红嫁衣 2024-09-19 05:42:05

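For starters, w=(dat$return, dat$assets)) is a syntax error: R parentheses cannot hold two comma-separated expressions, and weighted.mean() takes the values and the weights as separate arguments (x and w).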

And plyr makes this a little easier:

> set.seed(42)   # fix seed so that you get the same results
> dat <- data.frame(assetclass=sample(LETTERS[1:5], 20, replace=TRUE), 
+                   return=rnorm(20), assets=1e7+1e7*runif(20))
> library(plyr)
> ddply(dat, .(assetclass),   # so by asset class invoke following function
+       function(x) data.frame(wret=weighted.mean(x$return, x$assets)))
  assetclass     wret
1          A -2.27292
2          B -0.19969
3          C  0.46448
4          D -0.71354
5          E  0.55354
> 
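Even with the syntax fixed, passing the weights through aggregate() as in the question cannot work: aggregate() forwards any extra argument such as w unchanged to every per-group call, so weighted.mean() receives one group's returns but the weights for all rows. A small sketch on hypothetical toy data shows the resulting error.

```r
# Hypothetical toy data: 2 rows in class A, 3 rows in class B
dat <- data.frame(assetclass = c("A", "A", "B", "B", "B"),
                  return = c(0.1, 0.2, 0.3, 0.4, 0.5),
                  assets = c(1, 3, 2, 2, 4))

# w = dat$assets is the full 5-element vector; it is passed as-is to
# weighted.mean() for each group, whose 'return' subset is shorter
res <- tryCatch(
  aggregate(return ~ assetclass, data = dat, FUN = weighted.mean,
            w = dat$assets),
  error = function(e) conditionMessage(e)
)
print(res)  # an error message about the lengths of 'x' and 'w', not a data frame
```

Each working answer here avoids this by making sure the weights are subset together with the returns, either by summing return * assets per group or by handing each group's rows to weighted.mean().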
趁年轻赶紧闹 2024-09-19 05:42:05

A data.table solution, which is typically faster than plyr:

library(data.table)
DT <- data.table(dat)
DT[,list(wret = weighted.mean(return,assets)),by=assetclass]
##    assetclass        wret
## 1:          A -0.05445455
## 2:          E -0.56614312
## 3:          D -0.43007547
## 4:          B  0.69799701
## 5:          C  0.08850954