In R, how to use "aggregate" or "by" when not all combinations of factors exist?
Here is a small example to illustrate my data:
> df <- data.frame(subgroup=rep(paste("s",1:3, sep=""), times=3),
+                  feature=c(rep("a",6), rep("b",3)),
+                  var=rep(1:3, each=3),
+                  data=c(rnorm(3,1), rnorm(3,2), rnorm(3,0)))
> df
  subgroup feature var        data
1       s1       a   1  1.53152620
2       s2       a   1  1.25476445
3       s3       a   1  1.04221040
4       s1       a   2  1.68913400
5       s2       a   2  1.48290273
6       s3       a   2  1.62871854
7       s1       b   3  0.05278296
8       s2       b   3 -0.66623654
9       s3       b   3 -1.40006454
I want to examine the sum of the "data" column for each combination of feature and var that is present in my dataset. More precisely, I want to obtain TRUE when the sum is greater than 3, and FALSE otherwise:
> result
  feature var   res
1       a   1  TRUE
2       a   2  TRUE
3       b   3 FALSE
I tried using "aggregate" and "by", but couldn't make them fit my needs. Any ideas? Thanks in advance.
Comments (1)
One approach is to use plyr's function ddply to group on feature and var. You can use the summarize function to create a new data.frame with a column that corresponds to the rule you developed.
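A minimal sketch of the ddply approach, assuming the plyr package is installed. The data frame is rebuilt from the question; the set.seed(1) call and the name `result` are my own additions for illustration, so the random values will not match the ones printed in the question.

```r
library(plyr)

# Rebuild the example data from the question.
# set.seed() is an addition here, only to make the run reproducible.
set.seed(1)
df <- data.frame(subgroup = rep(paste("s", 1:3, sep = ""), times = 3),
                 feature  = c(rep("a", 6), rep("b", 3)),
                 var      = rep(1:3, each = 3),
                 data     = c(rnorm(3, 1), rnorm(3, 2), rnorm(3, 0)))

# Split df by the feature/var combinations that actually occur,
# then summarize each piece into a single logical: is the sum above 3?
result <- ddply(df, .(feature, var), summarize, res = sum(data) > 3)
print(result)
```

Because ddply only splits on the combinations found in the data, the output should have one row per existing feature/var pair (three rows here) and no rows for pairs such as b/1 that never occur.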
Another alternative is to use data.table, which is supposed to provide some performance benefits.
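A comparable sketch with data.table, assuming the package is installed. As before, the set.seed(1) call and the names `dt` and `result_dt` are illustrative additions, not part of the question.

```r
library(data.table)

# Rebuild the example data from the question.
# set.seed() is an addition here, only to make the run reproducible.
set.seed(1)
df <- data.frame(subgroup = rep(paste("s", 1:3, sep = ""), times = 3),
                 feature  = c(rep("a", 6), rep("b", 3)),
                 var      = rep(1:3, each = 3),
                 data     = c(rnorm(3, 1), rnorm(3, 2), rnorm(3, 0)))

# Convert to a data.table, then compute one logical per feature/var group.
dt <- as.data.table(df)
result_dt <- dt[, .(res = sum(data) > 3), by = .(feature, var)]
print(result_dt)
```

Grouping with `by` only produces groups for the feature/var pairs present in `dt`, so missing combinations do not show up as extra rows.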