In R, how to use "aggregate" or "by" when not all combinations of factors are present?

Posted on 2024-12-08 01:11:21

Here is a small example to illustrate my data:

> df <- data.frame(subgroup=rep(paste("s",1:3, sep=""), times=3),
                   feature=c(rep("a",6), rep("b",3)),
                   var=rep(1:3, each=3),
                   data=c(rnorm(3,1), rnorm(3,2), rnorm(3,0)))
> df
  subgroup feature var        data
1       s1       a   1  1.53152620
2       s2       a   1  1.25476445
3       s3       a   1  1.04221040
4       s1       a   2  1.68913400
5       s2       a   2  1.48290273
6       s3       a   2  1.62871854
7       s1       b   3  0.05278296
8       s2       b   3 -0.66623654
9       s3       b   3 -1.40006454

I want to examine the sum of the "data" column for each feature-var combination that is present in my dataset. More precisely, I want to obtain TRUE when the sum is greater than 3, and FALSE otherwise:

> result
  feature var   res
1       a   1  TRUE
2       a   2  TRUE
3       b   3 FALSE

I tried using "aggregate" and "by", but couldn't make either fit my needs. Any ideas? Thanks in advance.

述情 2024-12-15 01:11:21

One approach is to use ddply from the plyr package to group on feature and var. Passing summarize to ddply builds a new data.frame with one row per group that is actually present in the data, plus a column holding the result of your rule.

library(plyr)
# sum(data) > 3 is already logical, so wrapping it in ifelse() is unnecessary
ddply(df, c("feature", "var"), summarize, res = sum(data) > 3)

Results in:

  feature var   res
1       a   1  TRUE
2       a   2  TRUE
3       b   3 FALSE

An alternative is data.table, which is generally expected to be faster on large data sets:

library(data.table)
dt <- data.table(df)

# the unnamed result column is called V1 by default
dt[, sum(data) > 3, by = c("feature", "var")]

     feature var    V1
[1,]       a   1  TRUE
[2,]       a   2  TRUE
[3,]       b   3 FALSE
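
Since the question asked about aggregate specifically, here is a minimal base R sketch of the same grouping. It assumes the data frame from the question, rebuilt here with the printed values instead of rnorm() so the result is reproducible. The names result and res are only illustrative. The formula interface of aggregate groups on the feature/var combinations that occur in the data, so absent combinations should not appear in the output.

```r
# Rebuild the example with the values printed in the question
# (fixed numbers instead of rnorm(), so the result is reproducible)
df <- data.frame(
  subgroup = rep(paste0("s", 1:3), times = 3),
  feature  = c(rep("a", 6), rep("b", 3)),
  var      = rep(1:3, each = 3),
  data     = c(1.53152620, 1.25476445, 1.04221040,
               1.68913400, 1.48290273, 1.62871854,
               0.05278296, -0.66623654, -1.40006454)
)

# One row per feature/var combination present in df;
# FUN returns TRUE when the group sum exceeds 3
result <- aggregate(data ~ feature + var, data = df,
                    FUN = function(x) sum(x) > 3)

# Rename the aggregated column to match the desired output
names(result)[names(result) == "data"] <- "res"
result
```

With these values the result should contain three rows (a/1, a/2, b/3), with res being TRUE for the two "a" groups and FALSE for the "b" group.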
