How can I use dplyr's summarize in R to count the number of rows that match a criterion?
I have a dataset that I want to summarize. First, I want the sum of the scores for home and away games, which I can do. However, I also want to know how many outliers (defined as more than 300 points) are within each subcategory (home, away).

If I weren't using summarize, I know `dplyr` has the `count()` function, but I'd like this solution to appear in my `summarize()` call. Here's what I have and what I've tried, which fails to run:

    # Test data
    library(dplyr)
    test <- tibble(score = c(100, 150, 200, 301, 150, 345, 102, 131),
                   location = c("home", "away", "home", "away", "home", "away", "home", "away"),
                   more_than_300 = c(FALSE, FALSE, FALSE, TRUE, FALSE, TRUE, FALSE, FALSE))

    # Attempt 1: count rows that match a criterion
    test %>%
      group_by(location) %>%
      summarize(total_score = sum(score),
                n_outliers = nrow(.[more_than_300 == FALSE]))

Comments (3)

You can use `sum` on logical vectors. It automatically converts them to numeric values (`TRUE` equals 1 and `FALSE` equals 0), so you only need to sum `more_than_300` inside your `summarize()` call. Or, if these are your only three columns, an equivalent would be to sum every non-grouping column in one go.

In fact, you don't need to make the `more_than_300` column at all: it would suffice to sum the condition `score > 300` directly.
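
The code that accompanied this answer is not shown, so the following is only a sketch of the three variants the text describes, not necessarily the answerer's exact code. It reuses the `test` tibble from the question; the result names `by_flag`, `by_all`, and `by_inline` are made up for illustration, and the `across()` form assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Same test data as in the question
test <- tibble(
  score = c(100, 150, 200, 301, 150, 345, 102, 131),
  location = c("home", "away", "home", "away", "home", "away", "home", "away"),
  more_than_300 = c(FALSE, FALSE, FALSE, TRUE, FALSE, TRUE, FALSE, FALSE)
)

# Variant 1: sum the logical column (each TRUE counts as 1, each FALSE as 0)
by_flag <- test %>%
  group_by(location) %>%
  summarize(total_score = sum(score),
            n_outliers = sum(more_than_300))

# Variant 2: sum every non-grouping column at once
# (assumption: dplyr >= 1.0.0, which provides across())
by_all <- test %>%
  group_by(location) %>%
  summarize(across(everything(), sum))

# Variant 3: skip the helper column and test the condition inline
by_inline <- test %>%
  group_by(location) %>%
  summarize(total_score = sum(score),
            n_outliers = sum(score > 300))

print(by_inline)
# away: total_score 927, n_outliers 2
# home: total_score 552, n_outliers 0
```

Note that variant 2 keeps the original column names (`score`, `more_than_300`) rather than `total_score` and `n_outliers`, so it is equivalent in values only.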

In base R, we can try `aggregate` to compute the per-group sums.
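
The answer's own snippet is not shown, so here is one way the `aggregate` approach might look. It is a sketch under assumptions: the data is rebuilt as a plain `data.frame`, the outlier flag is computed on the fly as `score > 300`, and the result name `res` is hypothetical.

```r
# Same test data as in the question, as a base data.frame
test <- data.frame(
  score = c(100, 150, 200, 301, 150, 345, 102, 131),
  location = c("home", "away", "home", "away", "home", "away", "home", "away")
)

# cbind() builds a two-column response; the names given inside cbind()
# become the column names of the result. sum() is applied per location.
res <- aggregate(
  cbind(total_score = score, n_outliers = score > 300) ~ location,
  data = test,
  FUN = sum
)

print(res)
```

Because `cbind()` mixes a numeric and a logical vector, the logical values are coerced to 0 and 1 before summing, which is what turns the sum into a count.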

In base R, `xtabs` can be used to sum up per group, optionally calculating the outliers on the fly and giving the desired column names.

Another option, also in base, is `rowsum`.

`xtabs` and `rowsum` are specialized in calculating sums per group and may perform well on this task.
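
The snippets for this answer are not shown either; the following sketch illustrates how `xtabs` and `rowsum` could be applied to the question's data. The data is rebuilt as a plain `data.frame`, and the names `xt` and `rs` are hypothetical. No performance claim is verified here.

```r
# Same test data as in the question, as a base data.frame
test <- data.frame(
  score = c(100, 150, 200, 301, 150, 345, 102, 131),
  location = c("home", "away", "home", "away", "home", "away", "home", "away")
)

# xtabs with a matrix on the left-hand side sums each column per group;
# the outlier flag is computed on the fly and the columns are named in cbind()
xt <- xtabs(cbind(total_score = score, n_outliers = score > 300) ~ location,
            data = test)

# rowsum sums the rows of a matrix within each level of `group`
rs <- rowsum(cbind(total_score = test$score, n_outliers = test$score > 300),
             group = test$location)

print(xt)
print(rs)
```

Both calls return a matrix-like object with one row per location rather than a data frame, so wrap the result in `as.data.frame.matrix()` (for `xtabs`) or `as.data.frame()` (for `rowsum`) if a data frame is needed downstream.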