How do I use dplyr's summarize() in R to count the rows that match a condition?

Posted 2025-01-22 06:15:05

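I have a dataset that I want to summarize. First, I want the total score for home games and for away games, which I can already do. However, I also want to know how many outliers (defined as scores above 300 points) fall within each subcategory (home, away).

If I weren't using summarize(), I know dplyr has the count() function, but I'd like the solution to live inside my summarize() call. Here's what I have and what I've tried, which fails: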

#Test data
library(dplyr)

test <- tibble(score = c(100, 150, 200, 301, 150, 345, 102, 131),
                  location = c("home", "away", "home", "away", "home", "away", "home", "away"),
                  more_than_300 = c(FALSE, FALSE, FALSE, TRUE, FALSE, TRUE, FALSE, FALSE))


# Attempt 1: count rows that match a criterion (this fails)
test %>%
  group_by(location) %>%
  summarize(total_score = sum(score),
            n_outliers = nrow(.[more_than_300 == FALSE]))
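One likely reason the attempt goes wrong: with %>%, the . placeholder refers to the whole piped data frame rather than the rows of the current group, whereas bare column names inside summarize() already refer to the current group's rows. The condition is also inverted: more_than_300 == FALSE would count the non-outliers. A minimal sketch that keeps the row-counting idea but works on the group's own columns (n_games is a hypothetical extra column added only for illustration):

```r
library(dplyr)

# Same test data as in the question
test <- tibble(score = c(100, 150, 200, 301, 150, 345, 102, 131),
               location = c("home", "away", "home", "away", "home", "away", "home", "away"),
               more_than_300 = c(FALSE, FALSE, FALSE, TRUE, FALSE, TRUE, FALSE, FALSE))

res <- test %>%
  group_by(location) %>%
  summarize(total_score = sum(score),
            # n() returns the number of rows in the current group
            n_games = n(),
            # subset this group's score vector by its own TRUE flags, then count
            n_outliers = length(score[more_than_300]))

print(res)
```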

Comments (3)

盗琴音 2025-01-29 06:15:05

You can use sum() on logical vectors: it automatically converts them to numbers (TRUE counts as 1 and FALSE as 0), so you only need:

test %>%
  group_by(location) %>%
  summarize(total_score = sum(score),
            n_outliers  = sum(more_than_300))
#> # A tibble: 2 x 3
#>   location total_score n_outliers
#>   <chr>          <dbl>      <int>
#> 1 away             927          2
#> 2 home             552          0
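The coercion this answer relies on can be checked directly in base R. The flag vector below is just the more_than_300 values for the four away games; the mean() line is an extra illustration of the same conversion, giving the share of flagged rows rather than the count.

```r
# more_than_300 for the away games (rows 2, 4, 6, 8 of the test data)
flags <- c(FALSE, TRUE, TRUE, FALSE)

as.integer(flags)       # TRUE becomes 1, FALSE becomes 0
n_true <- sum(flags)    # counts the TRUE values
share <- mean(flags)    # proportion of TRUE values

print(n_true)           # 2
print(class(n_true))    # "integer"
print(share)            # 0.5
```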

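Or, if these are your only three columns, an equivalent is the following (the result columns keep their original names, score and more_than_300):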

test %>%
  group_by(location) %>%
  summarize(across(everything(), sum))
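Because across() keeps the input column names, a rename() can restore the names used in the first version if they are wanted. A minimal sketch on the question's test data:

```r
library(dplyr)

test <- tibble(score = c(100, 150, 200, 301, 150, 345, 102, 131),
               location = c("home", "away", "home", "away", "home", "away", "home", "away"),
               more_than_300 = c(FALSE, FALSE, FALSE, TRUE, FALSE, TRUE, FALSE, FALSE))

res <- test %>%
  group_by(location) %>%
  # across() skips the grouping column, so this sums score and more_than_300
  summarize(across(everything(), sum)) %>%
  # rename(new_name = old_name)
  rename(total_score = score, n_outliers = more_than_300)

print(res)
```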

In fact, you don't need to create the more_than_300 column at all; this is enough:

test %>%
  group_by(location) %>%
  summarize(total_score = sum(score),
            n_outliers  = sum(score > 300))
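If score can contain missing values, score > 300 is NA for those rows and sum() returns NA unless na.rm = TRUE is set. The data below is a hypothetical variant of the test data with one missing away score, used only to illustrate this:

```r
library(dplyr)

# Hypothetical variant of the test data: the second score (an away game) is missing
test_na <- tibble(score = c(100, NA, 200, 301, 150, 345, 102, 131),
                  location = c("home", "away", "home", "away", "home", "away", "home", "away"))

res <- test_na %>%
  group_by(location) %>%
  summarize(total_score = sum(score, na.rm = TRUE),
            # NA > 300 is NA, so na.rm = TRUE is needed here as well
            n_outliers = sum(score > 300, na.rm = TRUE))

print(res)
```

Without na.rm = TRUE, both summary values for the away group would be NA.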

我的影子我的梦 2025-01-29 06:15:05

In base R, we can use aggregate() like this:

> aggregate(.~location,test,sum)
  location score more_than_300
1     away   927             2
2     home   552             0
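The formula call above sums the stored more_than_300 column. To compute the outlier count on the fly in base R, one option is the data-frame method of aggregate() with a by list; this sketch rebuilds the test data as a plain data.frame so it runs without dplyr:

```r
test <- data.frame(
  score = c(100, 150, 200, 301, 150, 345, 102, 131),
  location = c("home", "away", "home", "away", "home", "away", "home", "away")
)

res <- aggregate(
  # columns to summarize; score > 300 is computed here instead of stored
  data.frame(total_score = test$score, n_outliers = test$score > 300),
  by = list(location = test$location),  # grouping variable
  FUN = sum
)

print(res)
```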

戴着白色围巾的女孩 2025-01-29 06:15:05

In base R, xtabs() can be used to sum per group.

xtabs(cbind(score, more_than_300) ~ ., test)
#location score more_than_300
#    away   927             2
#    home   552             0

Or compute the outliers on the fly and supply the desired column names:

xtabs(cbind(total_score = score, n_outliers = score > 300) ~ location, test)
#location total_score n_outliers
#    away         927          2
#    home         552          0

Another option, also in base R, is rowsum():

with(test, rowsum(cbind(total_score = score, n_outliers = score > 300), location))
#     total_score n_outliers
#away         927          2
#home         552          0
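rowsum() returns a matrix with the groups stored as row names. If a regular data frame with a location column is more convenient downstream, the row names can be moved into a column; a minimal sketch:

```r
test <- data.frame(
  score = c(100, 150, 200, 301, 150, 345, 102, 131),
  location = c("home", "away", "home", "away", "home", "away", "home", "away")
)

# rowsum() returns a matrix whose row names are the groups
m <- with(test, rowsum(cbind(total_score = score, n_outliers = score > 300), location))

# Move the group labels into a regular column and reset the row names
res <- data.frame(location = rownames(m), m, row.names = NULL)

print(res)
```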

xtabs() and rowsum() are specialized for computing per-group sums and may perform well for this task.
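The performance remark is not measured in the answer. One way to check it on your own machine is a rough timing sketch like the following; the simulated data, its size, and the seed are arbitrary assumptions, aggregate() is included only as a comparison point, and no particular timings are claimed.

```r
set.seed(42)  # arbitrary seed for reproducible simulated data
n <- 1e5      # arbitrary size chosen for illustration
big <- data.frame(
  score = sample(0:400, n, replace = TRUE),
  location = sample(c("home", "away"), n, replace = TRUE)
)

t_rowsum <- system.time(
  r1 <- with(big, rowsum(cbind(total_score = score, n_outliers = score > 300), location))
)
t_xtabs <- system.time(
  r2 <- xtabs(cbind(total_score = score, n_outliers = score > 300) ~ location, big)
)
t_aggregate <- system.time(
  r3 <- aggregate(data.frame(total_score = big$score, n_outliers = big$score > 300),
                  by = list(location = big$location), FUN = sum)
)

# Elapsed seconds per approach; results vary by machine and R version
print(c(rowsum = t_rowsum[["elapsed"]],
        xtabs = t_xtabs[["elapsed"]],
        aggregate = t_aggregate[["elapsed"]]))

# Sanity check: all three approaches give the same per-group totals
same_totals <- isTRUE(all.equal(as.numeric(r1[, "total_score"]),
                                as.numeric(r2[, "total_score"]))) &&
  isTRUE(all.equal(as.numeric(r1[, "total_score"]), as.numeric(r3$total_score)))
same_outliers <- isTRUE(all.equal(as.numeric(r1[, "n_outliers"]),
                                  as.numeric(r3$n_outliers)))
```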
