Grouped summarize still returns a result for every individual row
I have the following data:
library(tidyverse)
df <- data.frame(id = c(1,1,1,2,2,2),
                 x = rep(letters[1:2], each = 3),
                 y = c(3,4,3,5,6,5),
                 z = c(7,8,9,10,11,12))
I now want to summarize the data by id so that I get the sum of z for particular y values. The y condition itself depends on the value of x.
I thought I could use the code below, but it returns one row per input row instead of summarizing. The values are correct, but I want one row per id.
df %>%
  group_by(id) %>%
  summarize(test = case_when(x == 'a' ~ sum(z[y == 3]),
                             x == 'b' ~ sum(z[y == 5])))
# A tibble: 6 x 2
# Groups:   id [2]
     id  test
  <dbl> <dbl>
1     1    16
2     1    16
3     1    16
4     2    22
5     2    22
6     2    22
The following works, but I don't understand why it does and the code above does not.
df %>%
  group_by(id) %>%
  summarize(test = case_when(all(x == 'a') ~ sum(z[y == 3]),
                             all(x == 'b') ~ sum(z[y == 5])))
# A tibble: 2 x 2
     id  test
  <dbl> <dbl>
1     1    16
2     2    22
Also, is there a more straightforward way to do my summarization?
Comments (1)
Because case_when, similar to ifelse(test, x, y), returns a vector of the same length as test. Within each group of three rows, x == 'a' has length 3, so the length-1 sum is recycled and summarize gets three values per id. An expression like all(x == 'a') has length 1, so the returned value has length 1.
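The length behaviour described in this answer can be checked directly. The sketch below is not part of the original answer. It first reproduces the two case_when calls on the values of the id == 1 group, then shows one possible more direct summarization for the question's last point. That second part assumes x is constant within each id, which holds for the sample data but is not stated in general.

```r
library(dplyr)

# Values of the id == 1 group from the question's data
x <- c("a", "a", "a")
y <- c(3, 4, 3)
z <- c(7, 8, 9)

# The conditions have length 3, so the length-1 sums are recycled to length 3
per_row <- case_when(x == "a" ~ sum(z[y == 3]),
                     x == "b" ~ sum(z[y == 5]))
length(per_row)

# The conditions have length 1, so the result has length 1
per_group <- case_when(all(x == "a") ~ sum(z[y == 3]),
                       all(x == "b") ~ sum(z[y == 5]))
length(per_group)

# A possible more direct version (assumption: x is constant within each id):
# pick the y value to match from the first x of the group, then sum z
df <- data.frame(id = c(1, 1, 1, 2, 2, 2),
                 x = rep(letters[1:2], each = 3),
                 y = c(3, 4, 3, 5, 6, 5),
                 z = c(7, 8, 9, 10, 11, 12))

res <- df %>%
  group_by(id) %>%
  summarize(test = sum(z[y == if_else(first(x) == "a", 3, 5)]))
res
```

Because first(x) is a single value, the expression inside summarize has length 1 per group, so the result has one row per id. If x can vary within an id, this shortcut does not apply and the condition would need to be written differently.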