Aggregate data in one column based on values in another column

Posted on 2024-12-06 08:51:57

I know there is an easy way to do this...but, I can't figure it out.

I have a dataframe in my R script that looks something like this:

A      B    C
1.2    4    8
2.3    4    9
2.3    6    0
1.2    3    3
3.4    2    1 
1.2    5    1

Note that A, B, and C are column names. And I'm trying to get variables like this:

sum1 <- [the sum of all B values such that A is 1.2]
num1 <- [the number of times A is 1.2]

Any easy way to do this?
I basically want to end up with a data frame that looks like this:

    A     num     totalB
   1.2    3       12
   etc    etc     etc

Where "num" is the number of times that particular A value appeared, and "totalB" is the sum of the B values given the A value.

蓝海似她心 2024-12-13 08:51:57

I'd use aggregate to get the two aggregates and then merge them into a single data frame:

> df
    A B C
1 1.2 4 8
2 2.3 4 9
3 2.3 6 0
4 1.2 3 3
5 3.4 2 1
6 1.2 5 1

> num <- aggregate(B~A,df,length)
> names(num)[2] <- 'num'

> totalB <- aggregate(B~A,df,sum)
> names(totalB)[2] <- 'totalB'

> merge(num,totalB)
    A num totalB
1 1.2   3     12
2 2.3   2     10
3 3.4   1      2
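
As a possible variation on the approach above, both statistics can be computed in a single `aggregate` call by having the summary function return a named vector. This is only a sketch: the `res` name and the column-renaming step are illustrative choices, not part of the original answer, and `df` is rebuilt here so the snippet runs on its own.

```r
# Same data as in the question
df <- data.frame(
  A = c(1.2, 2.3, 2.3, 1.2, 3.4, 1.2),
  B = c(4, 4, 6, 3, 2, 5),
  C = c(8, 9, 0, 3, 1, 1)
)

# The function returns two values per group, so aggregate() stores them
# as a two-column matrix inside the B column of the result
res <- aggregate(B ~ A, df, function(x) c(num = length(x), totalB = sum(x)))

# Flatten the matrix column into ordinary columns, then rename them
res <- do.call(data.frame, res)
names(res) <- c("A", "num", "totalB")
res
```

This avoids the separate `merge` step, at the cost of the extra flattening line.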

听你说爱我 2024-12-13 08:51:57

In dplyr:

library(tidyverse)
A <- c(1.2, 2.3, 2.3, 1.2, 3.4, 1.2)
B <- c(4, 4, 6, 3, 2, 5)
C <- c(8, 9, 0, 3, 1, 1)

df <- tibble(A, B, C)  # tibble() replaces the deprecated data_frame()

df %>%
    group_by(A) %>% 
    summarise(num = n(),
              totalB = sum(B))
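
The question also asks for standalone variables (`sum1`, `num1`) for a single value of A. If the full summary table is not needed, plain logical indexing in base R should be enough. A minimal sketch, assuming the same data as in the question (`df` is rebuilt here so the snippet is self-contained, and `is_1.2` is just an illustrative helper name):

```r
df <- data.frame(
  A = c(1.2, 2.3, 2.3, 1.2, 3.4, 1.2),
  B = c(4, 4, 6, 3, 2, 5),
  C = c(8, 9, 0, 3, 1, 1)
)

# TRUE for the rows where A is 1.2
is_1.2 <- df$A == 1.2

sum1 <- sum(df$B[is_1.2])  # sum of B over those rows: 4 + 3 + 5 = 12
num1 <- sum(is_1.2)        # TRUE counts as 1, so this is the row count: 3
```

Note that `==` on doubles is an exact comparison; it works here because the values were entered as the same literal, but values produced by arithmetic may need a tolerance-based check instead.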

尽揽少女心 2024-12-13 08:51:57

Here is a solution using the plyr package:

library(plyr)  # needed so that .() and summarize are found
ddply(df, .(A), summarize, num = length(A), totalB = sum(B))

谎言 2024-12-13 08:51:57

Here is a solution using data.table for memory and time efficiency

library(data.table)
DT <- as.data.table(df)
DT[, list(totalB = sum(B), num = .N), by = A]

To keep only the rows where C == 1 before aggregating (as per the comment on @aix's answer):

DT[C==1, list(totalB = sum(B), num = .N), by = A]
