使用单独的类别向量聚合数据框中的列

发布于 2025-01-16 02:32:48 字数 1551 浏览 3 评论 0原文

我有一组观察结果，每个观察结果都包含对一组详细问题是/否的答案。我想要做的是通过一组减少的问题类别来汇总“是”分数。我可以用“for”循环来做到这一点，但我知道一定有更好的方法。下面是一些设置随机数据集的代码，应该澄清我想要做什么。

#
# Initial data. 10 observations with no/yes (0/1) answers to 15 questions. Questions are 
# grouped into 3 categories, 5 questions each. The category is not included in the data set. I want to compute the sums of each category across all 10 observations.
#
# Set up random test data
#
myData<-data.frame(COL1=sample(rep(c(0,1),5),10),
                   COL2=sample(rep(c(0,1),5),10),
                   COL3=sample(rep(c(0,1),5),10),
                   COL4=sample(rep(c(0,1),5),10),
                   COL5=sample(rep(c(0,1),5),10),
                   COL6=sample(rep(c(0,1),5),10),
                   COL7=sample(rep(c(0,1),5),10),
                   COL8=sample(rep(c(0,1),5),10),
                   COL9=sample(rep(c(0,1),5),10),
                   COL10=sample(rep(c(0,1),5),10),
                   COL11=sample(rep(c(0,1),5),10),
                   COL12=sample(rep(c(0,1),5),10),
                   COL13=sample(rep(c(0,1),5),10),
                   COL14=sample(rep(c(0,1),5),10),
                   COL15=sample(rep(c(0,1),5),10))
print(myData)
#
# Allocate storage for category totals   
#
catSums<-data.frame(
  Cat01=rep(NA,10),
  Cat02=rep(NA,10),
  Cat03=rep(NA,10))
# 
# For loop to aggregate sums in each category
#
for (i in 1:10) {
  catSums$Cat01[i]=sum(myData[i,c(1:5)])
  catSums$Cat02[i]=sum(myData[i,c(6:10)])
  catSums$Cat03[i]=sum(myData[i,c(11:15)])
}
print(catSums)

原文

I have a set of observations that each contains yes/no answers to a set of detailed questions. What I want to do is aggregate the "yes" scores by a reduced set of categories for the questions. I can do this with a "for" loop but I know there must be a better way. Below is some code that set ups a random data set that should clarify what I am trying to do.

#
# Initial data. 10 observations with no/yes (0/1) answers to 15 questions. Questions are 
# grouped into 3 categories, 5 questions each. The category is not included in the data set. I want to compute the sums of each category across all 10 observations.
#
# Set up random test data
#
myData<-data.frame(COL1=sample(rep(c(0,1),5),10),
                   COL2=sample(rep(c(0,1),5),10),
                   COL3=sample(rep(c(0,1),5),10),
                   COL4=sample(rep(c(0,1),5),10),
                   COL5=sample(rep(c(0,1),5),10),
                   COL6=sample(rep(c(0,1),5),10),
                   COL7=sample(rep(c(0,1),5),10),
                   COL8=sample(rep(c(0,1),5),10),
                   COL9=sample(rep(c(0,1),5),10),
                   COL10=sample(rep(c(0,1),5),10),
                   COL11=sample(rep(c(0,1),5),10),
                   COL12=sample(rep(c(0,1),5),10),
                   COL13=sample(rep(c(0,1),5),10),
                   COL14=sample(rep(c(0,1),5),10),
                   COL15=sample(rep(c(0,1),5),10))
print(myData)
#
# Allocate storage for category totals   
#
catSums<-data.frame(
  Cat01=rep(NA,10),
  Cat02=rep(NA,10),
  Cat03=rep(NA,10))
# 
# For loop to aggregate sums in each category
#
for (i in 1:10) {
  catSums$Cat01[i]=sum(myData[i,c(1:5)])
  catSums$Cat02[i]=sum(myData[i,c(6:10)])
  catSums$Cat03[i]=sum(myData[i,c(11:15)])
}
print(catSums)

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

疏忽 2025-01-23 02:32:48

您可以使用 dplyr：

library(dplyr)
myData %>% 
  summarise(Cat01 = rowSums(across(COL1:COL5)),
            Cat02 = rowSums(across(COL6:COL10)),
            Cat03 = rowSums(across(COL11:COL15)))

这会返回

# Cat01 Cat02 Cat03
# 1      2     3     4
# 2      2     4     1
# 3      2     4     2
# 4      2     4     4
# 5      4     3     2
# 6      2     1     4
# 7      4     3     2
# 8      0     1     3
# 9      3     1     1
# 10     4     1     2

You could use dplyr:

library(dplyr)
myData %>% 
  summarise(Cat01 = rowSums(across(COL1:COL5)),
            Cat02 = rowSums(across(COL6:COL10)),
            Cat03 = rowSums(across(COL11:COL15)))

This returns

# Cat01 Cat02 Cat03
# 1      2     3     4
# 2      2     4     1
# 3      2     4     2
# 4      2     4     4
# 5      4     3     2
# 6      2     1     4
# 7      4     3     2
# 8      0     1     3
# 9      3     1     1
# 10     4     1     2

回复收藏 0 原文

淡写薰衣草的香 2025-01-23 02:32:48

你可以这样做：

setNames(as.data.frame(sapply(split(1:15, rep(1:3, each = 5)), function(x) {
   rowSums(myData[x])})), c('Cat01', 'Cat02', 'Cat03'))
#>    Cat01 Cat02 Cat03
#> 1      4     4     3
#> 2      0     1     1
#> 3      2     2     2
#> 4      3     3     2
#> 5      2     3     2
#> 6      3     2     3
#> 7      1     2     2
#> 8      2     3     3
#> 9      4     4     4
#> 10     4     1     3

You could do:

setNames(as.data.frame(sapply(split(1:15, rep(1:3, each = 5)), function(x) {
   rowSums(myData[x])})), c('Cat01', 'Cat02', 'Cat03'))
#>    Cat01 Cat02 Cat03
#> 1      4     4     3
#> 2      0     1     1
#> 3      2     2     2
#> 4      3     3     2
#> 5      2     3     2
#> 6      3     2     3
#> 7      1     2     2
#> 8      2     3     3
#> 9      4     4     4
#> 10     4     1     3

回复收藏 0 原文

俯瞰星空 2025-01-23 02:32:48

setNames(
  as.data.frame(sapply(c(1,6,11),function(x) rowSums(myData[x:(x+4)]))),
  c("Cat01","Cat02", "Cat03")
)

输出：

   Cat01 Cat02 Cat03
1      2     0     4
2      1     3     2
3      1     2     3
4      3     2     2
5      2     4     2
6      1     2     1
7      4     3     3
8      2     3     2
9      4     5     2
10     5     1     4

setNames(
  as.data.frame(sapply(c(1,6,11),function(x) rowSums(myData[x:(x+4)]))),
  c("Cat01","Cat02", "Cat03")
)

Output:

   Cat01 Cat02 Cat03
1      2     0     4
2      1     3     2
3      1     2     3
4      3     2     2
5      2     4     2
6      1     2     1
7      4     3     3
8      2     3     2
9      4     5     2
10     5     1     4

回复收藏 0 原文

~没有更多了~