R - How to operate on different columns for each row, based on extra columns that hold the names of the columns to use
I am new to R. I would like to calculate the mean for each row of a data frame, but using a different subset of columns for each row. I have two extra columns that give the names of the columns marking the "start" and the "end" of the range I should use to calculate each mean.
Let's take this example
dframe <- data.frame(a=c("2","3","4", "2"), b=c("1","3","6", "2"), c=c("4","5","6", "3"), d=c("4","2","8", "5"), e=c("a", "c", "a", "b"), f=c("c", "d", "d", "c"))
dframe
Which provides the following dataframe:
a b c d e f
1 2 1 4 4 a c
2 3 3 5 2 c d
3 4 6 6 8 a d
4 2 2 3 5 b c
The columns e and f name the first and last columns to use when calculating the mean for each row.
For example, in row 1 the mean would be calculated over columns a, b and c ((2+1+4)/3 -> 2.3).
So I would like to obtain the following output:
a b c d e f mean
1 2 1 4 4 a c 2.3
2 3 3 5 2 c d 3.5
3 4 6 6 8 a d 6
4 2 2 3 5 b c 2.5
I learnt how to create the indices, and I then want to use rowMeans, but I cannot find the correct arguments.
dframe %>%
  mutate(e_indice = match(e, colnames(dframe))) %>%
  mutate(f_indice = match(f, colnames(dframe))) %>%
  mutate(mean = rowMeans(????, na.rm = TRUE))
Thanks a lot for your help
Comments (4)
I would define a helper function that lets you slice the indices you want from a matrix.
rowSlice <- function(x, start, stop) {
  # set every cell outside [start, stop] (per row) to NA
  replace(x, col(x) < start | col(x) > stop, NA)
}
rowSlice(matrix(1, 4, 4), c(1, 3, 1, 2), c(3, 4, 4, 3))
#>      [,1] [,2] [,3] [,4]
#> [1,]    1    1    1   NA
#> [2,]   NA   NA    1    1
#> [3,]    1    1    1    1
#> [4,]   NA    1    1   NA
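To make the mechanics of the helper easier to follow, here is a minimal base R sketch of the same masking idea on a toy matrix. The matrix and the `first`/`last` bounds are made-up illustration values, not data from the question.

```r
# Toy matrix: 3 rows, 4 columns, filled column by column.
x <- matrix(1:12, nrow = 3)

# Hypothetical per-row bounds (one entry per row of x).
first <- c(1, 2, 3)
last  <- c(2, 4, 3)

# col(x) has the same shape as x and holds each cell's column number.
# first and last are recycled down the rows, so every row is compared
# against its own bounds.
keep <- col(x) >= first & col(x) <= last

# Blank out everything outside the window, then average what is left.
sliced <- replace(x, !keep, NA)
row_means <- rowMeans(sliced, na.rm = TRUE)
row_means
#> [1] 2.5 8.0 9.0
```

Note that a row whose window is empty (for example `first` greater than `last`) ends up all `NA`, and `rowMeans(..., na.rm = TRUE)` then returns `NaN` for it.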
Then use across() to select the relevant columns, slice them, and take rowMeans().
library(dplyr)
dframe <- data.frame(
  a = c(2, 3, 4, 2),
  b = c(1, 3, 6, 2),
  c = c(4, 5, 6, 3),
  d = c(4, 2, 8, 5),
  e = c("a", "c", "a", "b"),
  f = c("c", "d", "d", "c")
)
dframe %>%
  mutate(ei = match(e, colnames(dframe))) %>%
  mutate(fi = match(f, colnames(dframe))) %>%
  mutate(
    mean = across(a:d) %>%
      rowSlice(ei, fi) %>%
      rowMeans(na.rm = TRUE)
  )
#>   a b c d e f ei fi     mean
#> 1 2 1 4 4 a c  1  3 2.333333
#> 2 3 3 5 2 c d  3  4 3.500000
#> 3 4 6 6 8 a d  1  4 6.000000
#> 4 2 2 3 5 b c  2  3 2.500000
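A base R solution. First, convert the columns to numeric. Then build, for each row, the sequence of column positions to average over. Finally, take the mean of the selected columns.
# SIMPLIFY = FALSE keeps s a list even when every range has the same length
s <- mapply(seq, match(dframe$e, colnames(dframe)), match(dframe$f, colnames(dframe)), SIMPLIFY = FALSE)
# sapply returns a plain numeric vector instead of a list column
dframe$mean <- sapply(seq_len(nrow(dframe)), function(x) mean(unlist(dframe[x, s[[x]]])))
  a b c d e f     mean
1 2 1 4 4 a c 2.333333
2 3 3 5 2 c d 3.500000
3 4 6 6 8 a d 6.000000
4 2 2 3 5 b c 2.500000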
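The data frame in the question stores its numbers as character strings, and the conversion step mentioned in this answer is not shown. A minimal sketch of that step follows; the name `value_cols` is an illustrative assumption, and `stringsAsFactors = FALSE` is set explicitly so the strings are not turned into factors on older R versions.

```r
# The question's data: digits stored as character strings.
dframe <- data.frame(
  a = c("2", "3", "4", "2"),
  b = c("1", "3", "6", "2"),
  c = c("4", "5", "6", "3"),
  d = c("4", "2", "8", "5"),
  e = c("a", "c", "a", "b"),
  f = c("c", "d", "d", "c"),
  stringsAsFactors = FALSE
)

# Hypothetical helper vector: the columns that hold values to average.
value_cols <- c("a", "b", "c", "d")

# Convert only the value columns; e and f stay as character.
dframe[value_cols] <- lapply(dframe[value_cols], as.numeric)

# Inspect the resulting column classes.
sapply(dframe, class)
```

Without this conversion, `rowMeans()` stops with an error because its input is not numeric.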
One dplyr option could be:
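The code for this option is not included in the text, so the following is only a sketch of how a row-wise dplyr approach might look, not necessarily what this answer proposed. It assumes the value columns are already numeric; `value_cols` and `result` are names introduced here for illustration.

```r
library(dplyr)

dframe <- data.frame(
  a = c(2, 3, 4, 2),
  b = c(1, 3, 6, 2),
  c = c(4, 5, 6, 3),
  d = c(4, 2, 8, 5),
  e = c("a", "c", "a", "b"),
  f = c("c", "d", "d", "c"),
  stringsAsFactors = FALSE
)

# Hypothetical helper vector: the columns that hold values to average.
value_cols <- c("a", "b", "c", "d")

result <- dframe %>%
  rowwise() %>%
  mutate(
    # c_across() returns the current row's values as a plain vector;
    # e and f are length-1 strings here, so match() gives the bounds.
    mean = mean(
      c_across(all_of(value_cols))[match(e, value_cols):match(f, value_cols)]
    )
  ) %>%
  ungroup()

result
```

A row-wise pipeline like this is easy to read but evaluates the expression once per row, so the masked `rowMeans()` approach is likely to scale better on large data frames.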