R - How to operate on different columns for each row, based on extra columns that contain the names of the columns to use

Posted on 01-11 00:14

I am new to R. I would like to calculate the mean for each row of a data frame, but using a different subset of columns for each row. I have two extra columns that give the names of the columns marking the "start" and the "end" of the range I should use to calculate each mean.

Let's take this example:

dframe <- data.frame(a=c("2","3","4", "2"), b=c("1","3","6", "2"), c=c("4","5","6", "3"), d=c("4","2","8", "5"), e=c("a", "c", "a", "b"), f=c("c", "d", "d", "c"))
dframe

This gives the following data frame:

  a b c d e f
1 2 1 4 4 a c
2 3 3 5 2 c d
3 4 6 6 8 a d
4 2 2 3 5 b c

The columns e and f name the first and last columns to include in the mean for each row.
For example, in row 1 the mean is taken over columns a, b and c: (2+1+4)/3 ≈ 2.3.
So I would like to obtain the following output:

  a b c d e f mean
1 2 1 4 4 a c  2.3
2 3 3 5 2 c d  3.5
3 4 6 6 8 a d    6
4 2 2 3 5 b c  2.5

I learnt how to create the indices, and I then want to use rowMeans(), but I cannot find the correct arguments.

dframe %>%
  mutate(e_indice = match(e, colnames(dframe)))%>%
  mutate(f_indice = match(f, colnames(dframe)))%>%
  mutate(mean = RowMeans(????, na.rm = TRUE))

Thanks a lot for your help

污味仙女 2025-01-18 00:14:45

One dplyr option could be:

library(dplyr)

dframe %>%
    mutate(across(a:d, as.numeric)) %>%  # the question stores the values as character
    rowwise() %>%
    mutate(mean = rowMeans(cur_data()[match(e, names(.)):match(f, names(.))]))

      a     b     c     d e     f      mean
  <dbl> <dbl> <dbl> <dbl> <chr> <chr> <dbl>
1     2     1     4     4 a     c      2.33
2     3     3     5     2 c     d      3.5 
3     4     6     6     8 a     d      6   
4     2     2     3     5 b     c      2.5
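
With newer dplyr versions, where cur_data() is deprecated (since dplyr 1.1.0), the same row-by-row idea could be sketched with c_across() instead. This is only one possible variant, not necessarily what the answer's author would choose. The name `value_cols` is introduced here for illustration, and the value columns are assumed to be numeric already.

```r
library(dplyr)

dframe <- data.frame(
  a = c(2, 3, 4, 2),
  b = c(1, 3, 6, 2),
  c = c(4, 5, 6, 3),
  d = c(4, 2, 8, 5),
  e = c("a", "c", "a", "b"),
  f = c("c", "d", "d", "c")
)

# Hypothetical helper vector: the columns that hold the numbers, in order
value_cols <- c("a", "b", "c", "d")

result <- dframe %>%
  rowwise() %>%
  mutate(
    # c_across() returns this row's values as a plain vector;
    # match() turns the names stored in e and f into positions within it
    mean = mean(
      c_across(all_of(value_cols))[match(e, value_cols):match(f, value_cols)]
    )
  ) %>%
  ungroup()

result$mean
```

Matching against `value_cols` rather than against all column names keeps the positions valid even if extra columns are later added in front of the value columns.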

铁轨上的流浪者 2025-01-18 00:14:45

I would define a helper function that keeps only the columns between a per-row start and stop index of a matrix and sets everything else to NA.

rowSlice <- function(x, start, stop) {
  replace(x, col(x) < start | col(x) > stop, NA)
}

rowSlice(matrix(1, 4, 4), c(1, 3, 1, 2), c(3, 4, 4, 3))
#>      [,1] [,2] [,3] [,4]
#> [1,]    1    1    1   NA
#> [2,]   NA   NA    1    1
#> [3,]    1    1    1    1
#> [4,]   NA    1    1   NA

Then use across() to select the relevant columns, slice them,
and take the rowMeans().

library(dplyr)

dframe <- data.frame(
  a = c(2, 3, 4, 2),
  b = c(1, 3, 6, 2),
  c = c(4, 5, 6, 3),
  d = c(4, 2, 8, 5),
  e = c("a", "c", "a", "b"),
  f = c("c", "d", "d", "c")
)

dframe %>%
  mutate(ei = match(e, colnames(dframe))) %>%
  mutate(fi = match(f, colnames(dframe))) %>% 
  mutate(
    mean = across(a:d) %>%
      rowSlice(ei, fi) %>%
      rowMeans(na.rm = TRUE)
  )
#>   a b c d e f ei fi     mean
#> 1 2 1 4 4 a c  1  3 2.333333
#> 2 3 3 5 2 c d  3  4 3.500000
#> 3 4 6 6 8 a d  1  4 6.000000
#> 4 2 2 3 5 b c  2  3 2.500000
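
The masking idea behind rowSlice() can also be written without dplyr. The sketch below builds a logical matrix marking the cells that lie between each row's start and end column, then averages only those cells. The names `value_cols`, `first`, `last` and `inside` are introduced here for illustration, and the sketch assumes the value columns are numeric and contain no missing values.

```r
dframe <- data.frame(
  a = c(2, 3, 4, 2),
  b = c(1, 3, 6, 2),
  c = c(4, 5, 6, 3),
  d = c(4, 2, 8, 5),
  e = c("a", "c", "a", "b"),
  f = c("c", "d", "d", "c")
)

# Hypothetical helper vector: the columns that hold the numbers, in order
value_cols <- c("a", "b", "c", "d")

m     <- as.matrix(dframe[value_cols])   # numeric matrix of the values
first <- match(dframe$e, value_cols)     # start position for each row
last  <- match(dframe$f, value_cols)     # end position for each row

# TRUE where a cell's column index falls inside its row's [first, last] range;
# first and last are recycled down the rows, so element [i, j] uses first[i]
inside <- col(m) >= first & col(m) <= last

# Sum of the selected cells divided by the number of selected cells
dframe$mean <- rowSums(m * inside) / rowSums(inside)

dframe
```

Because everything is done on whole matrices, this avoids a per-row loop, which may matter for large data frames.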

深爱成瘾 2025-01-18 00:14:45

A base R solution. First, convert the value columns to numeric. Then build a list of the column positions to average for each row. Finally, take the mean over the selected columns of each row.

dframe[1:4] <- lapply(dframe[1:4], as.numeric)  # value columns a:d are character in the question
s <- mapply(seq, match(dframe$e, colnames(dframe)), match(dframe$f, colnames(dframe)), SIMPLIFY = FALSE)  # always a list
dframe$mean <- sapply(seq(nrow(dframe)), function(x) mean(unlist(dframe[x, s[[x]]])))

  a b c d e f     mean
1 2 1 4 4 a c 2.333333
2 3 3 5 2 c d 3.500000
3 4 6 6 8 a d 6.000000
4 2 2 3 5 b c 2.500000

你的笑 2025-01-18 00:14:45

A base R approach using apply():

dframe$mean <- apply(dframe, 1, function(x) 
  mean(as.numeric(x[which(names(x) == x["e"]) : which(names(x) == x["f"])])))

dframe
  a b c d e f     mean
1 2 1 4 4 a c 2.333333
2 3 3 5 2 c d 3.500000
3 4 6 6 8 a d 6.000000
4 2 2 3 5 b c 2.500000
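
Note that apply() hands each row to the function as a character vector, which is why the as.numeric() call is needed there. If the same calculation is needed in several places, it could be wrapped in a small function. The sketch below is one way to do that; `range_mean()` and its arguments are hypothetical names introduced here, not part of any package, and the function assumes every name in the start and end columns appears in `value_cols`.

```r
# The data frame from the question, with the values stored as character
dframe <- data.frame(
  a = c("2", "3", "4", "2"),
  b = c("1", "3", "6", "2"),
  c = c("4", "5", "6", "3"),
  d = c("4", "2", "8", "5"),
  e = c("a", "c", "a", "b"),
  f = c("c", "d", "d", "c"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: mean of value_cols between the columns named in
# start_col and end_col, computed separately for each row
range_mean <- function(df, value_cols, start_col = "e", end_col = "f") {
  # Coerce the value columns to a numeric matrix, one column per value column
  vals <- do.call(cbind, lapply(df[value_cols], as.numeric))
  from <- match(df[[start_col]], value_cols)
  to   <- match(df[[end_col]], value_cols)
  vapply(
    seq_len(nrow(df)),
    function(i) mean(vals[i, from[i]:to[i]]),
    numeric(1)
  )
}

dframe$mean <- range_mean(dframe, c("a", "b", "c", "d"))
dframe
```

Using vapply() with `numeric(1)` makes the result a plain numeric vector, so the new column is an ordinary numeric column rather than a list.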
