使用矩阵数组中的行索引值对 R 中数据帧中的列下的行值求和 - 可重现的示例

发布于 2025-01-16 09:34:50 字数 1582 浏览 1 评论 0原文

我有一个数据框（df1），其中包含一列值。我还有一个矩阵（ma1），其中包含两列，它们引用 df1 中的行索引（即，ma1 的第 1 行、第 1 列是 4，第 2 列是 6 - 这些引用 df1 的第 4 行和第 6 行）

我需要能够使用 ma1 中的行索引对每个窗口的 df1 中的行值求和。所以我需要输出是第 4-6 行、第 9 到 12 行、第 15 到 19 行等的总和 - 与 ma1 中的索引值相对应。

我已阅读有关 rowSums 的内容，但不确定如何在此示例中完全实现此

示例：ma1 第 1 行是 4 和 6，因此我需要输出为 df1 的第 4、5、6 行的总和。即 3.37 + 0 + 1.19。 ma1 的第 2 行是 9 和 12，所以我需要输出是 df1 的第 9,10,11,12 行等的总和

df1 <- structure(list(CO2 = c(3.37, 0, 1.19, 0.889999999999986, 5.88999999999999, 
    0.169999999999959, 3.92000000000002, 1.46000000000004, 1.23000000000002, 
    2.60000000000002, 1.39999999999998, 0, 4.35999999999996, 0.649999999999977, 
    0.149999999999977, 2.08999999999997, 4.23999999999995, 5.69, 
    0, 3.38, 1.95000000000005, 3.16999999999996, 2.82999999999998, 
    0, 1.69, 1.36000000000001, 0.669999999999959, 0.54000000000002, 
    0.529999999999973, 0.95999999999998, 0.600000000000023, 0.850000000000023, 
    0, 0.00999999999999091, 1.77999999999997, 1.98000000000002, 1.63, 
    2.74000000000001, 2.56, 3.50999999999999, 0, 0, 3.37, 0, 0.630000000000052, 
    0, 0.270000000000039, 0.769999999999982, 0.75, 1.25999999999999, 
    0, 0.689999999999998, 1.12, 0.210000000000036, 2.66000000000003, 
    3.14000000000004, 2.24000000000001, 0.620000000000005, 0.0900000000000318, 
    0)), row.names = c(NA, -60L), class = c("tbl_df", "tbl", "data.frame"
    ))

    ma1 <- structure(c(4, 6, 9, 12, 15, 19, 33, 37, 41, 54, 6, 9, 12, 15, 
    19, 24, 37, 41, 44, 60), .Dim = c(10L, 2L), .Dimnames = list(
        NULL, c("co2_start", "co2_end")))

原文

I have a dataframe (df1) the contains a column of values. I also have a matrix (ma1) that contains two columns which refer to row indices in df1 (i.e., row 1, col 1 of ma1 is 4 and col 2 is 6 - these refer to row 4 and 6 of df1)

I need to be able to use the row indices in ma1 to sum the values of rows in df1 for each window. So i need the output to be sum of row 4-6, row 9 to 12, row 15 to 19 etc. - which correspond with the indices values in ma1.

I have read about rowSums but unsure how to full implement this in this example

Example: ma1 row 1 is 4 and 6, so I need output to be sum of rows 4,5,6 of df1. i.e., 3.37 + 0 + 1.19. Row 2 of ma1 is 9 and 12, so I need output to be sum of rows 9,10,11,12 of df1 etc etc

df1 <- structure(list(CO2 = c(3.37, 0, 1.19, 0.889999999999986, 5.88999999999999, 
    0.169999999999959, 3.92000000000002, 1.46000000000004, 1.23000000000002, 
    2.60000000000002, 1.39999999999998, 0, 4.35999999999996, 0.649999999999977, 
    0.149999999999977, 2.08999999999997, 4.23999999999995, 5.69, 
    0, 3.38, 1.95000000000005, 3.16999999999996, 2.82999999999998, 
    0, 1.69, 1.36000000000001, 0.669999999999959, 0.54000000000002, 
    0.529999999999973, 0.95999999999998, 0.600000000000023, 0.850000000000023, 
    0, 0.00999999999999091, 1.77999999999997, 1.98000000000002, 1.63, 
    2.74000000000001, 2.56, 3.50999999999999, 0, 0, 3.37, 0, 0.630000000000052, 
    0, 0.270000000000039, 0.769999999999982, 0.75, 1.25999999999999, 
    0, 0.689999999999998, 1.12, 0.210000000000036, 2.66000000000003, 
    3.14000000000004, 2.24000000000001, 0.620000000000005, 0.0900000000000318, 
    0)), row.names = c(NA, -60L), class = c("tbl_df", "tbl", "data.frame"
    ))

    ma1 <- structure(c(4, 6, 9, 12, 15, 19, 33, 37, 41, 54, 6, 9, 12, 15, 
    19, 24, 37, 41, 44, 60), .Dim = c(10L, 2L), .Dimnames = list(
        NULL, c("co2_start", "co2_end")))

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

多谢你的绝情让我学会死心 2025-01-23 09:34:50

您可以使用 apply 遍历 ma1 的行，每次使用 Between(row_number(),....) 进行过滤，并包装在 colSums() 中。这里，x 代表 ma1 的（每个）行向量

cbind(
  ma1,
  apply(ma1,1, function(x) colSums(filter(df1,between(row_number(),x[1],x[2]))))
)

输出：

      co2_start co2_end      
 [1,]         4       6  6.95
 [2,]         6       9  6.78
 [3,]         9      12  5.23
 [4,]        12      15  5.16
 [5,]        15      19 12.17
 [6,]        19      24 11.33
 [7,]        33      37  5.40
 [8,]        37      41 10.44
 [9,]        41      44  3.37
[10,]        54      60  8.96

此 data.table 方法将比

library(data.table)
ma1 = data.table(ma1)
ma1[, colSums(df1[co2_start:co2_end,]), by=1:nrow(ma1)]

此基本 R 方法更快.. tidyverse 的过滤器/之间会减慢速度

cbind(
  ma1,
  apply(ma1,1, function(x) colSums(df1[x[1]:x[2],]))
)

You can use apply to go over the rows of ma1, each time, using between(row_number(),....) to filter, and wrap in colSums(). Here, x, represents (each) row-vector of ma1

cbind(
  ma1,
  apply(ma1,1, function(x) colSums(filter(df1,between(row_number(),x[1],x[2]))))
)

Output:

      co2_start co2_end      
 [1,]         4       6  6.95
 [2,]         6       9  6.78
 [3,]         9      12  5.23
 [4,]        12      15  5.16
 [5,]        15      19 12.17
 [6,]        19      24 11.33
 [7,]        33      37  5.40
 [8,]        37      41 10.44
 [9,]        41      44  3.37
[10,]        54      60  8.96

This data.table approach will be faster

library(data.table)
ma1 = data.table(ma1)
ma1[, colSums(df1[co2_start:co2_end,]), by=1:nrow(ma1)]

as would this base R approach.. It is the filter/between from tidyverse that slows things down

cbind(
  ma1,
  apply(ma1,1, function(x) colSums(df1[x[1]:x[2],]))
)

回复收藏 0 原文

前事休说 2025-01-23 09:34:50

此方法仅返回您实际想要的索引的总和，而不是每个连续的成对索引。

ma1_sum = list()
ma1_indexes = list()
df1_values_summed = list()
for(i in 1:length(ma1)){
  j = i
  if(i < length(ma1)){
  if(i %% 2 == 0){i = i + 1}
  ma1_indexes[[j]] = toString(rep(ma1[i]:ma1[i+1]))
  df1_values_summed[[j]] = toString(df1$CO2[rep(ma1[i]:ma1[i+1])])
  ma1_sum[[j]] = sum(df1$CO2[rep(ma1[i]:ma1[i+1])]) 
  } else return(NULL)
}

data.frame(result = unlist(ma1_sum[seq(1, length(ma1_sum), 2)]),
           indexes = unlist(ma1_indexes[seq(1, length(ma1_sum), 2)]),
           values_summed = unlist(df1_values_summed[seq(1, length(df1_values_summed), 2)])
)

输出：

result indexes                                                            values_summed                                                                                   
    <dbl> <chr>                                                              <chr>                                                                                           
 1   6.95 4, 5, 6                                                            0.889999999999986, 5.88999999999999, 0.169999999999959                                          
 2   5.23 9, 10, 11, 12                                                      1.23000000000002, 2.60000000000002, 1.39999999999998, 0                                         
 3  12.2  15, 16, 17, 18, 19                                                 0.149999999999977, 2.08999999999997, 4.23999999999995, 5.69, 0                                  
 4   5.40 33, 34, 35, 36, 37                                                 0, 0.00999999999999091, 1.77999999999997, 1.98000000000002, 1.63                                
 5   9.07 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54             0, 0, 3.37, 0, 0.630000000000052, 0, 0.270000000000039, 0.769999999999982, 0.75, 1.259999999999…
 6   6.78 6, 7, 8, 9                                                         0.169999999999959, 3.92000000000002, 1.46000000000004, 1.23000000000002                         
 7   5.16 12, 13, 14, 15                                                     0, 4.35999999999996, 0.649999999999977, 0.149999999999977                                       
 8  11.3  19, 20, 21, 22, 23, 24                                             0, 3.38, 1.95000000000005, 3.16999999999996, 2.82999999999998, 0                                
 9  10.4  37, 38, 39, 40, 41                                                 1.63, 2.74000000000001, 2.56, 3.50999999999999, 0                                               
10  14.5  44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60 0, 0.630000000000052, 0, 0.270000000000039, 0.769999999999982, 0.75, 1.25999999999999, 0, 0.689…

This approach returns only the sum of the indexes you actually want rather than every sequential pairwise index.

ma1_sum = list()
ma1_indexes = list()
df1_values_summed = list()
for(i in 1:length(ma1)){
  j = i
  if(i < length(ma1)){
  if(i %% 2 == 0){i = i + 1}
  ma1_indexes[[j]] = toString(rep(ma1[i]:ma1[i+1]))
  df1_values_summed[[j]] = toString(df1$CO2[rep(ma1[i]:ma1[i+1])])
  ma1_sum[[j]] = sum(df1$CO2[rep(ma1[i]:ma1[i+1])]) 
  } else return(NULL)
}

data.frame(result = unlist(ma1_sum[seq(1, length(ma1_sum), 2)]),
           indexes = unlist(ma1_indexes[seq(1, length(ma1_sum), 2)]),
           values_summed = unlist(df1_values_summed[seq(1, length(df1_values_summed), 2)])
)

Output:

result indexes                                                            values_summed                                                                                   
    <dbl> <chr>                                                              <chr>                                                                                           
 1   6.95 4, 5, 6                                                            0.889999999999986, 5.88999999999999, 0.169999999999959                                          
 2   5.23 9, 10, 11, 12                                                      1.23000000000002, 2.60000000000002, 1.39999999999998, 0                                         
 3  12.2  15, 16, 17, 18, 19                                                 0.149999999999977, 2.08999999999997, 4.23999999999995, 5.69, 0                                  
 4   5.40 33, 34, 35, 36, 37                                                 0, 0.00999999999999091, 1.77999999999997, 1.98000000000002, 1.63                                
 5   9.07 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54             0, 0, 3.37, 0, 0.630000000000052, 0, 0.270000000000039, 0.769999999999982, 0.75, 1.259999999999…
 6   6.78 6, 7, 8, 9                                                         0.169999999999959, 3.92000000000002, 1.46000000000004, 1.23000000000002                         
 7   5.16 12, 13, 14, 15                                                     0, 4.35999999999996, 0.649999999999977, 0.149999999999977                                       
 8  11.3  19, 20, 21, 22, 23, 24                                             0, 3.38, 1.95000000000005, 3.16999999999996, 2.82999999999998, 0                                
 9  10.4  37, 38, 39, 40, 41                                                 1.63, 2.74000000000001, 2.56, 3.50999999999999, 0                                               
10  14.5  44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60 0, 0.630000000000052, 0, 0.270000000000039, 0.769999999999982, 0.75, 1.25999999999999, 0, 0.689…

回复收藏 0 原文

~没有更多了~