Selecting duplicates based on the max value (for multiple columns) in R

Posted on 2025-02-02 21:30:26

I have a dataset in which each ID appears on multiple rows, and I want to create a dataset with one row per ID that keeps the maximum of each value column. For example:

  ID         Value1   Value2 Value3  Gender  Race
   1          45     76      87        M      B   
   1          34     45      95        M      B
   2          67     100     92        F      W
   2          43     70      89        F      W
   3          34     95      80        F      A
   3          22     41      90        F      A
   4          78     25       7        M      W
   4          32     37      13        M      W
   5          56     105     25        M      B
   5          80     59      45        M      B

This should become:

  ID         Value1   Value2 Value3  Gender  Race
   1          45     76      95        M      B   
   2          67     100     92        F      W
   3          34     95      90        F      A
   4          78     37      13        M      W
   5          80     105     45        M      B

I have a feeling it has to do with the summarize command (although there are 40 value variables, so I fear having to write a line of code for each one) or with some of the solutions provided here (which I don't quite know how to modify for my needs): Remove duplicates keeping entry with largest absolute value


韵柒 2025-02-09 21:30:26

You can use the aggregate function as follows (the grouping column shows up as Group.1 in the result):

df <- data.frame(ID = c(1,1,2,2,3,3,4,4,5,5) , 
                 Value1 = c(45,34,67,43,34,22,78,32,56,80) , 
                 Value2 = c(76,45,100,70,95,41,25,37,105,59) ,
                 Value3 = c(87,95,92,89,80,90,7,13,25,45) ,
                 Gender = c("M","M","F","F","F","F","M","M","M","M") ,
                 Race = c("B","B","W","W","A","A","W","W","B","B"))

aggregate(df , by = list(df$ID) , max)
#>   Group.1 ID Value1 Value2 Value3 Gender Race
#> 1       1  1     45     76     95      M    B
#> 2       2  2     67    100     92      F    W
#> 3       3  3     34     95     90      F    A
#> 4       4  4     78     37     13      M    W
#> 5       5  5     80    105     45      M    B

Created on 2022-05-30 by the reprex package (v2.0.1)
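
As a variation on the call above, the formula interface of aggregate may suit the 40-column case from the question, since a "." on the left-hand side stands for every column not named on the right. This is only a sketch under the assumption that Gender and Race are constant within each ID, so they can serve as grouping variables alongside ID; the name res is just illustrative. Note that the formula method drops rows containing NA by default (na.action = na.omit), which may or may not be what you want.

```r
# Sample data copied from the answer above
df <- data.frame(ID = c(1,1,2,2,3,3,4,4,5,5),
                 Value1 = c(45,34,67,43,34,22,78,32,56,80),
                 Value2 = c(76,45,100,70,95,41,25,37,105,59),
                 Value3 = c(87,95,92,89,80,90,7,13,25,45),
                 Gender = c("M","M","F","F","F","F","M","M","M","M"),
                 Race = c("B","B","W","W","A","A","W","W","B","B"))

# "." on the left-hand side means "every column not named on the right",
# so all value columns are aggregated without listing them one by one.
# Assumption: Gender and Race never vary within an ID.
res <- aggregate(. ~ ID + Gender + Race, data = df, FUN = max)

# Row order follows the grouping variables, so sort by ID explicitly.
res <- res[order(res$ID), ]
rownames(res) <- NULL
res
```

The result has one row per ID with the columns ID, Gender, Race, Value1, Value2, Value3 and no extra Group.1 column.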


在风中等你 2025-02-09 21:30:26

You can group by ID, Gender and Race and summarise the Value variables to get their max.

library(dplyr)

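df %>%
  group_by(ID, Gender, Race) %>%
  # na.rm is set inside a lambda: passing extra arguments through across()'s ... is deprecated as of dplyr 1.1.0
  summarise(across(starts_with('Value'), ~ max(.x, na.rm = TRUE)), .groups = "drop")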

#     ID Gender Race  Value1 Value2 Value3
#  <int> <chr>  <chr>  <int>  <int>  <int>
#1     1 M      B         45     76     95
#2     2 F      W         67    100     92
#3     3 F      A         34     95     90
#4     4 M      W         78     37     13
#5     5 M      B         80    105     45
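
If the 40 value columns in the real data do not share a common prefix, starts_with('Value') will not find them. One possible workaround, sketched here under the assumption that the value columns are the only numeric columns apart from the grouping ones, is to select by type with where(is.numeric). Grouping columns are left out of across() automatically, so ID is not aggregated even though it is numeric. The name res is illustrative.

```r
library(dplyr)

# Sample data copied from the first answer
df <- data.frame(ID = c(1,1,2,2,3,3,4,4,5,5),
                 Value1 = c(45,34,67,43,34,22,78,32,56,80),
                 Value2 = c(76,45,100,70,95,41,25,37,105,59),
                 Value3 = c(87,95,92,89,80,90,7,13,25,45),
                 Gender = c("M","M","F","F","F","F","M","M","M","M"),
                 Race = c("B","B","W","W","A","A","W","W","B","B"))

res <- df %>%
  group_by(ID, Gender, Race) %>%
  # Assumption: every non-grouping numeric column is a value column.
  summarise(across(where(is.numeric), ~ max(.x, na.rm = TRUE)), .groups = "drop")

res
```

Selecting by type keeps the call the same length however many value columns there are, at the cost of also aggregating any other numeric column that happens to be in the data.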
