Find the most frequent value, but when there is a tie, choose the most recent one (tidyverse)

Posted 2025-02-12 15:13:19

So let's say I have some data like this:

ID  value  date
001     A  2015-12-06
001     A  2015-12-07
001     A  2015-12-08
002     B  2015-12-09
002     C  2015-12-10
003     A  2015-12-11
003     B  2015-12-12
002     B  2015-12-13
004     D  2015-12-13
004     R  2015-12-13

I want to find the value that appears most frequently for each ID. When there is a tie, take the value with the most recent date.

Expected Output:

ID  value
001     A
002     B
003     B
004     R

You may notice that for 004 the tied values share both the same ID and the same date. In that case, use the lower row entry.
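A minimal base-R sketch of the tie-breaking rule, written for illustration and not taken from either answer: count how often each value occurs within its ID, sort each ID's rows by count (descending), then date (descending), then original row position (descending, so the lower row wins), and keep the first row per ID. It assumes the data is typed in via `read.table()`, which reads `001` as the integer `1` and `date` as text; the helper columns `n` and `row` and the objects `ranked` and `result` are hypothetical names. On this data it should pick A, B, B and R for IDs 1 to 4.

```r
# Sample data from the question (read.table turns "001" into the integer 1)
df <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
ID  value  date
001     A  2015-12-06
001     A  2015-12-07
001     A  2015-12-08
002     B  2015-12-09
002     C  2015-12-10
003     A  2015-12-11
003     B  2015-12-12
002     B  2015-12-13
004     D  2015-12-13
004     R  2015-12-13
")

# n: how many times each value occurs within its ID (hypothetical helper column)
df$n <- ave(seq_len(nrow(df)), df$ID, df$value, FUN = length)
# row: original position in the table, used for the final tie-break
df$row <- seq_len(nrow(df))

# Within each ID, put the preferred row first:
# highest count, then latest date, then the lowest row in the table
ranked <- df[order(df$ID, -df$n, -as.numeric(as.Date(df$date)), -df$row), ]

# Keep the first (preferred) row for each ID
result <- ranked[!duplicated(ranked$ID), c("ID", "value")]
print(result)
```

Sorting once and deduplicating avoids a separate filter step per tie-break rule; adding another rule only means adding another key to `order()`.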


短暂陪伴 2025-02-19 15:13:19

You can use the following code:

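library(dplyr)
df %>%
  group_by(ID, value) %>%
  mutate(n = n()) %>%              # n = how often each value occurs within its ID
  group_by(ID) %>%
  filter(n == max(n)) %>%          # keep only the most frequent value(s)
  filter(date == max(date)) %>%    # tie on frequency: keep the most recent date
  summarise(value = last(value))   # same date as well: take the lower row
#> # A tibble: 4 × 2
#>      ID value
#>   <int> <chr>
#> 1     1 A    
#> 2     2 B    
#> 3     3 B    
#> 4     4 R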

^{Created on 2022-07-02 by the reprex package (v2.0.1)}


若水般的淡然安静女子 2025-02-19 15:13:19

Update, after clarification from the OP (see comments). Here is a version that gives the expected output:

library(dplyr)

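df %>% 
  count(ID, value, date) %>%                         # one row per ID/value/date, sorted by ID, value, date
  group_by(ID) %>% 
  filter(date == max(date) & row_number() > 1) %>%   # rows on the latest date, minus each ID's first row
  dplyr::select(-n, -date)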

     ID value
  <int> <chr>
1     1 A    
2     2 B    
3     3 B    
4     4 R

First answer:
Note that group 0004 is tied with no single most recent date, so both values are kept in the data frame:

library(dplyr)

df %>% 
  count(ID, value, date) %>%      # one row per ID/value/date combination
  group_by(ID) %>% 
  filter(date == max(date)) %>%   # keep every row on each ID's latest date
  dplyr::select(-n)

  ID    value date      
  <chr> <chr> <chr>     
1 0001  A     2015-12-08
2 0002  B     2015-12-13
3 0003  B     2015-12-12
4 0004  D     2015-12-13
5 0004  R     2015-12-13
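One way to resolve the remaining 0004 tie while also ranking by frequency is sketched below in dplyr. This is an illustrative alternative, not the answerer's code: it assumes dplyr 1.0.0 or later (for `slice_head()`), reads the question's table with `read.table()` (so IDs become integers), and uses `n`, `row` and `res` as hypothetical names. Each ID's rows are sorted by count, then date, then original row position (all descending), and the top row is kept.

```r
library(dplyr)

# Sample data from the question (read.table turns "001" into the integer 1)
df <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
ID  value  date
001     A  2015-12-06
001     A  2015-12-07
001     A  2015-12-08
002     B  2015-12-09
002     C  2015-12-10
003     A  2015-12-11
003     B  2015-12-12
002     B  2015-12-13
004     D  2015-12-13
004     R  2015-12-13
")

res <- df %>%
  mutate(date = as.Date(date),
         row = row_number()) %>%        # original position, for the last tie-break
  add_count(ID, value) %>%              # n = frequency of each value within its ID
  group_by(ID) %>%
  arrange(desc(n), desc(date), desc(row), .by_group = TRUE) %>%
  slice_head(n = 1) %>%                 # keep the preferred row per ID
  ungroup() %>%
  select(ID, value)

print(res)
```

Because all three tie-breaks live in a single `arrange()` call, the priority order is explicit and easy to change.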

