Keep only Groups where at least 2 elements in a column are present in a list, in R

Posted on 2025-02-12 20:23:18

I have a list (a character vector) such as:

The_list=c('SP1','SP2','SP3')

And I have a data frame such as:

Names Groups 
SP1   G1
SP2   G1
SP3   G1
SP1   G2
SP4   G3
SP5   G4
SP2   G5
SP3   G5
SP6   G5 
SP2   G6
SP7   G6 

I would like to keep only the Groups in which at least 2 elements of The_list are present in Names.

Here I should get:

Names Groups 
SP1   G1
SP2   G1
SP3   G1
SP2   G5
SP3   G5
SP6   G5 

Here is the df, in case it helps:

structure(list(Names = c("SP1", "SP2", "SP3", "SP1", "SP4", "SP5", 
"SP2", "SP3", "SP6", "SP2", "SP7"), Groups = c("G1", "G1", "G1", 
"G2", "G3", "G4", "G5", "G5", "G5", "G6", "G6")), class = "data.frame", row.names = c(NA, 
-11L))

Comments (2)

嗼ふ静 2025-02-19 20:23:18

Using data.table

library(data.table)
setDT(df1)[df1[, .I[sum(The_list %in% Names) >=2], by = Groups]$V1]

-output

    Names Groups
   <char> <char>
1:    SP1     G1
2:    SP2     G1
3:    SP3     G1
4:    SP2     G5
5:    SP3     G5
6:    SP6     G5
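
For readers without data.table, the same per-group count can be sketched in base R. This is not part of the answer above: it swaps in ave() as an alternative technique, and the names df, counts and kept are chosen here for illustration. ave() is applied to row positions rather than to Names so that the result stays an integer vector.

```r
# Data from the question.
The_list <- c("SP1", "SP2", "SP3")
df <- data.frame(
  Names  = c("SP1", "SP2", "SP3", "SP1", "SP4", "SP5",
             "SP2", "SP3", "SP6", "SP2", "SP7"),
  Groups = c("G1", "G1", "G1", "G2", "G3", "G4",
             "G5", "G5", "G5", "G6", "G6")
)

# For every row, count how many elements of The_list occur in that row's group.
counts <- ave(seq_along(df$Names), df$Groups,
              FUN = function(i) sum(The_list %in% df$Names[i]))

# Keep the rows whose group contains at least 2 elements of The_list.
kept <- df[counts >= 2, ]
kept
```

With the question's data this keeps the rows of G1 and G5 only.
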
瑕疵 2025-02-19 20:23:18

One solution you can use is

library(dplyr)

# First attempt: counts the rows of each group whose Names value is in The_list
df |> 
  group_by(Groups) |> 
  filter(sum(Names %in% The_list) >= 2)

Correction: Names %in% The_list counts matching rows, not distinct names,
so a group that repeats a single matching name could wrongly be kept.
The_list %in% Names counts how many distinct elements of The_list occur in the group:

df |> 
  group_by(Groups) |> 
  filter(sum(The_list %in% Names) >= 2)

  Names Groups
  <chr> <chr> 
1 SP1   G1    
2 SP2   G1    
3 SP3   G1    
4 SP2   G5    
5 SP3   G5    
6 SP6   G5  
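
The difference between the two conditions only shows up when a name is repeated within a group, which the question's data never does. A minimal sketch with hypothetical data (dup_df and its groups G7 and G8 are invented here for illustration), written in base R so it runs without dplyr:

```r
The_list <- c("SP1", "SP2", "SP3")

# Hypothetical data: G7 repeats SP1 and has no other name from The_list.
dup_df <- data.frame(
  Names  = c("SP1", "SP1", "SP4", "SP2", "SP3"),
  Groups = c("G7",  "G7",  "G7",  "G8",  "G8")
)

# Names %in% The_list: counts matching rows, so the repeated SP1 counts twice.
rows_matched <- tapply(dup_df$Names, dup_df$Groups,
                       function(nm) sum(nm %in% The_list))

# The_list %in% Names: counts distinct elements of The_list found in the group.
distinct_matched <- tapply(dup_df$Names, dup_df$Groups,
                           function(nm) sum(The_list %in% nm))

rows_matched      # G7: 2, G8: 2
distinct_matched  # G7: 1, G8: 2
```

Under the first condition G7 would pass the >= 2 filter even though it contains only one element of The_list; under the second it is dropped.
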