Matching values across different datasets by group in R

Posted on 2025-01-23 02:19:07

I have the following two datasets:

df1 <- data.frame(
  "group" = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5), 
  "numbers" = c(55, 75, 60, 55, 75, 60,  55, 75, 60,  55, 75, 60,  55, 75, 60))

df2 <- data.frame(
  "group" = c(1, 1, 2, 2, 2, 3, 3, 4, 5), 
  "P1" = c(55, NA, 60, 55, 75, 75, 55, 55, 60),
  "P2" = c(55, 75, 55, 60, NA, 75, 55, NA, 60),
  "P3" = c(75, 55, 60, 75, NA, 75, 60, 55, 60))

In df1 each group has the same three numbers (in reality there are around 500 numbers).

I want to check whether the values in the column "numbers" of df1 are contained in the columns P1, P2, and P3 of df2. I am stuck on two problems:

1. The values in the numbers column of df1 can occur in different groups in df2 (groups are defined by the group column in df1 and df2).
2. The datasets have different lengths.

Is there a way to merge both datasets and obtain the following dataset:

df3 <- data.frame(
  "group"    = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5), 
  "numbers"  = c(55, 75, 60, 55, 75, 60, 55, 75, 60, 55, 75, 60, 55, 75, 60),
  "P1new"    = c(1, 0, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 1),
  "P2new"    = c(1, 1, 0, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1),
  "P3new"    = c(1, 1, 0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 1))

where P1new (and likewise P2new and P3new) contains the value 1 if df2$P1 contains the value of df1$numbers within the same group (as I said, numbers can recur in different groups). For example, P3 has the value 75 in group 1 but not in group 5, so for the number 75, P3new would be 1 in group 1 and 0 in group 5.
This question is similar to "Find matching values in different datasets by groups in R", but I could not adapt that code to my objective, so I would really appreciate any help.
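
The matching rule in the example above can be checked for a single value in base R. This is only a minimal sketch of the rule, not a full solution: df2 is repeated so the snippet runs on its own, and p3_in() is a hypothetical helper introduced just for illustration.

```r
df2 <- data.frame(
  "group" = c(1, 1, 2, 2, 2, 3, 3, 4, 5),
  "P1" = c(55, NA, 60, 55, 75, 75, 55, 55, 60),
  "P2" = c(55, 75, 55, 60, NA, 75, 55, NA, 60),
  "P3" = c(75, 55, 60, 75, NA, 75, 60, 55, 60))

# Hypothetical helper: all P3 values recorded for group g
p3_in <- function(g) df2$P3[df2$group == g]

in_g1 <- 75 %in% p3_in(1)  # TRUE: group 1 has P3 values 75 and 55
in_g5 <- 75 %in% p3_in(5)  # FALSE: group 5 only has the P3 value 60
```

The desired P3new column is this test repeated for every row of df1, using that row's group and number.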

拥抱影子 2025-01-30 02:19:07

Interesting question. Here's a way with dplyr functions:

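library(dplyr)

df2 %>% 
  group_by(group) %>% 
  # one list per group and column: the distinct non-NA values
  summarise(across(P1:P3, ~ list(unique(na.omit(.x))))) %>% 
  inner_join(df1, ., by = "group") %>% 
  rowwise() %>% 
  # +() turns TRUE/FALSE into 1/0
  mutate(across(P1:P3, ~ +(numbers %in% .x))) %>% 
  ungroup()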

   group numbers    P1    P2    P3
   <dbl>   <dbl> <int> <int> <int>
 1     1      55     1     1     1
 2     1      75     0     1     1
 3     1      60     0     0     0
 4     2      55     1     1     0
 5     2      75     1     0     1
 6     2      60     1     1     1
 7     3      55     1     1     0
 8     3      75     1     1     1
 9     3      60     0     0     1
10     4      55     1     0     1
11     4      75     0     0     0
12     4      60     0     0     0
13     5      55     0     0     0
14     5      75     0     0     0
15     5      60     1     1     1
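
The question asks for columns named P1new, P2new and P3new. One way to get those names is the .names argument of across(). The following is a sketch along the lines of the approach above, not a tested drop-in: the object names lookup and res are illustrative, and the data frames are repeated so the snippet runs on its own.

```r
library(dplyr)

df1 <- data.frame(
  "group" = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5),
  "numbers" = c(55, 75, 60, 55, 75, 60, 55, 75, 60, 55, 75, 60, 55, 75, 60))

df2 <- data.frame(
  "group" = c(1, 1, 2, 2, 2, 3, 3, 4, 5),
  "P1" = c(55, NA, 60, 55, 75, 75, 55, 55, 60),
  "P2" = c(55, 75, 55, 60, NA, 75, 55, NA, 60),
  "P3" = c(75, 55, 60, 75, NA, 75, 60, 55, 60))

# One row per group; each P column holds a list of that group's distinct values
lookup <- df2 %>%
  group_by(group) %>%
  summarise(across(P1:P3, ~ list(unique(na.omit(.x)))))

res <- df1 %>%
  inner_join(lookup, by = "group") %>%
  rowwise() %>%
  # .names writes the flags to new columns instead of overwriting P1:P3
  mutate(across(P1:P3, ~ +(numbers %in% .x), .names = "{.col}new")) %>%
  ungroup() %>%
  # drop the helper list-columns, keep only the requested layout
  select(group, numbers, ends_with("new"))
```

Here res should have the same columns as the df3 shown in the question, with the flags stored as integers rather than doubles.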

巾帼英雄 2025-01-30 02:19:07

Another possible solution:

library(tidyverse)

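# build one 0/1 column per P column of df2, then bind them onto df1
map_dfc(names(df2[-1]), 
        ~ df1 %>%
          group_by(group) %>%
          # cur_group()$group is the group's value; cur_group_id() is only its position
          mutate(!!.x := +(numbers %in% df2[df2$group == cur_group()$group, .x])) %>%
          ungroup() %>%
          select(all_of(.x))) %>%
  bind_cols(df1, .)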

#>    group numbers P1 P2 P3
#> 1      1      55  1  1  1
#> 2      1      75  0  1  1
#> 3      1      60  0  0  0
#> 4      2      55  1  1  0
#> 5      2      75  1  0  1
#> 6      2      60  1  1  1
#> 7      3      55  1  1  0
#> 8      3      75  1  1  1
#> 9      3      60  0  0  1
#> 10     4      55  1  0  1
#> 11     4      75  0  0  0
#> 12     4      60  0  0  0
#> 13     5      55  0  0  0
#> 14     5      75  0  0  0
#> 15     5      60  1  1  1

Or, without purrr, another possibility:

library(dplyr)

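df1 %>% 
  # joining on group pairs every df1 row with every df2 row of that group
  inner_join(df2, by = "group") %>% 
  group_by(group) %>% 
  mutate(across(starts_with("P"), ~ +(numbers %in% .x))) %>% 
  ungroup() %>% 
  # collapse the duplicated rows created by the join
  distinct()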
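
For comparison, the same flags can be computed in base R without any join, by comparing combined group/value keys. This sketch is not from either answer. It assumes the values are formatted identically by paste() in both data frames (true for the whole numbers here), and the names cols, key1, flags and res are illustrative only.

```r
df1 <- data.frame(
  "group" = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5),
  "numbers" = c(55, 75, 60, 55, 75, 60, 55, 75, 60, 55, 75, 60, 55, 75, 60))

df2 <- data.frame(
  "group" = c(1, 1, 2, 2, 2, 3, 3, 4, 5),
  "P1" = c(55, NA, 60, 55, 75, 75, 55, 55, 60),
  "P2" = c(55, 75, 55, 60, NA, 75, 55, NA, 60),
  "P3" = c(75, 55, 60, 75, NA, 75, 60, 55, 60))

cols <- c("P1", "P2", "P3")

# One key per df1 row, e.g. "1 55" for group 1 and number 55
key1 <- paste(df1$group, df1$numbers)

# For each P column, flag the df1 keys that also occur among the df2 keys;
# NA entries in df2 become keys like "1 NA" and never match a df1 key
flags <- sapply(cols, function(col) {
  as.integer(key1 %in% paste(df2$group, df2[[col]]))
})
colnames(flags) <- paste0(cols, "new")

res <- cbind(df1, flags)
```

Because the comparison is a single vectorised %in% per column, the different lengths of df1 and df2 do not matter, and it should scale to the roughly 500 numbers per group mentioned in the question.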
