How to return rows that match on column 'id' in df1 and df2, and put NA in column 'text' where the values do not match?

Posted on 2025-02-10 22:21:56

Below are my two data frames, df1 and df2:

df1 <- data.frame(id=c("632592651","633322173","634703802","634927873","635812953","636004739","636101211","636157799","636263106","636752420"),
                  text=c("asdf","cat","dog","mouse","elephant","goose","rat","mice","kitty","kitten"),
                  response=c("y","y","y","n","n","y","y","n","n","y"))

          id     text response
1  632592651     asdf        y
2  633322173      cat        y
3  634703802      dog        y
4  634927873    mouse        n
5  635812953 elephant        n
6  636004739    goose        y
7  636101211      rat        y
8  636157799     mice        n
9  636263106    kitty        n
10 636752420   kitten        y

df2 <- data.frame(id=c("632592651","633322173","634703802","634927873","635812953","636004739","636101211","636157799","636263106","636752420","636809222","2004722036","2004894388","2005045755","2005535472","2005630542","2005788781","2005809679","2005838317","2005866692"),
                  text=c("asdf_xyz","cat","dog","mouse","elephant","goose","rat","mice","kitty","kitten","tiger_xyz","lion","leopard","ostrich","kangaroo","platypus","fish","reptile","mammals","amphibians_xyz"),
                  volume=c("1234","432","324","333","2223","412346","7456","3456","2345","2345","6","345","23","2","4778","234","8675","3459","8","9"))

           id           text volume
1   632592651       asdf_xyz   1234
2   633322173            cat    432
3   634703802            dog    324
4   634927873          mouse    333
5   635812953       elephant   2223
6   636004739          goose 412346
7   636101211            rat   7456
8   636157799           mice   3456
9   636263106          kitty   2345
10  636752420         kitten   2345
11  636809222      tiger_xyz      6
12 2004722036           lion    345
13 2004894388        leopard     23
14 2005045755        ostrich      2
15 2005535472       kangaroo   4778
16 2005630542       platypus    234
17 2005788781           fish   8675
18 2005809679        reptile   3459
19 2005838317        mammals      8
20 2005866692 amphibians_xyz      9

I would like to create a data frame, df3, that returns the rows matching on column 'id' in both df1 and df2 (with 'volume' taken from df2), but puts NA in column 'text' wherever the 'text' values of df1 and df2 do not match.

Desired output of df3:

          id     text response volume
1  632592651       NA        y   1234
2  633322173      cat        y    432
3  634703802      dog        y    324
4  634927873    mouse        n    333
5  635812953 elephant        n   2223
6  636004739    goose        y 412346
7  636101211      rat        y   7456
8  636157799     mice        n   3456
9  636263106    kitty        n   2345
10 636752420   kitten        y   2345

What I have tried:

library(dplyr)

# keep the df1 rows that have no (id, text) match in df2
df3 <- df1 %>%
  anti_join(df2, by = c("id","text"))

The rows left after the anti_join are the ones whose 'text' does not match df2, so I then set column 'text' to NA for all of them:

df3 <- df3 %>% mutate(text = NA_character_)

Next, I did an inner_join between df1 and df2 to get df4 (rows where both "id" and "text" match):

df4 <- df1 %>% 
    inner_join(df2, by = c("id","text"))

Then I inner-joined df3 with df2 by "id" to add the "volume" column, giving df5:

df5 <- df3 %>%
  inner_join(df2, by= c("id")) %>%
  select(id,text.x,response,volume)

I renamed column 'text.x' to 'text':

colnames(df5)[colnames(df5) == "text.x"] <- "text"

Finally, I bound df5 and df4 together to get the desired output:

df6 <- rbind(df5,df4)

          id     text response volume
1  632592651     <NA>        y   1234
2  633322173      cat        y    432
3  634703802      dog        y    324
4  634927873    mouse        n    333
5  635812953 elephant        n   2223
6  636004739    goose        y 412346
7  636101211      rat        y   7456
8  636157799     mice        n   3456
9  636263106    kitty        n   2345
10 636752420   kitten        y   2345

This becomes very hard to manage when there are many columns. Can someone suggest how to simplify this approach? Thanks in advance.
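For reference, the requirement can also be written as a lookup instead of a chain of joins. This is a minimal base-R sketch, not the asker's method: it assumes df1 and df2 are built with stringsAsFactors = FALSE so 'text' is character even on R < 4.0. match() finds each df1 id in df2, 'volume' is copied across, and 'text' is set to NA where the two values differ. Unlike an inner join, a df1 row whose id is missing from df2 would be kept (with NA volume); in this data every df1 id is present in df2.

```r
# Data from the question; stringsAsFactors = FALSE keeps 'text' as character on R < 4.0
df1 <- data.frame(
  id = c("632592651", "633322173", "634703802", "634927873", "635812953",
         "636004739", "636101211", "636157799", "636263106", "636752420"),
  text = c("asdf", "cat", "dog", "mouse", "elephant", "goose", "rat", "mice", "kitty", "kitten"),
  response = c("y", "y", "y", "n", "n", "y", "y", "n", "n", "y"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  id = c("632592651", "633322173", "634703802", "634927873", "635812953",
         "636004739", "636101211", "636157799", "636263106", "636752420",
         "636809222", "2004722036", "2004894388", "2005045755", "2005535472",
         "2005630542", "2005788781", "2005809679", "2005838317", "2005866692"),
  text = c("asdf_xyz", "cat", "dog", "mouse", "elephant", "goose", "rat", "mice",
           "kitty", "kitten", "tiger_xyz", "lion", "leopard", "ostrich", "kangaroo",
           "platypus", "fish", "reptile", "mammals", "amphibians_xyz"),
  volume = c("1234", "432", "324", "333", "2223", "412346", "7456", "3456", "2345",
             "2345", "6", "345", "23", "2", "4778", "234", "8675", "3459", "8", "9"),
  stringsAsFactors = FALSE
)

# Position of each df1 id within df2 (NA if the id is absent from df2)
idx <- match(df1$id, df2$id)

df3 <- df1
df3$volume <- df2$volume[idx]                     # copy 'volume' across by id
df3$text[which(df3$text != df2$text[idx])] <- NA  # NA where the two 'text' values differ
df3
```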


Comments (1)

逆蝶 2025-02-17 22:21:56

Using dplyr and joining only by "id", you can simplify it like this:

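library(dplyr)

# joining by "id" only, so the shared 'text' column becomes text.x (df1) and text.y (df2)
inner_join(x = df1, 
           y = df2, 
           by = "id") %>%
  # factor -> character; only matters if strings were read in as factors (R < 4.0).
  # across()/where() replace the superseded mutate_if() in dplyr >= 1.0
  mutate(across(where(is.factor), as.character)) %>%
  # keep 'text' only where df1 and df2 agree, NA otherwise
  mutate(text = ifelse(test = text.x != text.y, 
                       yes = NA, 
                       no = text.x)) %>%
  select(id, text, response, volume)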

#>           id     text response volume
#> 1  632592651     <NA>        y   1234
#> 2  633322173      cat        y    432
#> 3  634703802      dog        y    324
#> 4  634927873    mouse        n    333
#> 5  635812953 elephant        n   2223
#> 6  636004739    goose        y 412346
#> 7  636101211      rat        y   7456
#> 8  636157799     mice        n   3456
#> 9  636263106    kitty        n   2345
#> 10 636752420   kitten        y   2345
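Since the question is about coping with many columns, the same idea can be generalised to every column the two data frames share. The sketch below is in base R and assumes a single key column; blank_mismatches() is a hypothetical helper, not part of base R or dplyr. It inner-joins on the key with merge(), then for each shared non-key column keeps the value only where the .x and .y copies agree, and finally restores x's row and column order.

```r
# Hypothetical helper: inner-join x and y on a single `key` column and, for every
# other column they share, keep the value only where both sides agree (NA otherwise).
blank_mismatches <- function(x, y, key = "id") {
  shared <- setdiff(intersect(names(x), names(y)), key)
  # merge() adds .x / .y suffixes to the shared non-key columns
  m <- merge(x, y, by = key, suffixes = c(".x", ".y"), sort = FALSE)
  # merge() does not guarantee row order, so put rows back in x's order
  m <- m[order(match(m[[key]], x[[key]])), , drop = FALSE]
  rownames(m) <- NULL
  for (col in shared) {
    a <- m[[paste0(col, ".x")]]
    b <- m[[paste0(col, ".y")]]
    # compare as text so factor columns with different levels also work
    a[which(as.character(a) != as.character(b))] <- NA
    m[[col]] <- a
    m[[paste0(col, ".x")]] <- NULL
    m[[paste0(col, ".y")]] <- NULL
  }
  # x's columns first, then the columns that only y has
  m[c(names(x), setdiff(names(y), names(x)))]
}

# A few rows of the question's data
df1 <- data.frame(id = c("632592651", "633322173", "634703802"),
                  text = c("asdf", "cat", "dog"),
                  response = c("y", "y", "y"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(id = c("632592651", "633322173", "634703802", "636809222"),
                  text = c("asdf_xyz", "cat", "dog", "tiger_xyz"),
                  volume = c("1234", "432", "324", "6"),
                  stringsAsFactors = FALSE)

blank_mismatches(df1, df2, key = "id")
```

Using merge() keeps the sketch free of extra packages; the same loop over the .x/.y columns would work on the output of dplyr's inner_join() as well.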