How to return row values that match column 'id' in both df1 and df2, but not column 'text', and return NA for the mismatches in column 'text'?
Below are my two data frames, df1 and df2:
df1 <- data.frame(id=c("632592651","633322173","634703802","634927873","635812953","636004739","636101211","636157799","636263106","636752420"),text=c("asdf","cat","dog","mouse","elephant","goose","rat","mice","kitty","kitten"),response=c("y","y","y","n","n","y","y","n","n","y"))
id text response
1 632592651 asdf y
2 633322173 cat y
3 634703802 dog y
4 634927873 mouse n
5 635812953 elephant n
6 636004739 goose y
7 636101211 rat y
8 636157799 mice n
9 636263106 kitty n
10 636752420 kitten y
df2 <- data.frame(id=c("632592651","633322173","634703802","634927873","635812953","636004739","636101211","636157799","636263106","636752420","636809222","2004722036","2004894388","2005045755","2005535472","2005630542","2005788781","2005809679","2005838317","2005866692"),
text=c("asdf_xyz","cat","dog","mouse","elephant","goose","rat","mice","kitty","kitten","tiger_xyz","lion","leopard","ostrich","kangaroo","platypus","fish","reptile","mammals","amphibians_xyz"),
volume=c("1234","432","324","333","2223","412346","7456","3456","2345","2345","6","345","23","2","4778","234","8675","3459","8","9"))
id text volume
1 632592651 asdf_xyz 1234
2 633322173 cat 432
3 634703802 dog 324
4 634927873 mouse 333
5 635812953 elephant 2223
6 636004739 goose 412346
7 636101211 rat 7456
8 636157799 mice 3456
9 636263106 kitty 2345
10 636752420 kitten 2345
11 636809222 tiger_xyz 6
12 2004722036 lion 345
13 2004894388 leopard 23
14 2005045755 ostrich 2
15 2005535472 kangaroo 4778
16 2005630542 platypus 234
17 2005788781 fish 8675
18 2005809679 reptile 3459
19 2005838317 mammals 8
20 2005866692 amphibians_xyz 9
I would like to create a data frame, df3, that returns the rows whose 'id' matches in both df1 and df2, but with NA in column 'text' wherever the 'text' values do not match.
Desired output of df3:
id text response volume
1 632592651 NA y 1234
2 633322173 cat y 432
3 634703802 dog y 324
4 634927873 mouse n 333
5 635812953 elephant n 2223
6 636004739 goose y 412346
7 636101211 rat y 7456
8 636157799 mice n 3456
9 636263106 kitty n 2345
10 636752420 kitten y 2345
What I have tried:
library(dplyr)
# rows of df1 that have no match in df2 on both "id" and "text"
df3 <- df1 %>%
  anti_join(df2, by = c("id","text"))
I then change all non-zero values in column 'text' to NA:
df3$text[df3$text != 0] <- NA
Next, I do an inner_join between df1 and df2 to get df4 (rows matching on both "id" and "text"):
df4 <- df1 %>%
  inner_join(df2, by = c("id","text"))
Then I inner_join df3 with df2 by "id" to get df5, which adds the "volume" column:
df5 <- df3 %>%
  inner_join(df2, by= c("id")) %>%
  select(id,text.x,response,volume)
I rename column 'text.x' to 'text':
colnames(df5)[colnames(df5)%in%c("text.x")] <- c("text")
And finally I bind df5 and df4 to get the desired output:
df6 <- rbind(df5,df4)
id text response volume
1 632592651 <NA> y 1234
2 633322173 cat y 432
3 634703802 dog y 324
4 634927873 mouse n 333
5 635812953 elephant n 2223
6 636004739 goose y 412346
7 636101211 rat y 7456
8 636157799 mice n 3456
9 636263106 kitty n 2345
10 636752420 kitten y 2345
This becomes extremely difficult to manage when there are many columns. Can someone advise how to simplify this approach? Thanks in advance.
Comments (1)
Using dplyr and joining only by "id", you can simplify it like this:
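A minimal sketch of that idea, not necessarily the answerer's exact code. It assumes dplyr is installed and uses trimmed copies of df1 and df2 from the question (with `stringsAsFactors = FALSE` so that 'text' is a character column). After joining by "id" alone, the two 'text' columns arrive as `text.x` and `text.y` (dplyr's default suffixes), and `if_else()` keeps the value only where they agree:

```r
library(dplyr)

# Trimmed copies of the question's data (first rows only), kept as character columns
df1 <- data.frame(id = c("632592651", "633322173", "634703802"),
                  text = c("asdf", "cat", "dog"),
                  response = c("y", "y", "y"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(id = c("632592651", "633322173", "634703802", "636809222"),
                  text = c("asdf_xyz", "cat", "dog", "tiger_xyz"),
                  volume = c("1234", "432", "324", "6"),
                  stringsAsFactors = FALSE)

df3 <- df1 %>%
  # join on "id" only; the clashing 'text' columns become text.x and text.y
  inner_join(df2, by = "id") %>%
  # keep the text where both sides agree, otherwise NA
  mutate(text = if_else(text.x == text.y, text.x, NA_character_)) %>%
  # drop the helper columns and restore the desired column order
  select(id, text, response, volume)

print(df3)
```

This replaces the anti_join / inner_join / rbind sequence with a single pipeline. With many columns, only the final `select()` would need adjusting.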