Fuzzy matching two data frames

Posted on 2025-01-24 00:59:33

I want to merge two data frames, df1 and df2.

df1 <- tibble(x = c("FIDELITY FREEDOM 2015 FUND", "VANGUARD WELLESLEY INCOME FUND"), y = c(1, 2))
df2 <- tibble(x = c("FIDELITY ABERDEEN STREET TRUST: FIDELITY FREEDOM 2015 FUND", "VANGUARD/WELLESLEY INCOME FUND, INC: VANGUARD WELLESLEY INCOME FUND; INVESTOR SHARES"), z = c(2020, 2021))

I want to merge df1 and df2 on x. So far I have tried fuzzy matching with:

fuzzy_join(df1,df2,match_fun = function(x,y) grepl(x, y))

It gives me the following warning:

In grepl(x, y) :
  argument 'pattern' has length > 1 and only the first element will be used.

Do you have any ideas for merging df1 and df2? I am thinking about how to write the match_fun function, but I am not sure how to proceed. Thank you so much!

Comments (1)

倾城°AllureLove 2025-01-31 00:59:33

We could use either fuzzy_inner_join or regex_inner_join from the fuzzyjoin package.

library(fuzzyjoin)
library(stringr)
df2 %>% fuzzy_inner_join(df1, by = "x", match_fun = str_detect)
  x.x                                                                                      z x.y                                y
  <chr>                                                                                <dbl> <chr>                          <dbl>
1 FIDELITY ABERDEEN STREET TRUST: FIDELITY FREEDOM 2015 FUND                            2020 FIDELITY FREEDOM 2015 FUND         1
2 VANGUARD/WELLESLEY INCOME FUND, INC: VANGUARD WELLESLEY INCOME FUND; INVESTOR SHARES  2021 VANGUARD WELLESLEY INCOME FUND     2

or:

library(fuzzyjoin)
df2 %>% regex_inner_join(df1, by = "x")
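
As for why the original attempt warns: the match function passed to fuzzy_join receives two whole vectors of candidate values, while grepl() accepts only a single pattern per call, so every pattern after the first is ignored. Below is a minimal base R sketch of a match function that pairs the elements up explicitly. The name contains_literal is hypothetical (it is not part of fuzzyjoin), and it assumes both arguments have the same length and that the short names should be matched as literal text rather than as regular expressions.

```r
# Hypothetical helper (not part of fuzzyjoin): TRUE where pattern[i]
# occurs as a literal substring of text[i].
# Assumes `text` and `pattern` have the same length.
contains_literal <- function(text, pattern) {
  vapply(
    seq_along(text),
    # grepl() takes one pattern at a time, so check each pair separately;
    # fixed = TRUE treats the pattern as plain text, not as a regex.
    function(i) grepl(pattern[i], text[i], fixed = TRUE),
    logical(1)
  )
}

long_names <- c(
  "FIDELITY ABERDEEN STREET TRUST: FIDELITY FREEDOM 2015 FUND",
  "VANGUARD/WELLESLEY INCOME FUND, INC: VANGUARD WELLESLEY INCOME FUND; INVESTOR SHARES"
)
short_names <- c("FIDELITY FREEDOM 2015 FUND", "VANGUARD WELLESLEY INCOME FUND")

# Each long name against the short name in the same position.
contains_literal(long_names, short_names)       # TRUE TRUE

# Crossed pairing: neither long name contains the other fund's short name.
contains_literal(long_names, rev(short_names))  # FALSE FALSE
```

Such a function could be passed as match_fun in the fuzzy_inner_join call above, with df2 (the longer names) kept as the first table so that its values arrive as the text argument. Literal matching may also be the safer choice if fund names ever contain regex metacharacters such as parentheses or dots.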