Advanced lookup in R: how to search within strings and add values from another data frame?

Posted on 2025-01-20 07:09:59

Neither Excel's VLOOKUP function nor R's join functions help here. I am trying to look up specific strings from one data frame and add a new column based on matches in a different data frame. As far as I can see, the _join functions do not fit my particular problem. My two data frames and my code are shown below:

**id**        **address**
3811          bb
4803          dd
4820          dd
852           aa
4031          dd

I want to search this address variable for matches against the local variable in the second data frame below, and then add the corresponding values from its district column.

**local**             **district**
aa                    AA
bb                    BB
cc                    CC
dd                    DD

I ran this code to complete the task. I think it works when I run it without the for loop, but with the for loop it produces an error.

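# One row per row of train; district stays NA where nothing matches
distr <- data.frame(id = train$id, district = NA_character_)

for (word in df2$local) {
    # Rows of train whose address contains `word` as a literal substring
    ind <- which(stringi::stri_detect_fixed(train$address, word))
    # df2 as shown has only two columns, so df2[ind2, 3] does not exist;
    # take district by name, matching `word` exactly against local
    distr$district[ind] <- df2$district[df2$local == word][1]
}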

The code is designed this way so that I can add the column from the distr data frame to the train data frame later on. Where exactly is my mistake, and how can I make the code run properly? Is there anyone with string-handling expertise?

P.S. I chose the stri_detect_fixed function because regular expressions did not work for every value here.
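
For reference, here is a minimal self-contained sketch of the loop idea, run against the two toy tables shown in the question. It assumes the data frames are named train and df2 as in the code above, and it swaps in base R's grepl(..., fixed = TRUE) for stringi::stri_detect_fixed so that it runs without extra packages. That substitution is an assumption for illustration, not part of the original code.

```r
# Toy data copied from the two tables in the question
train <- data.frame(id = c(3811, 4803, 4820, 852, 4031),
                    address = c("bb", "dd", "dd", "aa", "dd"),
                    stringsAsFactors = FALSE)
df2 <- data.frame(local = c("aa", "bb", "cc", "dd"),
                  district = c("AA", "BB", "CC", "DD"),
                  stringsAsFactors = FALSE)

# Start with NA so that unmatched addresses are easy to spot
train$district <- NA_character_

for (i in seq_len(nrow(df2))) {
  # Literal (non-regex) substring test, standing in for stri_detect_fixed
  hit <- grepl(df2$local[i], train$address, fixed = TRUE)
  train$district[hit] <- df2$district[i]
}

print(train)
```

If an address happens to contain more than one local value, later rows of df2 overwrite earlier ones in this sketch.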


Comments (1)

梦幻之岛 2025-01-27 07:09:59

As I_O suggests, this seems to work fine with fuzzyjoin::fuzzy_join():

library(fuzzyjoin)
fuzzy_join(d1, d2, match_fun = stringi::stri_detect_fixed,
           by = c("address" = "local"))

gives

    id              address      local     district
1 3811 Yntymak,Жетиген/Орто    Yntymak    leninskyi
2 4803           JD station JD station pervomayskyi
3 4820 JD station, Panfilov JD station pervomayskyi
4  852              Ak-Bata    Ak-Bata sverdlovskyi
5 4031           JD station JD station pervomayskyi

d1 <- read.table(header = TRUE,
                 sep = ";",
                 text = "
id;address
3811;Yntymak,Жетиген/Орто
4803;JD station
4820;JD station, Panfilov
852;Ak-Bata
4031;JD station
")

d2 <- read.table(header = TRUE,
                 sep = ";",
                 text = "
local;district
Ak-Bata;sverdlovskyi
Yntymak;leninskyi
Zhilgorodok Sovmina;oktyabrskyi
JD station;pervomayskyi
")
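
As far as I know, fuzzy_join() performs an inner join by default, so addresses with no match would be dropped from the result. A minimal base R sketch that keeps every row of d1 and leaves district as NA when nothing matches might look like the following. The helper name first_match is hypothetical, and taking only the first matching row of d2 is an assumption about the desired behaviour.

```r
# Same example data as above, built directly instead of via read.table
d1 <- data.frame(
  id = c(3811, 4803, 4820, 852, 4031),
  address = c("Yntymak,Жетиген/Орто", "JD station", "JD station, Panfilov",
              "Ak-Bata", "JD station"),
  stringsAsFactors = FALSE
)
d2 <- data.frame(
  local = c("Ak-Bata", "Yntymak", "Zhilgorodok Sovmina", "JD station"),
  district = c("sverdlovskyi", "leninskyi", "oktyabrskyi", "pervomayskyi"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: index of the first pattern found inside one address
# as a literal substring, or NA when none of the patterns occurs
first_match <- function(address, patterns) {
  hits <- which(vapply(patterns, grepl, logical(1), x = address, fixed = TRUE))
  if (length(hits) == 0) NA_integer_ else hits[[1]]
}

# One index into d2 per row of d1
idx <- vapply(d1$address, first_match, integer(1),
              patterns = d2$local, USE.NAMES = FALSE)

# Indexing with NA yields NA, so unmatched rows are kept with a missing district
d1$district <- d2$district[idx]

print(d1)
```

This checks every address against every local value, which is fine for a few thousand rows but may be slow for much larger tables.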