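Loop: how to loop over the case_when function in R?
Here is my code, in which I try to create variables by detecting words in a string and matching them against reference values. I use the dplyr package and its mutate function in combination with case_when. The problem is that I am adding each value manually. How can I automate this by applying some loop function that matches the two?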
city <- LETTERS #26 cities
district <- letters[10:20] #11 districts
streets <- paste0(district, district)
streets <- streets[-c(5:26)] #4 streets
df <- data.frame(x = c(1:5),
address = c("A, b, cc,", "B, dd", "a, dd", "C", "D, a, cc"))
library(dplyr)
library(stringi)
df2 <- df %>%
mutate(districts = case_when(
stri_detect_fixed(address, "b") ~ "b", #address[1]
#address[2]
stri_detect_fixed(address, "a") ~ "a", #address[3]
#address[4]
stri_detect_fixed(address, "cc") ~ "cc" #address[5]
))
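The code scans address for values of the district vector. I would like to do the same for the city and street variables. So I used a modified version of code from another question on Stack Overflow. It produces an error.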
for (j in town_village2) {
trn_house3[,93] <- case_when(
stri_detect_fixed(trn_house3[1:6469, 4], j) ~ j)
}
I am trying to produce this result:
x  address    city  district  street
1  A, b, cc,  A     b         cc
2  B, dd      B     NA        dd
3  a, dd      NA    a         dd
4  C          C     NA        NA
5  D, a, cc   D     a         cc
Comments (3)
If you are going to add a loop, it makes no sense to use case_when(); you don't have to add all options into it if you can loop over them. You can solve it with a for-loop:
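A minimal sketch of such a loop, under some assumptions: the reference vectors are redefined here so that they match the example addresses, and the names cities, districts, streets and the helper first_match are illustrative rather than taken from the question. Each candidate is tried in turn, and the first one detected in an address is kept:

```r
library(stringi)

# Assumption: reference vectors redefined to match the example addresses
cities    <- LETTERS                               # "A" ... "Z"
districts <- letters[1:11]                         # "a" ... "k"
streets   <- paste0(letters[1:4], letters[1:4])    # "aa" "bb" "cc" "dd"

df <- data.frame(x = 1:5,
                 address = c("A, b, cc,", "B, dd", "a, dd", "C", "D, a, cc"),
                 stringsAsFactors = FALSE)

# Hypothetical helper: loop over the candidates and keep, for every
# address, the first candidate found as a substring (NA if none is found)
first_match <- function(text, candidates) {
  out <- rep(NA_character_, length(text))
  for (j in candidates) {
    hit <- is.na(out) & stri_detect_fixed(text, j)
    out[hit] <- j
  }
  out
}

df$city     <- first_match(df$address, cities)
df$district <- first_match(df$address, districts)
df$street   <- first_match(df$address, streets)
df
```

Because this is substring detection, row 2 ("B, dd") is assigned district "d", since "dd" contains "d". That differs from the NA in the desired output.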
Note that your example code didn't work; the district names are 'a' and 'b' in your example dataset, but you generate names 'j' through 't'. I fixed that in my code above.
This approach also produces wrong matches when names of cities, districts and/or streets overlap. For instance, if a row is in district 'b' and on street 'cc', stri_detect_fixed will also detect the 'c' inside 'cc' and conclude that the row is in district 'c'. I propose a completely different method to overcome this:
Alternative method
Given your example data, it makes most sense to first split the given address by "," and then look for exact matches with your reference city/district/street names. We can look for those exact matches with intersect(). Compare df$address and the newly created address_elems:
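One way to build address_elems, sketched in base R. The comma separator and the trimws() clean-up are assumptions about how the split was done:

```r
df <- data.frame(x = 1:5,
                 address = c("A, b, cc,", "B, dd", "a, dd", "C", "D, a, cc"),
                 stringsAsFactors = FALSE)

# Split every address on commas, then strip the whitespace around each piece.
# strsplit() drops the empty piece after a trailing comma.
address_elems <- lapply(strsplit(df$address, ",", fixed = TRUE), trimws)

df$address      # the original strings
address_elems   # a list with one character vector per address
```

The result is a list, so each address keeps its own vector of elements regardless of how many parts it has.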
We could find matching cities for just the first vector in address_elems with intersect(cities, address_elems[[1]]). Because we might get multiple matches, we only take the first element, with intersect(cities, address_elems[[1]])[[1]].
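A small illustration with two hand-written element vectors (assumed here so that the snippet runs on its own). Note that [[1]] raises an error when there is no match at all, whereas [1] returns NA, which is usually more convenient when filling a column:

```r
cities <- LETTERS

# Assumption: two already-split addresses, typed in by hand
address_elems <- list(c("A", "b", "cc"), c("a", "dd"))

# Exact matches only: "cc" is not confused with a city "C"
all_cities <- intersect(cities, address_elems[[1]])
first_city <- intersect(cities, address_elems[[1]])[[1]]

# No city in the second address: intersect() returns an empty vector,
# and indexing it with [1] gives NA instead of an error
no_city <- intersect(cities, address_elems[[2]])[1]
```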
To apply this to every vector in address_elems, we can use sapply() or lapply():
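For example, a sketch with sapply(), using [1] so that addresses without a match yield NA (the element vectors are typed in by hand here as an assumption):

```r
cities <- LETTERS

# Assumption: the example addresses, already split into their elements
address_elems <- list(c("A", "b", "cc"), c("B", "dd"), c("a", "dd"),
                      "C", c("D", "a", "cc"))

# One result per address: the first exact city match, or NA if there is none
city <- sapply(address_elems, function(elems) intersect(cities, elems)[1])
city
```

sapply() simplifies the result to a character vector, while lapply() would return the same values as a list.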
Putting it all together we get:
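A possible end-to-end version, as a sketch rather than the exact code of this answer. It assumes the reference vectors are redefined to match the example data, and first_exact is a hypothetical helper name:

```r
# Assumption: reference vectors redefined to match the example addresses
cities    <- LETTERS
districts <- letters[1:11]
streets   <- paste0(letters[1:4], letters[1:4])    # "aa" "bb" "cc" "dd"

df <- data.frame(x = 1:5,
                 address = c("A, b, cc,", "B, dd", "a, dd", "C", "D, a, cc"),
                 stringsAsFactors = FALSE)

# Split each address into trimmed elements
address_elems <- lapply(strsplit(df$address, ",", fixed = TRUE), trimws)

# Hypothetical helper: first exact match per address, NA when there is none
first_exact <- function(reference) {
  sapply(address_elems, function(elems) intersect(reference, elems)[1])
}

df$city     <- first_exact(cities)
df$district <- first_exact(districts)
df$street   <- first_exact(streets)
df
```

Since only whole elements are compared, "dd" no longer counts as district "d", and the three new columns match the desired output in the question.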
This will separate the elements into vectors:
Created on 2022-04-14 by the reprex package (v2.0.0)
data.table approach
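One way this could look with data.table, offered as a sketch under assumptions rather than the original code of this answer. The reference vectors are redefined to match the example data, and the names long, wide and result are illustrative. The idea is to reshape to one row per address element, label each element by the reference vector it belongs to, and reshape back to one row per address:

```r
library(data.table)

# Assumption: reference vectors redefined to match the example addresses
cities    <- LETTERS
districts <- letters[1:11]
streets   <- paste0(letters[1:4], letters[1:4])    # "aa" "bb" "cc" "dd"

dt <- data.table(x = 1:5,
                 address = c("A, b, cc,", "B, dd", "a, dd", "C", "D, a, cc"))

# Long format: one row per trimmed address element
long <- dt[, .(elem = trimws(unlist(strsplit(address, ",", fixed = TRUE)))),
           by = x]

# Label each element by exact membership in a reference vector
long[, type := fcase(elem %in% cities,    "city",
                     elem %in% districts, "district",
                     elem %in% streets,   "street")]

# Wide format again: one column per type, NA where nothing matched
wide   <- dcast(long, x ~ type, value.var = "elem")
result <- merge(dt, wide, by = "x", all.x = TRUE)
result
```

This relies on each address having at most one element of each type and on every element matching some reference vector, which holds for the example data. Real addresses with repeated or unknown elements would need extra handling.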