How to create a new variable that takes 1 if a string in another column contains a word with different punctuation and capitalization?
I have a column that looks something like this:
col1
"business"
"BusinesS"
"education"
"some BUSINESS ."
"business of someone, that is cool"
" not the b word"
"busi ness"
"busines."
"businesses"
"something else"
And I need an efficient way to turn all this string data into a new column, like this:
col1 col2
NA 1
NA 1
"education" NA
NA 1
NA 1
" not the b word" NA
NA 1
NA 1
NA 1
"something else" NA
So the common denominator is "busines", but I don't know how to efficiently handle all the spaces, punctuation, upper/lower case, surrounding words, etc. in a single mutate() that creates the new column.
We can use ifelse to set 1 if str_detect detects any form of "business", and NA if it doesn't. Note that "(?i)" makes the match case-insensitive, and the "?" in "\\s?" and "s?" makes the preceding item optional; so "\\s?" matches an optional whitespace character and "s?" matches an optional literal "s".
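The code that went with this answer was lost in extraction. Below is a minimal sketch of the idea, assuming the pattern "(?i)busi\\s?ness?" (a reconstruction, not necessarily the original) and using base R's grepl() with perl = TRUE in place of stringr's str_detect(), so it runs without extra packages:

```r
# Example data from the question
df <- data.frame(
  col1 = c("business", "BusinesS", "education", "some BUSINESS .",
           "business of someone, that is cool", " not the b word",
           "busi ness", "busines.", "businesses", "something else"),
  stringsAsFactors = FALSE
)

# Assumed pattern (a reconstruction, not necessarily the original):
#   (?i)  case-insensitive matching (PCRE inline flag, hence perl = TRUE)
#   \\s?  an optional whitespace character, so "busi ness" matches
#   ness? "nes" followed by an optional "s", so "busines" and "business" match
pattern <- "(?i)busi\\s?ness?"

# 1 where the pattern occurs anywhere in the string, NA otherwise
df$col2 <- ifelse(grepl(pattern, df$col1, perl = TRUE), 1, NA)

print(df)
```

With dplyr and stringr loaded, the same test fits in one mutate(), e.g. mutate(col2 = ifelse(str_detect(col1, pattern), 1, NA)); stringr's ICU regex engine also understands the (?i) flag.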
You can replace all non-word characters using gsub and then use grepl to detect "busines":
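A minimal sketch of this approach on the question's data. The original code was lost, so the "\\W" pattern, the ignore.case = TRUE flag and the ifelse() wrapper here are assumptions:

```r
# Example data from the question
col1 <- c("business", "BusinesS", "education", "some BUSINESS .",
          "business of someone, that is cool", " not the b word",
          "busi ness", "busines.", "businesses", "something else")

# Remove every non-word character (spaces, punctuation, ...):
# "busi ness" -> "business", "some BUSINESS ." -> "someBUSINESS"
squashed <- gsub("\\W", "", col1)

# Flag strings that contain "busines" in any capitalization
col2 <- ifelse(grepl("busines", squashed, ignore.case = TRUE), 1, NA)

data.frame(col1, col2)
```

One trade-off: because spaces are removed as well, "busines" can now also match across the boundary between two neighbouring words.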
Another way would be to use agrepl for approximate string matching, where 1L gives the maximum distance (number of edits) allowed from the given pattern. agrep can also be a solution in case you are looking for "business" instead of "busines":
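A sketch of both fuzzy variants on the question's data, assuming the answer called agrepl() with max.distance = 1L and ignore.case = TRUE (the original code was lost):

```r
# Example data from the question
col1 <- c("business", "BusinesS", "education", "some BUSINESS .",
          "business of someone, that is cool", " not the b word",
          "busi ness", "busines.", "businesses", "something else")

# Fuzzy substring matching: max.distance = 1L allows at most one
# insertion, deletion or substitution between the pattern and some
# substring of each element ("busi ness" needs one deletion of the space)
hits_busines  <- agrepl("busines",  col1, max.distance = 1L, ignore.case = TRUE)
hits_business <- agrepl("business", col1, max.distance = 1L, ignore.case = TRUE)

# On this data both patterns flag the same rows
col2 <- ifelse(hits_busines, 1, NA)

data.frame(col1, col2, same = hits_busines == hits_business)
```

A max.distance of 1 or more is read as an absolute number of edits, while values below 1 are treated as a fraction of the pattern length, so 1L and 1 behave the same here.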