Using lapply and gsub to replace words in a data frame, with another data frame as a "dictionary"
I have a dataframe called data in which I want to replace some words in specific columns, A and B.
I have a second dataframe called dict that plays the role of a dictionary/hash: it contains the words to replace and the values to replace them with.
I think it could be done with purrr's map(), but I want to use the apply family. It's for a package, and I don't want to have to load another package.
The following code doesn't work, but it gives you the idea. I'm stuck.
columns <- c("A", "B" )
data[columns] <- lapply(data[columns], function(x){x}) %>% lapply(dict, function(y){
gsub(pattern = y[,2], replacement = y[,1], x)})
This works for changing one word... but I can't pass in the list of changes contained in the dictionary.
data[columns] <- lapply(data[columns], gsub, pattern = "FLT1", replacement = "flt1")
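To pass every entry of the dictionary, the single-word gsub() call can be applied once per row of dict, with each call working on the previous result. A minimal sketch, assuming (hypothetically) that dict's first column holds the replacement and its second column the word to find, as the y[,1] / y[,2] indexing above suggests; the helper name replace_all and all example values are made up for illustration:

```r
# Hypothetical example data: column names A, B, C and all values are made up.
data <- data.frame(
  A = c("FLT1 and KDR", "VEGFA", "none"),
  B = c("KDR", "FLT1", "VEGFA FLT1"),
  C = c("FLT1", "x", "y"),
  stringsAsFactors = FALSE
)

# Hypothetical dictionary: column 1 = replacement, column 2 = word to find.
dict <- data.frame(
  value = c("flt1", "kdr", "vegfa"),
  word  = c("FLT1", "KDR", "VEGFA"),
  stringsAsFactors = FALSE
)

columns <- c("A", "B")

# Apply every dictionary row in turn; each gsub() works on the previous result.
replace_all <- function(x, dict) {
  for (i in seq_len(nrow(dict))) {
    x <- gsub(pattern = dict[i, 2], replacement = dict[i, 1], x = x, fixed = TRUE)
  }
  x
}

# Only the listed columns are changed; column C keeps "FLT1".
data[columns] <- lapply(data[columns], replace_all, dict = dict)
print(data)
```

Here fixed = TRUE treats each dictionary word literally rather than as a regular expression; drop it if dict holds regex patterns. Because the replacements run in row order, a replacement value that contains a later pattern would itself be rewritten.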
@Gregor_Thomas is right: you need a for loop to get a cumulative effect, where each replacement is applied to the result of the previous one; otherwise you only replace one value at a time. Or, if your dict data is too long for that, you can use paste as a code generator to produce the whole succession of gsub calls you need. It generates all the gsub lines for the "A" column. Then you evaluate the code and wrap it in an lapply over the various columns.
It's ugly, but it works fine and avoids long loops.
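A sketch of the paste()-as-code-generator idea, using the same hypothetical dict layout (replacement in column 1, word to find in column 2). It illustrates the technique and is not the answer's own code; the names code_lines, generated and apply_generated are made up, and encodeString() is used so that quotes inside dictionary entries don't break the generated text:

```r
# Hypothetical example data and dictionary.
data <- data.frame(
  A = c("FLT1 and KDR", "VEGFA"),
  B = c("KDR", "VEGFA FLT1"),
  stringsAsFactors = FALSE
)
dict <- data.frame(
  value = c("flt1", "kdr", "vegfa"),
  word  = c("FLT1", "KDR", "VEGFA"),
  stringsAsFactors = FALSE
)
columns <- c("A", "B")

# Generate one gsub() statement per dictionary row, e.g.
#   x <- gsub("FLT1", "flt1", x, fixed = TRUE)
code_lines <- paste0(
  "x <- gsub(", encodeString(dict[, 2], quote = '"'),
  ", ", encodeString(dict[, 1], quote = '"'),
  ", x, fixed = TRUE)"
)
cat(code_lines, sep = "\n")

# Parse once; eval() then runs the statements in order.
generated <- parse(text = code_lines)

apply_generated <- function(x) {
  eval(generated, envir = environment())  # x is rebound inside this function
  x
}

data[columns] <- lapply(data[columns], apply_generated)
print(data)
```

The generated statements still run one after another, so the result is the same as the explicit loop; the loop is simply moved into generated text.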
Edit: for an exact match between df and dict, maybe you should use a boolean selection with == instead of gsub(). (I don't use match() here because it only selects the first match.)
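A sketch of the == approach, again with hypothetical data and a made-up helper named replace_exact. Only cells whose entire value equals a dictionary word are replaced. The comparisons are made against the original column, so one replacement cannot trigger another:

```r
# Hypothetical example data and dictionary.
data <- data.frame(
  A = c("FLT1", "FLT1 and KDR", "KDR"),
  B = c("VEGFA", "FLT1", "other"),
  stringsAsFactors = FALSE
)
dict <- data.frame(
  value = c("flt1", "kdr", "vegfa"),
  word  = c("FLT1", "KDR", "VEGFA"),
  stringsAsFactors = FALSE
)
columns <- c("A", "B")

# Replace whole cells that exactly equal a dictionary word.
replace_exact <- function(x, dict) {
  out <- x
  for (i in seq_len(nrow(dict))) {
    hit <- which(x == dict[i, 2])  # compare with the original x; which() drops NAs
    out[hit] <- dict[i, 1]
  }
  out
}

data[columns] <- lapply(data[columns], replace_exact, dict = dict)
print(data)
```

Unlike the gsub() versions, "FLT1 and KDR" stays unchanged here, because it is not an exact match for any dictionary word.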