How to replace a string exactly using gsub()
I have a corpus:
txt = "a patterned layer within a microelectronic pattern."
I would like to replace the term "pattern" with "form", but only where it matches exactly. I tried this code:
txt_replaced = gsub("pattern","form",txt)
However, the resulting string in txt_replaced is:
"a formed layer within a microelectronic form."
As you can see, "patterned" is wrongly changed to "formed" because the substring "pattern" inside "patterned" also matches.
Is it possible to replace the string exactly using gsub()?
That is, only a term that matches exactly should be replaced.
The result I am hoping for is:
"a patterned layer within a microelectronic form."
Many thanks!
Comments (1)
As @koshke noted, a very similar question has been answered before (by me). But that was grep and this is gsub, so I'll answer it again: "\<" is an escape sequence for the beginning of a word, and "\>" is the one for the end. In R strings you need to double the backslashes, so the pattern is written "\\<pattern\\>".
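As a minimal sketch, assuming the txt string from the question, the word-anchored call might look like this:

```r
txt <- "a patterned layer within a microelectronic pattern."

# "\\<" and "\\>" reach the regex engine as \< and \>,
# which match the empty string at the start and end of a word.
txt_replaced <- gsub("\\<pattern\\>", "form", txt)

print(txt_replaced)
# [1] "a patterned layer within a microelectronic form."
```

"patterned" is left alone because the character after "pattern" is a letter, so the end-of-word anchor cannot match there.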
Or, you could use \b instead of \< and \>. \b matches a word boundary, so it can be used at both ends.
Also note that if you want to replace only ONE occurrence, you should use sub instead of gsub.
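A short sketch of both points, again assuming the txt string from the question. The second string, two, is a made-up example used only to show the difference between sub and gsub:

```r
txt <- "a patterned layer within a microelectronic pattern."

# \b matches at either edge of a word, so the same token serves both ends.
with_b <- gsub("\\bpattern\\b", "form", txt)
print(with_b)
# [1] "a patterned layer within a microelectronic form."

# Hypothetical input with two whole-word matches.
two <- "pattern and pattern"

# sub() stops after the first match; gsub() replaces every match.
first_only  <- sub("\\bpattern\\b", "form", two)
all_matches <- gsub("\\bpattern\\b", "form", two)
print(first_only)   # [1] "form and pattern"
print(all_matches)  # [1] "form and form"
```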