Remove all punctuation except apostrophes in R
I'd like to use R's gsub to remove all punctuation from a text except for apostrophes. I'm fairly new to regex but am learning.
Example:
x <- "I like %$@to*&, chew;: gum, but don't like|}{[] bubble@#^)( gum!?"
gsub("[[:punct:]]", "", as.character(x))
Current Output (no apostrophe in don't)
[1] "I like to chew gum but dont like bubble gum"
Desired Output (I desire the apostrophe in don't to stay)
[1] "I like to chew gum but don't like bubble gum"
Comments (4)
A negated character class is much more straightforward. It replaces everything that is not an alphanumeric character, whitespace, or an apostrophe with an empty string (note the caret, which negates the class).
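A minimal sketch of a pattern matching that description, using the sample string x from the question. The exact pattern is an assumption reconstructed from the description, not necessarily the one originally posted:

```r
# Same sample string as in the question.
x <- "I like %$@to*&, chew;: gum, but don't like|}{[] bubble@#^)( gum!?"

# [^...] negates the class: match any character that is NOT alphanumeric,
# NOT whitespace, and NOT an apostrophe, and replace it with nothing.
cleaned <- gsub("[^[:alnum:][:space:]']", "", x)
print(cleaned)
# [1] "I like to chew gum but don't like bubble gum"
```

Because the class lists what to keep rather than what to drop, any other character you want to preserve (a hyphen, say) can be added inside the brackets.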
You can exclude apostrophes from the POSIX class [:punct:] using a double negative.
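One way the double negative might be written, as a sketch: it assumes perl = TRUE, since the negated POSIX class [:^punct:] is a PCRE extension that R's default regex engine does not accept. The pattern is an illustration of the idea, not necessarily the one originally posted:

```r
# Same sample string as in the question.
x <- "I like %$@to*&, chew;: gum, but don't like|}{[] bubble@#^)( gum!?"

# [:^punct:] matches any NON-punctuation character (PCRE only).
# The outer [^ ... '] negates "non-punctuation or apostrophe",
# which leaves exactly: punctuation that is not an apostrophe.
cleaned_pcre <- gsub("[^[:^punct:]']", "", x, perl = TRUE)
print(cleaned_pcre)
# [1] "I like to chew gum but don't like bubble gum"
```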
Here is an example:
Mostly for variety, here's a solution using gsubfn() from the terrific package of the same name. In this application, I just like how nicely expressive the solution it allows is.
(The argument engine = "R" is needed here, as otherwise the default tcl engine will be used. Its rules for matching regular expressions are slightly different: if it were used to process the string above, for instance, one would instead need to set pattern = "[[:punct:]$|^]". Thanks to G. Grothendieck for pointing out that detail.)
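A sketch of how the gsubfn() approach described above could look. The helper name keep_apostrophes is hypothetical, and the code assumes the gsubfn package is installed; the replacement function is called once per match and decides what each matched character becomes:

```r
# Hypothetical helper (name is illustrative); needs the gsubfn package.
keep_apostrophes <- function(text) {
  gsubfn::gsubfn(
    pattern = "[[:punct:]]",
    # Called for every punctuation match: keep apostrophes, drop the rest.
    replacement = function(x) ifelse(x == "'", "'", ""),
    x = text,
    engine = "R"  # use R's regex engine instead of the default tcl engine
  )
}

# Same sample string as in the question.
x <- "I like %$@to*&, chew;: gum, but don't like|}{[] bubble@#^)( gum!?"

# Only run the example when gsubfn is actually available.
if (requireNamespace("gsubfn", quietly = TRUE)) {
  print(keep_apostrophes(x))
}
```

The appeal of this form is that the rule "keep this character, drop that one" lives in ordinary R code rather than inside the regular expression.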