Remove all punctuation except apostrophes in R

Posted 2024-12-24 00:59:46


I'd like to use R's gsub to remove all punctuation from a text except for apostrophes. I'm fairly new to regex but am learning.

Example:

x <- "I like %$@to*&, chew;: gum, but don't like|}{[] bubble@#^)( gum!?"
gsub("[[:punct:]]", "", as.character(x))

Current Output (no apostrophe in don't)

[1] "I like to chew gum but dont like bubble gum"

Desired Output (I desire the apostrophe in don't to stay)

[1] "I like to chew gum but don't like bubble gum"


不知所踪 2024-12-31 00:59:46

x <- "I like %$@to*&, chew;: gum, but don't like|}{[] bubble@#^)( gum!?"
gsub("[^[:alnum:][:space:]']", "", x)

[1] "I like to chew gum but don't like bubble gum"

The regex above is more straightforward. Inside a bracket expression, a leading caret (^) negates the set, so the pattern matches every character that is not alphanumeric, whitespace, or an apostrophe, and gsub replaces each match with an empty string.
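If the same cleanup is needed in several places, the whitelist pattern could be wrapped in a small helper. This is only a sketch: the name keep_apostrophes and the sample inputs are made up for illustration and are not part of the answer above.

```r
# Hypothetical helper: keep letters, digits, whitespace and apostrophes,
# drop every other character. gsub is vectorised, so this works on vectors too.
keep_apostrophes <- function(text) {
  gsub("[^[:alnum:][:space:]']", "", text)
}

x <- "I like %$@to*&, chew;: gum, but don't like|}{[] bubble@#^)( gum!?"
cleaned <- keep_apostrophes(x)
print(cleaned)

# Illustrative inputs (assumed, not from the question).
samples <- keep_apostrophes(c("don't stop!", "rock & roll (1955)"))
print(samples)
```

Note that only the matched characters are deleted, so the spaces on either side of a removed symbol remain: "rock & roll" comes back with two consecutive spaces.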


菩提树下叶撕阳。 2024-12-31 00:59:46

You can exclude apostrophes from the POSIX class punct using a double negative. The negated class [:^punct:] is a Perl (PCRE) extension, so perl = TRUE is required:

[^'[:^punct:]]

Code:

x <- "I like %$@to*&, chew;: gum, but don't like|}{[] bubble@#^)( gum!?"
gsub("[^'[:^punct:]]", "", x, perl = TRUE)

#[1] "I like to chew gum but don't like bubble gum"
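Another way to express "punctuation, but not an apostrophe" under PCRE is a negative lookahead. This is a different technique from the double negative in this answer, shown only as a sketch for comparison; it also assumes perl = TRUE.

```r
x <- "I like %$@to*&, chew;: gum, but don't like|}{[] bubble@#^)( gum!?"

# (?!') asserts that the next character is not an apostrophe;
# [[:punct:]] then consumes a single punctuation character.
result <- gsub("(?!')[[:punct:]]", "", x, perl = TRUE)
print(result)
```

Both forms remove the same characters for this input; the lookahead version may read more naturally when several characters need to be exempted, for example (?!['-]).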


挽你眉间 2024-12-31 00:59:46

Here is an example:

gsub("(.*?)($|'|[^[:punct:]]+?)(.*?)", "\\2", x)
[1] "I like to chew gum but don't like bubble gum"


沐歌 2024-12-31 00:59:46

Mostly for variety, here's a solution using gsubfn() from the terrific package of the same name. In this application, I just like how nicely expressive the solution it allows is:

library(gsubfn)
gsubfn(pattern = "[[:punct:]]", engine = "R",
       replacement = function(x) ifelse(x == "'", "'", ""), 
       x)
[1] "I like to chew gum but don't like bubble gum"

(The argument engine = "R" is needed here as otherwise the default tcl engine will be used. Its rules for matching regular expressions are slightly different: if it were used to process the string above, for instance, one would need to instead set pattern = "[[:punct:]$|^]". Thanks to G. Grothendieck for pointing out that detail.)
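For readers who would rather not add a package, the same "decide per match" idea can be approximated in base R with gregexpr and the replacement form of regmatches. This is a sketch of an alternative, not part of the gsubfn answer; the variable names are illustrative.

```r
x <- "I like %$@to*&, chew;: gum, but don't like|}{[] bubble@#^)( gum!?"

# Locate every punctuation character in x.
m <- gregexpr("[[:punct:]]", x)

# Work on a copy so the original string stays available.
cleaned <- x

# For each match decide what to put back:
# apostrophes are kept, everything else becomes an empty string.
regmatches(cleaned, m) <- lapply(
  regmatches(cleaned, m),
  function(p) ifelse(p == "'", "'", "")
)
print(cleaned)
```

The replacement function here has the same shape as the one passed to gsubfn, so more elaborate per-match rules can be dropped in without changing the pattern.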
