Replace sub-strings containing non-uniform patterns involving special characters

Posted on 2025-02-10 21:13:21 · 1109 characters · 0 views · 0 comments

I have a dataset in which one variable has a non-uniform pattern/format, and I need to write R code that removes the part of each string that follows a specific pattern.

There are related questions on pattern replacement, such as "Extract a string between patterns/delimiters in R", "Replace patterns separated by delimiter in R", and "Remove part of a string", but they do not address the issue with my data.

This is what the variable (c) looks like; below are the options I tried, along with their results.

library(stringr)  # needed for str_replace_all() and fixed()

c <- c("1998/123; 2001", "181;2002/12", "212")
c1 <- gsub("[0-9]/[0-9]", "", c)               # returns "19923; 2001", "181;2002", "212"
c2 <- gsub("[0-9]/*", "", c)                   # returns "; ", ";", ""
c3 <- gsub("[0-9][0-9][0-9][0-9]/", "", c)     # returns "123; 2001", "181;12", "212"
c4 <- gsub("*[0-9]/[0-9]*", "", c)             # returns "199; 2001", "181;200", "212"
c5 <- gsub(" */* ", "", c)                     # slashes untouched; only the space after ";" is removed
c6 <- str_replace_all(c, "/", "")              # returns "1998123; 2001", "181;200212", "212"
c7 <- grep(fixed("/"), c, invert = TRUE, value = TRUE)  # returns "212"

a) There can be 3-8 digits after the forward slash, but there can only be 4 digits before it.

b) Each sub-string is separated by a semicolon delimiter.

c) I want to replace the sub-strings that contain a forward slash with a blank. So my result should be c("; 2001", "181;", "212").

Kindly let me know where I am making the mistake. Any suggestions are very welcome. Thanks.


Comments (1)

若水般的淡然安静女子 2025-02-17 21:13:21

Since the numbers before and after the forward slash have multiple digits, you could add a quantifier, + (one or more) or * (zero or more), to your first approach so that the whole number on each side is removed:

c <-  c("1998/123; 2001","181;2002/12","212")

gsub("\\d+\\/\\d+", "", c)
#> [1] "; 2001" "181;"   "212"
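
If the rule from the question (exactly four digits before the slash) needs to be enforced, or if it is easier to reason about whole semicolon-separated tokens, the following sketch shows two possible variants. It is only an illustration under stated assumptions: the names `x`, `cleaned`, `blank_slash_tokens`, and `by_token` are made up for this example, and the lower bound of three digits after the slash is deliberately not enforced because the sample data contains "2002/12".

```r
# Use x rather than c as the variable name so the base function c() is not shadowed.
x <- c("1998/123; 2001", "181;2002/12", "212")

# Variant 1: stricter regex (needs perl = TRUE for the lookarounds).
# Exactly four digits before the slash, one or more after it.
# (?<!\\d) and (?!\\d) keep the match from starting or ending
# in the middle of a longer run of digits.
cleaned <- gsub("(?<!\\d)\\d{4}/\\d+(?!\\d)", "", x, perl = TRUE)

# Variant 2: split on ";", blank every token that contains "/", rejoin.
# blank_slash_tokens is a hypothetical helper, not part of any package.
blank_slash_tokens <- function(s) {
  tokens <- strsplit(s, ";", fixed = TRUE)[[1]]
  tokens[grepl("/", tokens, fixed = TRUE)] <- ""
  paste(tokens, collapse = ";")
}
by_token <- vapply(x, blank_slash_tokens, character(1), USE.NAMES = FALSE)

cleaned   # "; 2001", "181;", "212"
by_token  # "; 2001", "181;", "212"
```

Both variants give the same result on this sample. The token-based version blanks the whole token, including any surrounding spaces, whereas the regex version removes only the digits and the slash.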