Replace substrings containing a non-uniform pattern involving special characters
I have a dataset in which one variable has a non-uniform pattern/format, and I need to write R code that removes the part of each string that follows a specific pattern.
There are existing questions on replacing patterns, such as "Extract a string between patterns/delimiters in R", "Replace patterns separated by delimiter in R", and "Remove part of a string", but they do not address the issue with my data.
This is what the variable (c) looks like; below are the options I tried, along with their results.
library(stringr) # needed for str_replace_all() and fixed() below
c <- c("1998/123; 2001", "181;2002/12", "212")
c1 <- gsub("[0-9]/[0-9]", "", c) # returns "19923; 2001", "181;2002", "212"
c2 <- gsub("[0-9]/*", "", c) # returns "; ", ";", ""
c3 <- gsub("[0-9][0-9][0-9][0-9]/", "", c) # returns "123; 2001", "181;12", "212"
c4 <- gsub("*[0-9]/[0-9]*", "", c) # returns "199; 2001", "181;200", "212"
c5 <- gsub(" */* ", "", c) # only removes the space: "1998/123;2001", "181;2002/12", "212"
c6 <- str_replace_all(c, "/", "") # returns "1998123; 2001", "181;200212", "212"
c7 <- grep(fixed("/"), c, invert = TRUE, value = TRUE) # returns "212"
a) There can be 3-8 digits after the forward slash, but there can only be 4 digits before it.
b) Each substring is separated by a semicolon delimiter.
c) I want to replace the substrings that contain a forward slash with a blank. So my result should be c("; 2001", "181;", "212").
Kindly let me know where I am making the mistake. Any suggestions are very welcome. Thanks.
Comments (1)
As the numbers before and after the forward slash have multiple digits, you could use + (1 or more) or * (0 or more) in your first approach to remove all of them:
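The code that originally followed this sentence is not preserved here, so the following is only a minimal sketch of what the suggestion could look like, not necessarily the answerer's exact code. It uses the sample vector from the question (renamed `x` here to avoid masking base `c()`; the names `res_plus`, `res_star`, and `res_strict` are made up for illustration).

```r
# Sample data from the question (called `c` there)
x <- c("1998/123; 2001", "181;2002/12", "212")

# "+" = one or more digits: the whole number on each side of the slash
# is consumed, not just the single digit adjacent to it.
res_plus <- gsub("[0-9]+/[0-9]+", "", x)
print(res_plus)
# [1] "; 2001" "181;"   "212"

# "*" = zero or more digits: same result on this data, but it would also
# delete a bare "/" that has no digits around it.
res_star <- gsub("[0-9]*/[0-9]*", "", x)

# Stricter variant (an assumption based on rule a): exactly 4 digits before
# the slash. The count after the slash is left open because the sample value
# "2002/12" has only 2 digits there, so "{3,8}" would miss it.
res_strict <- gsub("[0-9]{4}/[0-9]+", "", x)
```

All three calls give the same result for the sample vector. The space in "; 2001" is kept because the pattern only matches the digits and the slash.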