Remove all characters from a string except selected ones

Posted on 2024-10-16 11:47:38

I want to remove from a string all characters that are not digits, minus signs, or decimal points.

I imported data from Excel using read.xls, and the result includes some strange characters. I need to convert these values to numeric. I am not too familiar with regular expressions, so I need a simpler way to do the following:

excel_coords <- c(" 19.53380Ý°", " 20.02591°", "-155.91059°", "-155.8154°")
unwanted <- unique(unlist(strsplit(gsub("[0-9]|\\.|-", "", excel_coords), "")))
clean_coords <- gsub(do.call("paste", args = c(as.list(unwanted), sep="|")), 
                     replacement = "", x = excel_coords)

> clean_coords
[1] "19.53380"   "20.02591"   "-155.91059" "-155.8154" 

Bonus if somebody can tell me why these characters have appeared in some of my data (the degree signs are part of the original Excel worksheet, but the others are not).

半夏半凉 2024-10-23 11:47:38

Short and sweet. Thanks to comment by G. Grothendieck.

gsub("[^-.0-9]", "", excel_coords)

From http://stat.ethz.ch/R-manual/R-patched/library/base/html/regex.html: "A character class is a list of characters enclosed between [ and ] which matches any single character in that list; unless the first character of the list is the caret ^, when it matches any character not in the list."
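
Since the question ultimately needs numeric values, the cleaned strings can then be passed to as.numeric. A minimal sketch, assuming the same excel_coords data as in the question (the stray characters are written as \u escapes so the snippet does not depend on the file's encoding, and coords_num is a name introduced here for illustration):

```r
# Same data as in the question; "\u00dd" is Ý and "\u00b0" is the degree sign.
excel_coords <- c(" 19.53380\u00dd\u00b0", " 20.02591\u00b0",
                  "-155.91059\u00b0", "-155.8154\u00b0")

# Drop every character that is not a minus sign, a decimal point, or a digit.
clean_coords <- gsub("[^-.0-9]", "", excel_coords)

# Convert the cleaned strings to numeric (illustrative variable name).
coords_num <- as.numeric(clean_coords)
```

Note that this pattern keeps every minus sign and decimal point wherever it occurs, so a malformed value such as "1-2.3.4" would pass through the cleanup unchanged and then become NA (with a warning) in as.numeric.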

北陌 2024-10-23 11:47:38

Can also be done by using strsplit, sapply and paste and by indexing the correct characters rather than the wrong ones:

 excel_coords <- c(" 19.53380Ý°", " 20.02591°", "-155.91059°", "-155.8154°")
 correct_chars <- c(0:9,"-",".")
 sapply(strsplit(excel_coords,""), 
          function(x)paste(x[x%in%correct_chars],collapse=""))

[1] "19.53380"   "20.02591"   "-155.91059" "-155.8154" 
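
As a quick check, the regex approach and the indexing approach can be compared directly. A small sketch, assuming the data from the question (via_regex, via_index and same are illustrative names, not part of either answer):

```r
# Same data as in the question, with the stray characters as \u escapes.
excel_coords <- c(" 19.53380\u00dd\u00b0", " 20.02591\u00b0",
                  "-155.91059\u00b0", "-155.8154\u00b0")

# Approach 1: delete everything outside the allowed character class.
via_regex <- gsub("[^-.0-9]", "", excel_coords)

# Approach 2: split into single characters and keep only the allowed ones.
# c(0:9, "-", ".") is coerced to a character vector: "0", ..., "9", "-", ".".
correct_chars <- c(0:9, "-", ".")
via_index <- sapply(strsplit(excel_coords, ""),
                    function(x) paste(x[x %in% correct_chars], collapse = ""))

# TRUE when both approaches return the same character vector.
same <- identical(via_regex, via_index)
```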

瞎闹 2024-10-23 11:47:38

gsub("(.+)([[:digit:]]+\\.[[:digit:]]+)(.+)", "\\2", excel_coords)

[1] "9.53380" "0.02591" "5.91059" "5.8154" 
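
The leading digits and minus signs are lost in this output because the first group (.+) is greedy: it consumes as much as it can and leaves only a single digit before the decimal point for the second group. One possible repair, offered as a sketch rather than as the answerer's intent, is to anchor the pattern and allow an optional minus sign (fixed_coords is an illustrative name):

```r
# Same data as in the question, with the stray characters as \u escapes.
excel_coords <- c(" 19.53380\u00dd\u00b0", " 20.02591\u00b0",
                  "-155.91059\u00b0", "-155.8154\u00b0")

# Skip leading characters that are neither "-" nor a digit, capture an
# optionally signed decimal number, and discard whatever follows it.
fixed_coords <- sub("^[^-0-9]*(-?[[:digit:]]+\\.[[:digit:]]+).*$", "\\1",
                    excel_coords)
```

Unlike the character-class approach, this keeps only the first signed decimal number in each string and leaves strings without a decimal point unchanged.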