Remove all characters from a string except selected ones

Posted 2024-10-16 11:47:38

I want to remove from a string all characters that are not digits, minus signs, or decimal points.

I imported data from Excel using read.xls, and the data include some strange characters. I need to convert these values to numeric. I am not too familiar with regular expressions, so I need a simpler way to do the following:

excel_coords <- c(" 19.53380Ý°", " 20.02591°", "-155.91059°", "-155.8154Â°")
unwanted <- unique(unlist(strsplit(gsub("[0-9]|\\.|-", "", excel_coords), "")))
clean_coords <- gsub(do.call("paste", args = c(as.list(unwanted), sep="|")), 
                     replacement = "", x = excel_coords)

> clean_coords
[1] "19.53380"   "20.02591"   "-155.91059" "-155.8154"

Bonus if somebody can tell me why these characters have appeared in some of my data (the degree signs are part of the original Excel worksheet, but the others are not).
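
On the bonus question: the thread itself does not settle it, but a common cause of a stray "Â" in front of a degree sign is an encoding mismatch, where the UTF-8 bytes of "°" (0xC2 0xB0) are decoded as Latin-1. The sketch below reproduces that effect in R. That this is what happened during the read.xls import is an assumption, and it does not account for the "Ý" in the first value.

```r
# Assumption: the degree sign was stored as UTF-8 but decoded as Latin-1.
deg <- "\u00b0"                          # degree sign
utf8_bytes <- charToRaw(enc2utf8(deg))   # its UTF-8 encoding
print(utf8_bytes)                        # the two bytes c2 b0

# Reinterpret those same two bytes as Latin-1 text.
misread <- rawToChar(utf8_bytes)
Encoding(misread) <- "latin1"
misread_utf8 <- enc2utf8(misread)        # two characters: U+00C2 (Â) and U+00B0 (°)
print(misread_utf8)
```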

半夏半凉 2024-10-23 11:47:38

Short and sweet. Thanks to a comment by G. Grothendieck.

gsub("[^-.0-9]", "", excel_coords)

From http://stat.ethz.ch/R-manual/R-patched/library/base/html/regex.html: "A character class is a list of characters enclosed between [ and ] which matches any single character in that list; unless the first character of the list is the caret ^, when it matches any character not in the list."
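
Since the goal in the question is numeric values, the cleaned strings still need converting. A minimal sketch combining the negated character class with as.numeric() follows. The names clean_coords and coords are illustrative, and the approach assumes each element holds exactly one number, because any stray "-" or "." elsewhere in a string would also be kept.

```r
excel_coords <- c(" 19.53380Ý°", " 20.02591°", "-155.91059°", "-155.8154Â°")

# Drop every character that is not a minus sign, a dot, or a digit.
# The "-" directly after "^" is read literally, not as a range operator.
clean_coords <- gsub("[^-.0-9]", "", excel_coords)

# Convert the cleaned strings to numbers.
coords <- as.numeric(clean_coords)
print(coords)
```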

北陌 2024-10-23 11:47:38

Can also be done by using strsplit, sapply and paste and by indexing the correct characters rather than the wrong ones:

 excel_coords <- c(" 19.53380Ý°", " 20.02591°", "-155.91059°", "-155.8154Â°")
 correct_chars <- c(0:9, "-", ".")
 sapply(strsplit(excel_coords, ""),
        function(x) paste(x[x %in% correct_chars], collapse = ""))

[1] "19.53380"   "20.02591"   "-155.91059" "-155.8154"
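
To see what the anonymous function does, it may help to walk through a single element. This is only an illustrative sketch; the names chars, keep, and result are not from the answer.

```r
correct_chars <- c(0:9, "-", ".")   # coerced to the strings "0" ... "9", "-", "."

# Split one value into its individual characters.
chars <- strsplit("-155.8154Â°", "")[[1]]

# Logical index: TRUE where the character is one we want to keep.
keep <- chars %in% correct_chars

# Glue the kept characters back into one string.
result <- paste(chars[keep], collapse = "")
print(result)
```

sapply() simply repeats these steps for every element of excel_coords.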

瞎闹 2024-10-23 11:47:38

gsub("(.+)([[:digit:]]+\\.[[:digit:]]+)(.+)", "\\2", excel_coords)
[1] "9.53380" "0.02591" "5.91059" "5.8154"
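
As the output shows, this pattern drops the sign and the leading digits: the greedy first group (.+) consumes as much as it can, leaving only one digit for the second group. One possible fix, offered as a sketch rather than as the answerer's intent, is to anchor the pattern, skip only characters that cannot start a number, and allow an optional minus sign. It assumes every element contains a number with a decimal point; elements that do not match are returned unchanged.

```r
excel_coords <- c(" 19.53380Ý°", " 20.02591°", "-155.91059°", "-155.8154Â°")

# ^[^-0-9]*  skips leading characters that are neither "-" nor a digit
# (-?[0-9]+\\.[0-9]+)  captures an optionally signed decimal number
# .*$  consumes whatever follows the number
fixed <- gsub("^[^-0-9]*(-?[0-9]+\\.[0-9]+).*$", "\\1", excel_coords)
print(fixed)
```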
