Remove all characters from a string except a selected few
I want to remove from a string all characters that are not digits, minus signs, or decimal points.
I imported data from Excel using read.xls, and it includes some strange characters. I need to convert these values to numeric. I am not too familiar with regular expressions, so I need a simpler way to do the following:
excel_coords <- c(" 19.53380Ý°", " 20.02591°", "-155.91059°", "-155.8154°")
unwanted <- unique(unlist(strsplit(gsub("[0-9]|\\.|-", "", excel_coords), "")))
clean_coords <- gsub(do.call("paste", args = c(as.list(unwanted), sep="|")),
                     replacement = "", x = excel_coords)
> clean_coords
[1] "19.53380" "20.02591" "-155.91059" "-155.8154"
Bonus if somebody can tell me why these characters have appeared in some of my data (the degree signs are part of the original Excel worksheet, but the others are not).
Comments (3)
Short and sweet. Thanks to comment by G. Grothendieck.
From http://stat.ethz.ch/R-manual/R-patched/library/base/html/regex.html: "A character class is a list of characters enclosed between [ and ] which matches any single character in that list; unless the first character of the list is the caret ^, when it matches any character not in the list."
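The code for this answer did not survive in this copy of the page. Going by the quoted documentation, it was presumably a single `gsub` call with a negated character class. The sketch below is a reconstruction under that assumption, not the answerer's exact code; the pattern `[^0-9.-]` is my guess at it. Inside the brackets the `.` is literal, and the `-` is literal because it comes last.

```r
# Sample data from the question.
excel_coords <- c(" 19.53380Ý°", " 20.02591°", "-155.91059°", "-155.8154°")

# Assumed pattern: delete every character that is NOT a digit, a dot, or a minus sign.
clean_coords <- gsub("[^0-9.-]", "", excel_coords)

# Convert the cleaned strings to numeric, as the question asks.
numeric_coords <- as.numeric(clean_coords)
```

This also strips the leading spaces, so `clean_coords` should match the result shown in the question. Unlike the paste-based approach there, it does not build a pattern out of the unwanted characters, so it cannot break if one of them happens to be a regex metacharacter.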
Can also be done by using strsplit, sapply and paste and by indexing the correct characters rather than the wrong ones:
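The code for this answer is also missing from this copy. A minimal sketch of what the description suggests, assuming the idea is to split each string into single characters and keep only those found in a whitelist (the names `keep` and `clean_coords2` are mine, not the answerer's):

```r
# Sample data from the question.
excel_coords <- c(" 19.53380Ý°", " 20.02591°", "-155.91059°", "-155.8154°")

# Whitelist of characters to keep; c() coerces the digits to "0".."9".
keep <- c(0:9, ".", "-")

# Split each string into single characters, index the wanted ones,
# and paste them back together into one string per input element.
clean_coords2 <- sapply(strsplit(excel_coords, ""), function(chars) {
  paste(chars[chars %in% keep], collapse = "")
})

# Convert the cleaned strings to numeric.
numeric_coords2 <- as.numeric(clean_coords2)
```

No regular expression is involved beyond the empty split pattern, which may suit someone who would rather avoid them, at the cost of being more verbose than a single `gsub` call.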