Importing CSV data containing commas, thousand separators and trailing minus signs
I'm using R 2.13.1 on Mac OS X. I'm trying to import a data file that uses a point as the thousand separator and a comma as the decimal mark, with a trailing minus sign for negative values.
Basically, I'm trying to convert from:
"A|324,80|1.324,80|35,80-"
to
V1 V2 V3 V4
1 A 324.80 1324.8 -35.80
Now, interactively, both of the following work:
gsub("\\.","","1.324,80")
[1] "1324,80"
gsub("(.+)-$","-\\1", "35,80-")
[1] "-35,80"
and also combining them:
gsub("\\.", "", gsub("(.+)-$","-\\1","1.324,80-"))
[1] "-1324,80"
However, I'm not able to remove the thousand separator when reading with read.table:
setClass("num.with.commas")
setAs("character", "num.with.commas", function(from) as.numeric(gsub("\\.", "", sub("(.+)-$","-\\1",from))) )
mydata <- "A|324,80|1.324,80|35,80-"
mytable <- read.table(textConnection(mydata), header=FALSE, quote="", comment.char="", sep="|", dec=",", skip=0, fill=FALSE,strip.white=TRUE, colClasses=c("character","num.with.commas", "num.with.commas", "num.with.commas"))
Warning messages:
1: In asMethod(object) : NAs introduced by coercion
2: In asMethod(object) : NAs introduced by coercion
3: In asMethod(object) : NAs introduced by coercion
mytable
V1 V2 V3 V4
1 A NA NA NA
Note that if I change from "\\." to "," in the function, things look a bit different:
setAs("character", "num.with.commas", function(from) as.numeric(gsub(",", "", sub("(.+)-$","-\\1",from))) )
mytable <- read.table(textConnection(mydata), header=FALSE, quote="", comment.char="", sep="|", dec=",", skip=0, fill=FALSE,strip.white=TRUE, colClasses=c("character","num.with.commas", "num.with.commas", "num.with.commas"))
mytable
V1 V2 V3 V4
1 A 32480 1.3248 -3580
I think the problem is that read.table with dec="," converts the incoming "," to "." BEFORE calling as(from, "num.with.commas"), so that the input string can be e.g. "1.324.80".
I want as("1.123,80-","num.with.commas") to return -1123.80 and as("1.100.123,80", "num.with.commas") to return 1100123.80.
How can I make my num.with.commas replace all except the last decimal point in the input string?
Update: First, I added a negative lookahead and got as() working in the console:
setAs("character", "num.with.commas", function(from) as.numeric(gsub("(?!\\.\\d\\d$)\\.", "", gsub("(.+)-$","-\\1",from), perl=TRUE)) )
as("1.210.123.80-","num.with.commas")
[1] -1210124
as("10.123.80-","num.with.commas")
[1] -10123.8
as("10.123.80","num.with.commas")
[1] 10123.8
However, read.table still had the same problem. Adding some print()s to my function showed that num.with.commas in fact got the comma and not the point.
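One way to check this without print() calls is to read every column as character and look at what comes back. This is a minimal sketch, assuming (as I understand read.table) that a column with a custom colClasses entry is first read as text and only then handed to as(). The text= argument is used for brevity and may not exist in an R as old as 2.13.1, where textConnection(mydata) as used above is the equivalent.

```r
mydata <- "A|324,80|1.324,80|35,80-"

# Read every field as plain text; dec="," is kept to mirror the original call.
raw <- read.table(text = mydata, header = FALSE, sep = "|", quote = "",
                  comment.char = "", dec = ",", strip.white = TRUE,
                  colClasses = "character")

# The fields still carry the decimal comma, the thousand separator
# and the trailing minus, i.e. dec="," did not rewrite them.
print(raw)
```

If that assumption holds, the converter has to deal with the comma itself rather than expecting a point.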
So my current solution is to also replace "," with "." inside the num.with.commas converter:
setAs("character", "num.with.commas", function(from) as.numeric(gsub(",","\\.",gsub("(?!\\.\\d\\d$)\\.", "", gsub("(.+)-$","-\\1",from), perl=TRUE))) )
mytable <- read.table(textConnection(mydata), header=FALSE, quote="", comment.char="", sep="|", dec=",", skip=0, fill=FALSE,strip.white=TRUE, colClasses=c("character","num.with.commas", "num.with.commas", "num.with.commas"))
mytable
V1 V2 V3 V4
1 A 324.8 1101325 -35.8
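An alternative that avoids setAs() altogether is to read the numeric columns as text and convert them afterwards. The sketch below is only an illustration: to_num is a hypothetical helper name, the column positions 2:4 are taken from the sample line, and text= may need to be swapped for textConnection(mydata) on older R versions.

```r
mydata <- "A|324,80|1.324,80|35,80-"

# Hypothetical helper: "1.324,80-" -> -1324.8
to_num <- function(x) {
  x <- sub("^(.+)-$", "-\\1", x)        # move a trailing minus to the front
  x <- gsub(".", "", x, fixed = TRUE)   # drop thousand separators
  x <- sub(",", ".", x, fixed = TRUE)   # decimal comma -> decimal point
  as.numeric(x)
}

# Read everything as character, then convert columns 2 to 4 in place.
mytable <- read.table(text = mydata, header = FALSE, sep = "|", quote = "",
                      comment.char = "", strip.white = TRUE,
                      colClasses = "character")
mytable[2:4] <- lapply(mytable[2:4], to_num)
str(mytable)
```

This keeps the parsing logic in an ordinary function, which is easier to test on its own than a coercion method.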
You should be removing all the periods first and then changing the commas to decimal points before coercing with as.numeric(). You can later control how decimal points are printed with options(OutDec=","). I do not think R uses commas as decimal separators internally even in locales where they are conventional.
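The order of operations described above could be sketched as a coercion method like this. It reuses the class name num.with.commas from the question and is one possible reading of the advice, not necessarily the exact code the answerer had in mind. The text= argument is an assumption about the R version; textConnection(mydata) can be used instead on older releases.

```r
library(methods)

setClass("num.with.commas")
setAs("character", "num.with.commas", function(from) {
  x <- sub("^(.+)-$", "-\\1", from)     # trailing minus -> leading minus
  x <- gsub(".", "", x, fixed = TRUE)   # 1. remove all periods
  x <- sub(",", ".", x, fixed = TRUE)   # 2. comma -> decimal point
  as.numeric(x)                         # 3. coerce
})

# The two cases named in the question
a <- as("1.123,80-", "num.with.commas")
b <- as("1.100.123,80", "num.with.commas")

# The same converter used through colClasses
mydata <- "A|324,80|1.324,80|35,80-"
mytable <- read.table(text = mydata, header = FALSE, sep = "|", quote = "",
                      comment.char = "", strip.white = TRUE,
                      colClasses = c("character", rep("num.with.commas", 3)))
print(mytable)
```

Because the periods are removed before the comma is converted, no lookahead is needed to protect the decimal mark.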
Here's a solution with regular expressions and substitutions