Google Docs exports spreadsheet values with commas; read.csv() in R treats them as factors rather than numbers
I am new to R and am trying to read a public Google spreadsheet into an R data frame with numeric columns. My problem seems to be that the exported spreadsheet has commas in large numbers, such as "13,061.422". The read.csv() function treats this as a factor. I tried stringsAsFactors=FALSE and colClasses=c(rep("numeric",7)) but neither worked. Is there a way to coerce the values with commas and decimals to numeric values, either within read.csv() or afterwards when they are treated as Factors in the R dataframe? Here is my code:
require(RCurl)
myCsv <- getURL("https://docs.google.com/spreadsheet/pub?hl=en_US&hl=en_US&key=0Agbdciapt4QZdE95UDFoNHlyNnl6aGlqbGF0cDIzTlE&single=true&gid=0&range=A1%3AG4928&output=csv", ssl.verifypeer=FALSE) #ssl.verifypeer=FALSE gets around certificate issues I don't understand.
fullmatrix <- read.csv(textConnection(myCsv))
str(fullmatrix)
which results in:
'data.frame': 4927 obs. of 7 variables:
$ wave. : Factor w/ 4927 levels "1,000.8900","1,002.8190",..: 4875 4874 4873 4872 4871 4870 4869 4868 4867 4866 ...
$ wavelength : Factor w/ 4927 levels "1,000.074","1,000.267",..: 1 2 3 4 5 6 7 8 9 10 ...
$ d2o : num 85.2 87.7 86.3 87.6 85.6 ...
$ di : num 54.3 55.8 54.9 55.6 54.9 ...
$ ddw : num 48.2 49.7 49.4 50.2 49.6 ...
$ ddw.old : num 53.3 55 53.9 54.8 53.7 ...
$ d2o.ddw.mix: num 65.8 67.9 67.2 68.4 66.8 ...
Thanks for any help! I am new to R, so guessing (hoping) this is an easy one!
Yes. Two methods. The easiest to understand at first is probably just to use as.is=TRUE to preserve them as character vectors and then use gsub to remove the commas and any currency symbols before converting to numeric. The second is a bit more difficult, but I think more kewl: create an as-method for the format you are using. Then you can use colClasses to do it in one step.
I see @EDi already did version #1 (using stringsAsFactors rather than as.is), so I will document strategy #2.
As-methods are coercive. There are many such methods in base R, such as as.list, as.numeric, and as.character. In each case they attempt to take input that is in one mode and make a sensible copy of it in a different mode. For instance, it makes sense to coerce a matrix to a data frame because both have two dimensions. It makes a bit less sense to coerce a data frame to a matrix (it does succeed, but with loss of all the attributes of the columns and coercion to a common mode).
In the present case I am taking a character string as input, removing any commas, and coercing the character values to numeric. Then I use read.table's colClasses argument (in this case by way of read.csv) to dispatch to the as-method I registered with setAs. You may want to go to the help(setAs) page for more details. The S4 class system confuses a lot of people, me included. This is about the only area of success I have had with S4 methods.
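Strategy #2 as described above might be sketched roughly as follows. This is only an illustration under stated assumptions: the class name num.with.commas is a hypothetical label chosen here, and a small made-up CSV string stands in for the text downloaded from the Google spreadsheet.

```r
library(methods)

# Hypothetical class name; it only serves as a label that colClasses can refer to.
setClass("num.with.commas")

# Register the coercion from character: strip commas, then convert to numeric.
setAs("character", "num.with.commas",
      function(from) as.numeric(gsub(",", "", from, fixed = TRUE)))

# Made-up sample data standing in for the downloaded CSV text.
csv_text <- 'wave,wavelength,d2o
"13,061.422","1,000.074",85.2
"9,998.5","1,000.267",87.7'

# Columns tagged with the custom class are read as character and then
# passed through the as-method registered above.
fullmatrix <- read.csv(text = csv_text,
                       colClasses = c("num.with.commas", "num.with.commas", "numeric"))
str(fullmatrix)
```

With the real spreadsheet, the colClasses vector would presumably need one entry per column, for example the custom class for the two comma-formatted columns and "numeric" for the remaining five.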
Read the data in with stringsAsFactors = FALSE, remove the commas (with gsub()), and convert to numeric (with as.numeric()).
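The read-then-clean approach might look something like this. It is a minimal sketch, not the answerer's exact code: the sample CSV string and the comma_cols vector are assumptions made for illustration, standing in for the downloaded spreadsheet and its two comma-formatted columns.

```r
# Made-up sample data standing in for the downloaded CSV text.
csv_text <- 'wave,wavelength,d2o
"13,061.422","1,000.074",85.2
"9,998.5","1,000.267",87.7'

# Keep the comma-formatted columns as character vectors instead of factors.
fullmatrix <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Hypothetical list of the columns that contain thousands separators.
comma_cols <- c("wave", "wavelength")

# Strip the commas, then convert each affected column to numeric.
fullmatrix[comma_cols] <- lapply(
  fullmatrix[comma_cols],
  function(x) as.numeric(gsub(",", "", x, fixed = TRUE))
)
str(fullmatrix)
```

If a column has already been read as a factor, converting it with as.character() before the gsub() step avoids getting the internal level codes instead of the displayed values.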