Google Docs exports spreadsheet values with commas; read.csv() in R treats them as factors rather than numbers

Posted on 2024-12-22 21:16:15

I am new to R and am trying to read a public Google spreadsheet into an R data frame with numeric columns. My problem seems to be that the exported spreadsheet has commas in large numbers, such as "13,061.422". The read.csv() function treats this as a factor. I tried stringsAsFactors=FALSE and colClasses=c(rep("numeric",7)) but neither worked. Is there a way to coerce the values with commas and decimals to numeric values, either within read.csv() or afterwards when they are treated as Factors in the R dataframe? Here is my code:

require(RCurl)

myCsv <- getURL("https://docs.google.com/spreadsheet/pub?hl=en_US&hl=en_US&key=0Agbdciapt4QZdE95UDFoNHlyNnl6aGlqbGF0cDIzTlE&single=true&gid=0&range=A1%3AG4928&output=csv", ssl.verifypeer=FALSE)  #ssl.verifypeer=FALSE gets around certificate issues I don't understand.

fullmatrix <- read.csv(textConnection(myCsv))

str(fullmatrix)

which results in:

'data.frame':   4927 obs. of  7 variables:
 $ wave.      : Factor w/ 4927 levels "1,000.8900","1,002.8190",..: 4875 4874 4873 4872 4871 4870 4869 4868 4867 4866 ...
 $ wavelength : Factor w/ 4927 levels "1,000.074","1,000.267",..: 1 2 3 4 5 6 7 8 9 10 ...
 $ d2o        : num  85.2 87.7 86.3 87.6 85.6 ...
 $ di         : num  54.3 55.8 54.9 55.6 54.9 ...
 $ ddw        : num  48.2 49.7 49.4 50.2 49.6 ...
 $ ddw.old    : num  53.3 55 53.9 54.8 53.7 ...
 $ d2o.ddw.mix: num  65.8 67.9 67.2 68.4 66.8 ...

Thanks for any help! I am new to R, so guessing (hoping) this is an easy one!

-小熊_ 2024-12-29 21:16:15

Yes. Two methods. The easiest to understand at first is probably to use as.is=TRUE to preserve them as character vectors, and then use gsub to remove the commas and any currency symbols before converting to numeric. The second is a bit more difficult, but I think cooler: create an as-method for the format you are using. Then you can use colClasses to do it in one step.

I see @EDi already did version #1 (using stringsAsFactors rather than as.is), so I will document strategy #2:

 library(methods)
 setClass("num.with.commas")
#[1] "num.with.commas"
 setAs("character", "num.with.commas",
      function(from) as.numeric(gsub(",", "", from)))
 require(RCurl)
#Loading required package: RCurl
#Loading required package: bitops

 myCsv <- getURL("https://docs.google.com/spreadsheet/pub?hl=en_US&hl=en_US&key=0Agbdciapt4QZdE95UDFoNHlyNnl6aGlqbGF0cDIzTlE&single=true&gid=0&range=A1%3AG4928&output=csv", ssl.verifypeer=FALSE)  
 # 7 columns: 2 with commas, 5 plain numeric
 fullmatrix <- read.csv(textConnection(myCsv),
                        colClasses=c(rep("num.with.commas",2), rep("numeric",5) ))
 str(fullmatrix)
#--------------
'data.frame':   4927 obs. of  7 variables:
 $ wave.      : num  9999 9997 9995 9993 9992 ...
 $ wavelength : num  1000 1000 1000 1001 1001 ...
 $ d2o        : num  85.2 87.7 86.3 87.6 85.6 ...
 $ di         : num  54.3 55.8 54.9 55.6 54.9 ...
 $ ddw        : num  48.2 49.7 49.4 50.2 49.6 ...
 $ ddw.old    : num  53.3 55 53.9 54.8 53.7 ...
 $ d2o.ddw.mix: num  65.8 67.9 67.2 68.4 66.8 ...

The as-methods perform coercion. Base R has many of them, such as as.list, as.numeric, and as.character. Each one takes input in one mode and tries to make a sensible copy of it in a different mode. For instance, it makes sense to coerce a matrix to a data frame because both have two dimensions. It makes a bit less sense to coerce a data frame to a matrix (it does succeed, but the columns lose their attributes and are coerced to a common mode).
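As a rough illustration of that asymmetry, here is a minimal sketch using made-up data (none of these values come from the question's spreadsheet). It only shows what base R's as.data.frame() and as.matrix() do to the modes involved.

```r
# Illustrative data only; the names m, df, mixed and mm are hypothetical.
m <- matrix(1:4, nrow = 2, dimnames = list(NULL, c("a", "b")))
df <- as.data.frame(m)   # matrix -> data frame: both are two-dimensional
str(df)

mixed <- data.frame(x = c(1.5, 2.5), y = c("u", "v"),
                    stringsAsFactors = FALSE)
mm <- as.matrix(mixed)   # data frame -> matrix: one common mode for all cells
mode(mm)                 # the numeric column has been turned into text
```

Because a matrix can hold only one mode, the numeric column x ends up stored as character strings alongside y.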

In the present case I take a character string as input, remove any commas, and coerce the character values to numeric. Then I use the 'colClasses' argument of read.table (in this case by way of read.csv) to dispatch to the as-method I registered with setAs. See the help(setAs) page for more details. The S4 class system confuses a lot of people, me included. This is about the only area of success I have had with S4 methods.
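If you want to try the same idea without fetching the spreadsheet, the following is a small offline sketch. The sample text in csv_text is invented to resemble the export (quoted fields containing commas), and the two-column layout is an assumption for illustration only.

```r
library(methods)

# Same class name as above; it exists only so colClasses has something to name.
setClass("num.with.commas")
setAs("character", "num.with.commas",
      function(from) as.numeric(gsub(",", "", from, fixed = TRUE)))

# Made-up sample: the quoted field carries the thousands separator.
csv_text <- '"wave.","d2o"\n"9,999.2590",85.2\n"1,000.8900",87.7'

# read.csv reads the first column as character, then calls the as-method on it.
demo <- read.csv(text = csv_text,
                 colClasses = c("num.with.commas", "numeric"))
str(demo)
```

Both columns should come back as num. Note that the conversion function is plain as.numeric(), so any value that is still not a number after the commas are removed becomes NA with a warning.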

寂寞笑我太脆弱 2024-12-29 21:16:15

Read the data in with stringsAsFactors = FALSE, remove the commas (with gsub()), and convert to numeric (with as.numeric()):

> fullmatrix <- read.csv(textConnection(myCsv), stringsAsFactors = FALSE)

> str(fullmatrix)
'data.frame':   4927 obs. of  7 variables:
 $ wave.      : chr  "9,999.2590" "9,997.3300" "9,995.4010" "9,993.4730" ...
 $ wavelength : chr  "1,000.07410549122" "1,000.26707130804" "1,000.46011160533" "1,000.65312629553" ...
 $ d2o        : num  85.2 87.7 86.3 87.6 85.6 ...
 $ di         : num  54.3 55.8 54.9 55.6 54.9 ...
 $ ddw        : num  48.2 49.7 49.4 50.2 49.6 ...
 $ ddw.old    : num  53.3 55 53.9 54.8 53.7 ...
 $ d2o.ddw.mix: num  65.8 67.9 67.2 68.4 66.8 ...

> fullmatrix$wave. <- as.numeric(gsub(",", "", fullmatrix$wave.)) 
> fullmatrix$wavelength <- as.numeric(gsub(",", "", fullmatrix$wavelength))

> str(fullmatrix)
'data.frame':   4927 obs. of  7 variables:
 $ wave.      : num  9999 9997 9995 9993 9992 ...
 $ wavelength : num  1000 1000 1000 1001 1001 ...
 $ d2o        : num  85.2 87.7 86.3 87.6 85.6 ...
 $ di         : num  54.3 55.8 54.9 55.6 54.9 ...
 $ ddw        : num  48.2 49.7 49.4 50.2 49.6 ...
 $ ddw.old    : num  53.3 55 53.9 54.8 53.7 ...
 $ d2o.ddw.mix: num  65.8 67.9 67.2 68.4 66.8 ...

> fullmatrix[1, 1]
[1] 9999.259
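
If there are many such columns, the same steps can be applied to every character column at once. This is only a sketch on invented data: csv_text and the price column with a "$" prefix are hypothetical, added to illustrate the currency-symbol case mentioned in the other answer, and it assumes every character column is meant to be numeric.

```r
# Made-up sample; the "price" column is hypothetical.
csv_text <- '"wave.","price","d2o"\n"9,999.2590","$1,250.50",85.2\n"1,000.8900","$980.00",87.7'

demo <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Find the columns that were read as text, strip commas and dollar signs,
# then convert what is left to numeric.
is_chr <- vapply(demo, is.character, logical(1))
demo[is_chr] <- lapply(demo[is_chr],
                       function(x) as.numeric(gsub("[$,]", "", x)))
str(demo)
```

Columns that really do hold text would turn into NA with a warning, so in practice you may prefer to list the columns to convert by name.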