RODBC(有时)不读取数字

发布于 2024-11-02 14:58:22 字数 2585 浏览 0 评论 0原文

我正在使用 ROBDC 将 Excel 工作表读入 R。然而,我遇到了一些奇怪的结果。有时,重新启动 R 控制台会有所帮助,但我现在尝试了 3 次,并且以下错误(?)仍然存在。

Excel 工作表包含数字块,我想将其读入数组中。

这是我的 sqlQuery:

channel <- odbcConnectExcel(paste("~link to file~"))
as.matrix(sqlQuery(channel,paste("select F13,F14,F15 from \"",land,i,"$\"",sep="")))

这是(部分)结果:

32 NA             NA             NA   
33 NA             NA             "Pf7"
34 "2.079975e+02" "6.762166e+01" NA   
35 "1.721265e+02" "5.595971e+01" NA   
36 "1.373673e+02" "4.465921e+01" NA   
37 "1.200835e+02" "3.904013e+01" NA   
38 "1.411489e+02" "4.588864e+01" NA   
39 NA             NA             NA   
40 NA             NA             NA   
41 NA             NA             NA   
42 NA             NA             "Pf7"
43 "1.443768e+00" "4.693807e-01" NA   
44 "1.194778e+00" "3.884318e-01" NA   
45 "9.535044e-01" "3.099920e-01" NA   
46 "8.335332e-01" "2.709883e-01" NA   
47 "9.797535e-01" "3.185257e-01" NA   
48 NA             NA             NA   
49 NA             NA             NA   

在 Excel 工作表中,每个数字块前面都有 Pf5:Pf7,每个描述 Pf5:Pf7 后面都跟有 5 个数字。所有单元格均采用数字格式。 Excel 表格可以在此处下载。

在 F13 和 F14 列中,RODBC 删除了描述(这很奇怪,但对我来说没问题),在 F15 列中它删除了数字(这对我来说根本不合适!)

奇怪的是,完全相同的代码运行得很好昨天,我没有更新任何 R 文件或以任何方式更改 Excel 工作表,所以我对可能导致错误的原因感到困惑。

我很感激任何可能有帮助的评论。

谢谢, 马丁

更新: 我尝试使用 odbcConnectExcel2007 而不是 odbcConnectExcel,结果如下:

22 NA             NA             NA             
23 NA             NA             NA             
24 NA             NA             "Pf7"          
25 "7.730711e+01" "1.958429e+01" "12,638675305" 
26 "4.126044e+01" "1.045255e+01" "6,7455279987" 
27 "4.293419e+01" "1.087656e+01" "7,0191635751" 
28 "4.278764e+01" "1.083944e+01" "6,9952057822" 
29 "4.073317e+01" "1.031897e+01" "6,6593269202" 
30 NA             NA             NA             
31 NA             NA             NA             
32 NA             NA             NA             
33 NA             NA             "Pf7"          
34 "9.001111e+01" "2.280260e+01" "14,715610176" 
35 "4.804083e+01" "1.217023e+01" "7,854032014"  
36 "4.998963e+01" "1.266392e+01" "8,1726345871" 
37 "4.981901e+01" "1.262070e+01" "8,1447397695" 
38 "4.742692e+01" "1.201471e+01" "7,7536653665" 
39 NA             NA             NA             
40 NA             NA             NA             

“,”而不是“。”可能是我使用德语版 Excel 造成的。我可以使用 sub() 来替换逗号,但这 (1) 不是一个优雅的解决方案,并且 (2) 没有解释为什么它适用于前两列,但不适用于第三列。 (也不是为什么它昨天有效但今天不起作用......)

I am reading Excel sheets into R using ROBDC. However, I encountered some strange results. Sometimes, restarting the R console will help, but I tried 3 times now, and the following bug (?) persists.

The Excel sheet contains blocks of numbers, which I want read into an array.

This is my sqlQuery:

channel <- odbcConnectExcel(paste("~link to file~"))
as.matrix(sqlQuery(channel,paste("select F13,F14,F15 from \"",land,i,"$\"",sep="")))

Here is (a part of) the result:

32 NA             NA             NA   
33 NA             NA             "Pf7"
34 "2.079975e+02" "6.762166e+01" NA   
35 "1.721265e+02" "5.595971e+01" NA   
36 "1.373673e+02" "4.465921e+01" NA   
37 "1.200835e+02" "3.904013e+01" NA   
38 "1.411489e+02" "4.588864e+01" NA   
39 NA             NA             NA   
40 NA             NA             NA   
41 NA             NA             NA   
42 NA             NA             "Pf7"
43 "1.443768e+00" "4.693807e-01" NA   
44 "1.194778e+00" "3.884318e-01" NA   
45 "9.535044e-01" "3.099920e-01" NA   
46 "8.335332e-01" "2.709883e-01" NA   
47 "9.797535e-01" "3.185257e-01" NA   
48 NA             NA             NA   
49 NA             NA             NA   

In the Excel sheet, every block of numbers is preceded by Pf5:Pf7, and every description Pf5:Pf7 is followed by 5 numbers. All cells are formatted as number. The Excel sheet can be downloaded here.

In columns F13 and F14, RODBC drops the descriptions (which is strange but fine with me), in column F15 it drops the numbers (which is not fine with me at all!)

The strange thing is that exactly the same code worked just fine yesterday, and I did not update any R files or change the Excel sheets in any way, so I am puzzled regarding what might cause the error.

I am grateful for any comments that might help.

Thanks,
Martin

UPDATE:
I tried using odbcConnectExcel2007 instead of odbcConnectExcel, here are the results:

22 NA             NA             NA             
23 NA             NA             NA             
24 NA             NA             "Pf7"          
25 "7.730711e+01" "1.958429e+01" "12,638675305" 
26 "4.126044e+01" "1.045255e+01" "6,7455279987" 
27 "4.293419e+01" "1.087656e+01" "7,0191635751" 
28 "4.278764e+01" "1.083944e+01" "6,9952057822" 
29 "4.073317e+01" "1.031897e+01" "6,6593269202" 
30 NA             NA             NA             
31 NA             NA             NA             
32 NA             NA             NA             
33 NA             NA             "Pf7"          
34 "9.001111e+01" "2.280260e+01" "14,715610176" 
35 "4.804083e+01" "1.217023e+01" "7,854032014"  
36 "4.998963e+01" "1.266392e+01" "8,1726345871" 
37 "4.981901e+01" "1.262070e+01" "8,1447397695" 
38 "4.742692e+01" "1.201471e+01" "7,7536653665" 
39 NA             NA             NA             
40 NA             NA             NA             

The "," instead of the "." could be caused by me using a German version of Excel. I could use sub() to replace the commas, but this (1) would not be an elegant solution, and (2) does not explain why it works with the first two cols but not with the third. (And also not why it worked yesterday but not today...)

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

征棹 2024-11-09 14:58:22

查看 Excel 文件。单元格的内容实际上是数字吗? (如果您查看公式栏,类似 '123 的内容会显示单元格内容是字符串,而不是数字。)

您还可以考虑使用,例如 read.xls< /code> 在 xlsReadWrite 包中。这比使用 SQL 查询要简单一些。 (同样,对于 Excel 2007 及以上版本,xlsx 包中的 read.xlsx。)

Take a look in the Excel file. Are the contents of the cell actually numbers? (If you look in the formula bar, something like '123 reveals the cell contents to be a string, not a number.)

You could also look into using, e.g., read.xls in the xlsReadWrite package. It's a little simpler than having to use SQL queries. (Likewise, read.xlsx in the xlsx package for Excel 2007 onwards.)

奶茶白久 2024-11-09 14:58:22

不要将 as.matrix 用于可能具有不同列数据类型的数据框。

您的代码不是独立的,但是如果您将代码分解为可理解和可重现的内容,那么发生的事情应该非常明显。

使用问题中提供的 zip 中的文件“test2.xls”:

library(RODBC)
d <- sqlQuery(con, "SELECT [F13], [F14], [F15] FROM [Wien2008$]")
con <- odbcConnectExcel("test2.xls")
summary(d)
    F13                 F14              F15    
Min.   :6.584e-01   Min.   :1.668e-01   Pf7 : 6  
1st Qu.:4.073e+01   1st Qu.:5.744e+00   NA's:61  
Median :7.731e+01   Median :1.262e+01            
Mean   :2.108e+03   Mean   :1.027e+04            
3rd Qu.:2.322e+02   3rd Qu.:4.738e+01            
Max.   :5.697e+04   Max.   :2.791e+05            
NA's   :3.000e+01   NA's   :3.400e+01            

F13F14 对我来说看起来像数字。

sapply(d, class)
     F13       F14       F15 
"numeric" "numeric"  "factor" 

如果我们用 as.matrix 丢弃所有结构会发生什么?

mode(as.matrix(d))
[1] "character"

如果我们对任何旧数据框执行此操作会发生什么?

mode(as.matrix(data.frame(x = rnorm(10), y = letters[1:10], stringsAsFactors = FALSE)))
[1] "character"
mode(as.matrix(data.frame(x = rnorm(10), y = factor(letters[1:10]))))
[1] "character"
mode(as.matrix(data.frame(x = rnorm(10), y = 1:10)))
[1] "numeric"

如果存在数字和字符或因子列,则矩阵将被强制转换为字符。

这与 RODBC 完全无关。

Do not use as.matrix for a dataframe that may have different column data types.

Your code is not self-contained, but it should be pretty obvious what is going on if you decompose your code into something understandable and reproducible.

Using the file "test2.xls" in the zip provided in the question:

library(RODBC)
d <- sqlQuery(con, "SELECT [F13], [F14], [F15] FROM [Wien2008$]")
con <- odbcConnectExcel("test2.xls")
summary(d)
    F13                 F14              F15    
Min.   :6.584e-01   Min.   :1.668e-01   Pf7 : 6  
1st Qu.:4.073e+01   1st Qu.:5.744e+00   NA's:61  
Median :7.731e+01   Median :1.262e+01            
Mean   :2.108e+03   Mean   :1.027e+04            
3rd Qu.:2.322e+02   3rd Qu.:4.738e+01            
Max.   :5.697e+04   Max.   :2.791e+05            
NA's   :3.000e+01   NA's   :3.400e+01            

F13 and F14 look like numbers to me.

sapply(d, class)
     F13       F14       F15 
"numeric" "numeric"  "factor" 

What happens if we trash all that structure with as.matrix?

mode(as.matrix(d))
[1] "character"

What happens if we do that to any old dataframe?

mode(as.matrix(data.frame(x = rnorm(10), y = letters[1:10], stringsAsFactors = FALSE)))
[1] "character"
mode(as.matrix(data.frame(x = rnorm(10), y = factor(letters[1:10]))))
[1] "character"
mode(as.matrix(data.frame(x = rnorm(10), y = 1:10)))
[1] "numeric"

The matrix is coerced to character if there are columns of numeric and character or factor.

This has absolutely nothing to do with RODBC.

柳若烟 2024-11-09 14:58:22

我无法使用 R-2.13.0 和 RODBC_1.3-2 复制您的问题。

> channel <- odbcConnectExcel("C:/test2.xls")
> (foo <- sqlQuery(channel, 'select F13, F14, F15 from "Wien2008$"'))
            F13          F14        F15
22           NA           NA         NA
23           NA           NA         NA
24           NA           NA         NA
25 7.730711e+01 1.958429e+01 12.6386753
26 4.126044e+01 1.045255e+01  6.7455280
27 4.293419e+01 1.087656e+01  7.0191636
28 4.278764e+01 1.083944e+01  6.9952058
29 4.073317e+01 1.031897e+01  6.6593269
> sessionInfo()
R version 2.13.0 (2011-04-13)
Platform: i386-pc-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] RODBC_1.3-2

loaded via a namespace (and not attached):
[1] tools_2.13.0

由于我无法复制该问题,我不知道如何提供帮助...您说您没有更新任何 R 文件;这是否意味着您没有更改任何程序?您是否升级了 R 安装和/或升级了任何软件包?

I can't replicate your issue using R-2.13.0 and RODBC_1.3-2.

> channel <- odbcConnectExcel("C:/test2.xls")
> (foo <- sqlQuery(channel, 'select F13, F14, F15 from "Wien2008$"'))
            F13          F14        F15
22           NA           NA         NA
23           NA           NA         NA
24           NA           NA         NA
25 7.730711e+01 1.958429e+01 12.6386753
26 4.126044e+01 1.045255e+01  6.7455280
27 4.293419e+01 1.087656e+01  7.0191636
28 4.278764e+01 1.083944e+01  6.9952058
29 4.073317e+01 1.031897e+01  6.6593269
> sessionInfo()
R version 2.13.0 (2011-04-13)
Platform: i386-pc-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] RODBC_1.3-2

loaded via a namespace (and not attached):
[1] tools_2.13.0

Since I can't replicate the issue, I'm not sure how to help... You say you didn't update any R files; does that mean you didn't change any of your programs? Have you upgraded your R installation and/or upgraded any packages?

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文