Inconsistent results with R and RNetCDF - why?
I am having trouble extracting data from NetCDF data files using RNetCDF. The data files each have 3 dimensions (longitude, latitude, and a date) and 3 variables (latitude, longitude, and a climate variable). There are four datasets, each with a different climate variable.
Here is some of the output from print.nc(p8m.tmax) for clarity. The other datasets are identical except for the specific climate variable.
dimensions:
    month = UNLIMITED ; // (1368 currently)
    lat = 3105 ;
    lon = 7025 ;
variables:
    float lat(lat) ;
        lat:long_name = "latitude" ;
        lat:standard_name = "latitude" ;
        lat:units = "degrees_north" ;
    float lon(lon) ;
        lon:long_name = "longitude" ;
        lon:standard_name = "longitude" ;
        lon:units = "degrees_east" ;
    short tmax(lon, lat, month) ;
        tmax:missing_value = -9999 ;
        tmax:_FillValue = -9999 ;
        tmax:units = "degree_celsius" ;
        tmax:scale_factor = 0.01 ;
        tmax:valid_min = -5000 ;
        tmax:valid_max = 6000 ;
I am getting behavior I don't understand when I use the var.get.nc function from the RNetCDF package.
For example, when I attempt to extract 82 values beginning at stval from the maximum temperature data (p8m.tmax <- open.nc("tmaxdataset.nc")) with
> var.get.nc(p8m.tmax,'tmax', start=c(lon_val, lat_val, stval),count=c(1,1,82))
(where lon_val and lat_val specify the location in the dataset of the coordinates I'm interested in, and stval is set to which(time_vec==200201), which in this case equaled 1285), I get
Error: Invalid argument
But after successfully extracting 80 and 81 values
> var.get.nc(p8m.tmax,'tmax', start=c(lon_val, lat_val, stval),count=c(1,1,80))
> var.get.nc(p8m.tmax,'tmax', start=c(lon_val, lat_val, stval),count=c(1,1,81))
the command with 82 works:
> var.get.nc(p8m.tmax,'tmax', start=c(lon_val, lat_val, stval),count=c(1,1,82))
[1] 444 866 1063 ... [output snipped]
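As an aside, the values printed above look like the raw packed shorts rather than temperatures, since the header declares tmax as a short with a scale_factor of 0.01 and a fill value of -9999. A minimal sketch of converting them by hand in base R, assuming no unpacking has been applied to the output; the fourth value is added here only to illustrate the fill handling and is not from the dataset:

```r
# Raw values as printed above, plus one fill value added for illustration.
raw <- c(444, 866, 1063, -9999)

# Attributes copied from the print.nc() header.
fill_value   <- -9999
scale_factor <- 0.01

# Mark fill values as missing, then apply the scale factor
# to get degrees Celsius.
raw[raw == fill_value] <- NA
tmax_celsius <- raw * scale_factor
```

Under that assumption the first returned value would correspond to 4.44 degrees Celsius.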
The same problem occurs in the identically structured tmin file, but at 36 instead of 82:
> var.get.nc(p8m.tmin,'tmin', start=c(lon_val, lat_val, stval),count=c(1,1,36))
produces Error: Invalid argument
But after repeating with counts of 30, 31, etc.,
> var.get.nc(p8m.tmin,'tmin', start=c(lon_val, lat_val, stval), count=c(1,1,36))
works.
These examples make it seem like the function is failing at the last count, but that actually isn't the case. In the first example, var.get.nc gave Error: Invalid argument after I asked for 84 values. I then narrowed the failure down to the 82nd count by varying the starting point in the dataset and asking for only 1 value at a time. The particular number the problem occurs at also varies. I can close and reopen the dataset and have the problem occur at a different location.
In the particular examples above, lon_val and lat_val are 1595 and 1751, respectively, identifying the location in the dataset along the lat and lon dimensions for the latitude and longitude I'm interested in. The 1595th latitude and 1751st longitude are not the problem, however. The problem occurs with all the other latitudes and longitudes I've tried.
Varying the starting location in the dataset along the month dimension (stval) and/or specifying it differently (as a number in the command instead of the object stval) also does not fix the problem.
This problem doesn't always occur. I can run identical code three times in a row (clearing all objects in between runs) and get a different outcome each time. The first run may choke on the 7th entry I'm trying to get, the second might work fine, and the third run might choke on the 83rd entry. I'm absolutely baffled by such inconsistent behavior.
The open.nc function has also started to fail with the same Error: Invalid argument. Like the var.get.nc problems, it also occurs inconsistently.
Does anyone know what causes the initial failure to extract the variable, and how I might prevent it? Could it have to do with the size of the data files (~60GB each) and/or the fact that I'm accessing them through networked drives?
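Since a call that fails often succeeds when it is simply repeated, one stopgap is to wrap the read in a retry loop. This is only a minimal sketch, assuming the failure is transient; it does not address the underlying cause. The names with_retry and flaky_read are hypothetical and not part of RNetCDF, and flaky_read is a stand-in that fails twice before succeeding:

```r
# Hypothetical helper: call f() until it succeeds or max_tries is reached.
with_retry <- function(f, max_tries = 5, wait = 0) {
  last_error <- NULL
  for (i in seq_len(max_tries)) {
    # Capture an error as a value instead of aborting.
    result <- tryCatch(f(), error = function(e) e)
    if (!inherits(result, "error")) return(result)
    last_error <- result
    if (wait > 0) Sys.sleep(wait)  # optional pause between attempts
  }
  stop("all ", max_tries, " attempts failed; last error: ",
       conditionMessage(last_error))
}

# Stand-in for a read that fails intermittently (not a real RNetCDF call).
calls <- 0
flaky_read <- function() {
  calls <<- calls + 1
  if (calls < 3) stop("Invalid argument")
  c(444, 866, 1063)
}

values <- with_retry(flaky_read)
```

With the real data, the stand-in would be replaced by something like function() var.get.nc(p8m.tmax, 'tmax', start=c(lon_val, lat_val, stval), count=c(1,1,82)).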
This was also asked here:
https://stat.ethz.ch/pipermail/r-help/2011-June/281233.html
> sessionInfo()
R version 2.13.0 (2011-04-13)
Platform: i386-pc-mingw32/i386 (32-bit)
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] reshape_0.8.4 plyr_1.5.2 RNetCDF_1.5.2-2
loaded via a namespace (and not attached):
[1] tools_2.13.0
Comments (1)
To solve this problem, I switched from the RNetCDF package (version 1.5.2-2) to the ncdf package (1.6.5). The functions in the two packages are similarly named and have the same purposes [open.nc vs. open.ncdf, var.get.nc vs. get.var.ncdf]. Using the exact same code with the RNetCDF function names replaced with ncdf functions, I get no errors and the expected results.
So while the RNetCDF commands shown in the question fail (only sometimes, and for no apparent reason), the equivalent ncdf commands never fail.
This is not a real solution - I still don't know why the functions in the RNetCDF package sometimes work and sometimes do not. However, it does allow me to extract the data I need and will hopefully be of some use to others working with netcdf data in R.