Changing the chunk shape in a NetCDF file
I have several ~100 GB NetCDF files.
Within each NetCDF file, there is a variable a, from which I have to extract several data series. Its dimensions are (1440, 721, 6, 8760).
I need to extract ~20k slices of dimension (1, 1, 1, 8760) from each NetCDF file.
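For context, extracting a single series with the netCDF4 Python library looks roughly like the sketch below (the variable name a and the latitude/longitude/level/time dimension order are as described above; the indices are placeholders):
# Minimal sketch of one slice extraction with the netCDF4 library.
# The indices stand in for one latitude/longitude/level point.
from netCDF4 import Dataset

with Dataset("source.nc") as nc:
    series = nc.variables["a"][100, 200, 0, :]  # one full time series, shape (8760,)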
Since it is extremely slow to extract one slice (several minutes), I read about how to optimize the process. Most likely, the chunks are not set optimally. Therefore, my goal is to change the chunk size to (1, 1, 1, 8760) for more efficient I/O.
However, I struggle to understand how best to re-chunk this NetCDF variable.
First of all, by running ncdump -k file.nc, I found that the type is 64-bit offset. Based on my research, I think this is NetCDF-3, which does not support defining chunk sizes. Therefore, I copied the file to NetCDF-4 format using nccopy -k 3 source.nc dest.nc; ncdump -k file.nc now returns netCDF-4.
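For what it's worth, the file format and the current chunk layout of the variable can also be inspected from Python with the netCDF4 library (a small sketch, assuming the variable is named a as above):
# Check the file format and the on-disk chunking of variable "a" (illustrative).
from netCDF4 import Dataset

with Dataset("dest.nc") as nc:
    print(nc.data_model)                 # e.g. "NETCDF4"
    print(nc.variables["a"].chunking())  # "contiguous" or per-dimension chunk sizes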
However, now I'm stuck. I do not know how to proceed.
If anybody has a proper solution in Python, MATLAB, or using nccopy, please share it.
What I'm trying now is the following:
nccopy -k 3 -w -c latitude/1,longitude/1,level/1,time/8760 source.nc dest.nc
Is this the correct approach in theory?
Unfortunately, after 24 hours it still had not finished, even on a powerful server with more than enough RAM (250 GB) and many CPUs (80).
Comments (2)
Your command appears to be correct; re-chunking simply takes time. You could also try an alternative tool or invocation to see if that is any faster.
An old question, but I didn't find much more information elsewhere. I had to re-chunk files of about 25 GB (uncompressed) and tried both ncks and nccopy. I think nccopy actually worked, but it took about 48 hours.
I ended up using xarray and Python instead, and it turned out to be much faster: the 25 GB were re-chunked in less than 5 minutes (including compression). Memory usage was about 50 GB. Here is the approach I used:
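(A sketch of that approach follows; the variable name a, the target chunk shape, and the compression settings are assumptions based on the question rather than the exact original code.)
# Sketch: open the source file and write a new NetCDF-4 file whose variable
# "a" is stored on disk in (1, 1, 1, 8760) chunks. Compression settings are
# illustrative; for very large files, open with chunks=... to stream via dask.
import xarray as xr

ds = xr.open_dataset("source.nc")

encoding = {
    "a": {
        "chunksizes": (1, 1, 1, 8760),  # one chunk per full time series
        "zlib": True,                   # optional deflate compression
        "complevel": 1,
    }
}

ds.to_netcdf("dest.nc", format="NETCDF4", encoding=encoding)
Setting chunksizes in the encoding controls the on-disk chunking of the written file, which is what makes the later (1, 1, 1, 8760) reads fast.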