Changing the chunk shape in a NetCDF file

Posted 2025-02-13 15:01:53

I have several ~100 GB NetCDF files.
Each file contains a variable a with dimensions (1440, 721, 6, 8760), from which I have to extract several data series.
I need to extract ~20k slices of shape (1, 1, 1, 8760) from each NetCDF file.
Extracting a single slice is extremely slow (several minutes), so I read about how to optimize the process.
Most likely, the chunks are not set optimally.
Therefore, my goal is to change the chunk size to (1, 1, 1, 8760) for more efficient I/O.

However, I am struggling to understand how best to re-chunk this NetCDF variable.
First of all, by running ncdump -k file.nc, I found that the file type is 64-bit offset.
Based on my research, I think this is a NetCDF-3 format, which does not support defining chunk sizes.
Therefore, I copied it to NetCDF-4 format using nccopy -k 3 source.nc dest.nc.
ncdump -k file.nc now returns netCDF-4.
However, now I'm stuck and do not know how to proceed.

If anybody has a proper solution in Python, MATLAB, or using nccopy, please share it.
What I'm trying now is the following:

nccopy -k 3 -w -c latitude/1,longitude/1,level/1,time/8760 source.nc dest.nc

Is this the correct approach in theory?
Unfortunately, after 24 hours it still had not finished, on a powerful server with more than enough RAM (250 GB) and many CPUs (80).

情话已封尘 2025-02-20 15:01:53

Your command appears to be correct. Re-chunking takes time. You could also try ncks (part of NCO):

ncks -4 --cnk_dmn latitude,1 --cnk_dmn longitude,1 --cnk_dmn level,1 --cnk_dmn time,8760 in.nc out.nc

to see if that is any faster.
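
A rough count may help explain why re-chunking to (1, 1, 1, 8760) takes so long. The sketch below only uses the shape and target chunk shape quoted in the question. The value size of 2 bytes is an assumption (the question does not state the data type; it is chosen here only because it roughly matches the ~100 GB file size).

```python
from math import prod

# Shape of variable `a` and target chunk shape, both taken from the question.
shape = (1440, 721, 6, 8760)
chunk = (1, 1, 1, 8760)

# Total number of values stored in the variable.
n_values = prod(shape)

# Number of chunks: ceiling division along each axis, then multiplied together.
n_chunks = prod(-(-s // c) for s, c in zip(shape, chunk))

# ASSUMPTION: 2 bytes per value (e.g. packed 16-bit integers); not stated in the question.
bytes_per_value = 2
chunk_bytes = prod(chunk) * bytes_per_value
total_gb = n_values * bytes_per_value / 1e9

print(f"values in variable: {n_values:,}")
print(f"chunks to write:    {n_chunks:,}")
print(f"bytes per chunk:    {chunk_bytes:,}")
print(f"uncompressed size:  {total_gb:.1f} GB")
```

With this layout every one of the several million output chunks needs data from the whole time axis, so a tool that reads the source in its original order has to touch a large part of the input repeatedly unless it can buffer a lot of it in memory.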

杀お生予夺 2025-02-20 15:01:53

An old question, but I didn't find much more information elsewhere. I had to re-chunk files of 25 GB (uncompressed) and tried both ncks and nccopy. I think nccopy actually worked, but it took about 48 hours.

I ended up using xarray and Python, which turned out to be much faster. The 25 GB were re-chunked in less than 5 minutes (including compression). Memory usage was about 50 GB. Here is the approach I used:

import xarray as xr

# Source file is chunked for field access: chunks (time, lat, lon) = (1, 500, 1000)
ds = xr.open_dataset('field_access_opt.nc')

# Re-chunk variable 'GHI' so that each chunk holds the full time series of one grid point
ds.to_netcdf(
    "point_access_opt.nc",
    encoding={
        'lat': {'zlib': False, '_FillValue': None},
        'lon': {'zlib': False, '_FillValue': None},
        'time': {'zlib': False, '_FillValue': None, 'dtype': 'double'},
        'GHI': {'chunksizes': [len(ds['GHI'].time), 1, 1], 'zlib': True, 'complevel': 1},
    },
)
