How to merge more than 300 NetCDF files?
I tried to combine more than 300 NetCDF files into one with Xarray, but it has been running for over three days and the final NetCDF file is about 5 GB. Each individual NetCDF file is about 1.5 GB. Can you help me figure out how to combine these NetCDF files into one with this structure?
<xarray.Dataset>
Dimensions: (lat: 124, lon: 499, time: 79)
Coordinates:
* lat (lat) float64 50.96 50.96 50.97 50.97 ... 51.27 51.27 51.27 51.27
* lon (lon) float64 16.52 16.53 16.53 16.53 ... 17.77 17.77 17.77 17.77
* time (time) datetime64[ns] 2015-07-10 2015-07-22 ... 2017-08-10
Data variables:
vel (lat, lon) float64 ...
coh (lat, lon) float64 ...
cum (time, lat, lon) float64 ...
I tried it with this code, but it is still running (more than 3 days) and the final file is over 5 GB.
import xarray
import dask

# Tell dask to keep large chunks when slicing instead of splitting them,
# which silences the related PerformanceWarning during concatenation
dask.config.set({"array.slicing.split_large_chunks": False})

# Lazily open all NetCDF files and concatenate them along the time dimension
ds = xarray.open_mfdataset('../data/all-nc/*.nc', combine='nested', concat_dim="time")

# Writing triggers the actual computation and streams the result to one file
ds.to_netcdf('../data/all-nc-files.nc')
Thanks a lot!
Comments (1)
You might want to try this with nctoolkit, which uses CDO as a backend. This will probably be faster:
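A minimal sketch of that approach, assuming nctoolkit's open_data accepts a glob pattern, that merge("time") concatenates an ensemble along the time dimension, and that to_nc writes the result (check the nctoolkit docs if your version's API differs):

import nctoolkit as nc

# Sketch only: merge("time") and to_nc are assumed from recent nctoolkit versions
# Open every NetCDF file in the directory as one multi-file dataset
ds = nc.open_data("../data/all-nc/*.nc")

# Concatenate the files along the time dimension; CDO does the heavy lifting
ds.merge("time")

# Write the merged result to a single NetCDF file
ds.to_nc("../data/all-nc-files.nc")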
Note: Though I am not sure why you are merging files that are already very large into an even larger file. If these are compressed netCDF files, you will end up with a single file of over 300 GB. I have worked with a lot of netCDF data in my time, but I have never seen anyone produce a file that large. It is almost certainly more efficient to simply leave the files as they are instead of merging them.