How do I merge more than 300 NetCDF files into one?

Posted 2025-01-12 13:59:39

I tried to merge more than 300 NetCDF files into one with xarray. But it has been running for more than three days and the resulting NetCDF file is already about 5 GB. Each individual NetCDF file is about 1.5 GB. Can you help me combine these NetCDF files into a single file with this structure?

<xarray.Dataset>
Dimensions:  (lat: 124, lon: 499, time: 79)
Coordinates:
  * lat      (lat) float64 50.96 50.96 50.97 50.97 ... 51.27 51.27 51.27 51.27
  * lon      (lon) float64 16.52 16.53 16.53 16.53 ... 17.77 17.77 17.77 17.77
  * time     (time) datetime64[ns] 2015-07-10 2015-07-22 ... 2017-08-10
Data variables:
    vel      (lat, lon) float64 ...
    coh      (lat, lon) float64 ...
    cum      (time, lat, lon) float64 ...

Process finished with exit code 0

I tried it with this code, but it is still running (more than 3 days) and the output file is already over 5 GB.

import netCDF4
import numpy
import xarray
import dask

# keep dask from automatically splitting large chunks when indexing
dask.config.set({"array.slicing.split_large_chunks": False})

# open every file lazily and concatenate them along the time dimension
ds = xarray.open_mfdataset('../data/all-nc/*.nc', combine='nested', concat_dim="time")

# write the combined dataset to a single NetCDF file
ds.to_netcdf('../data/all-nc-files.nc')

Thank you very much!
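
(Not part of the original post.) For reference, a minimal sketch of open_mfdataset settings that often speed up this kind of time-wise concatenation, assuming the static variables vel and coh are identical in every input file; the chunk size, compression level, and paths are illustrative only:

import xarray

# Open all files lazily, one dask chunk per input file, and avoid comparing
# the static variables across every file.
ds = xarray.open_mfdataset(
    '../data/all-nc/*.nc',
    combine='nested',
    concat_dim='time',
    data_vars='minimal',   # only concatenate variables that actually have a time dimension
    coords='minimal',
    compat='override',     # assume vel, coh, lat, lon match; take them from the first file
    parallel=True,         # open the files in parallel via dask
    chunks={'time': 1},
)

# Compress the output so the merged file stays close to the combined size of the inputs.
encoding = {var: {'zlib': True, 'complevel': 4} for var in ds.data_vars}
ds.to_netcdf('../data/all-nc-files.nc', encoding=encoding)

Without data_vars='minimal', xarray stacks vel and coh along time for every input file, which can be one reason the output grows far beyond the size of the inputs.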

Comments (1)

乖乖兔^ω^ 2025-01-19 13:59:39

You might want to try this with nctoolkit, which uses CDO as its backend. It will probably be faster:

import nctoolkit as nc

ds = nc.open_data('../data/all-nc/*.nc')        # open all files as one dataset
ds.merge("time")                                # join them along the time dimension
ds.to_nc('../data/all-nc-files.nc', zip=True)   # write a single compressed NetCDF file

Note: I am not sure why you want to merge files that are already very large into an even larger one, though. If these are compressed netCDF files, you will end up with a single file of over 300 GB. I have worked with a lot of netCDF data, but I have never seen anyone produce a file that large. It is almost certainly more efficient to simply leave the files as they are instead of merging them.
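
(Not from the original answer.) A quick way to check whether the inputs really are compressed NetCDF-4 files, which is what would make the merged output balloon to that size; the file name below is a placeholder:

import netCDF4

# Print the per-variable compression settings of one input file.
with netCDF4.Dataset('../data/all-nc/example.nc') as src:   # placeholder path
    for name, var in src.variables.items():
        print(name, var.filters())   # e.g. {'zlib': True, 'complevel': 4, ...}

If the variables report zlib compression, the on-disk sizes understate the raw data volume, which supports keeping the files separate, or at least writing any merged output with compression enabled.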
