How do I merge more than 300 NetCDF files into one?

Posted 2025-01-12 13:59:39

I tried to merge more than 300 NetCDF files into one with xarray. But it has been running for more than three days and the resulting NetCDF file is already about 5 GB. Each individual NetCDF file is about 1.5 GB. Can you help me combine these NetCDF files into a single file with this structure?

<xarray.Dataset>
Dimensions:  (lat: 124, lon: 499, time: 79)
Coordinates:
  * lat      (lat) float64 50.96 50.96 50.97 50.97 ... 51.27 51.27 51.27 51.27
  * lon      (lon) float64 16.52 16.53 16.53 16.53 ... 17.77 17.77 17.77 17.77
  * time     (time) datetime64[ns] 2015-07-10 2015-07-22 ... 2017-08-10
Data variables:
    vel      (lat, lon) float64 ...
    coh      (lat, lon) float64 ...
    cum      (time, lat, lon) float64 ...

Process finished with exit code 0

I tried it with this code, but it is still running (more than 3 days) and the output file is already over 5 GB.

import netCDF4
import numpy
import xarray
import dask

# keep dask from automatically splitting large chunks when indexing
dask.config.set({"array.slicing.split_large_chunks": False})

# open every file lazily and concatenate them along the time dimension
ds = xarray.open_mfdataset('../data/all-nc/*.nc', combine='nested', concat_dim="time")

# write the combined dataset to a single NetCDF file
ds.to_netcdf('../data/all-nc-files.nc')

Thank you very much!
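
(Not part of the original post.) For reference, a minimal sketch of open_mfdataset settings that often speed up this kind of time-wise concatenation, assuming the static variables vel and coh are identical in every input file; the chunk size, compression level, and paths are illustrative only:

import xarray

# Open all files lazily, one dask chunk per input file, and avoid comparing
# the static variables across every file.
ds = xarray.open_mfdataset(
    '../data/all-nc/*.nc',
    combine='nested',
    concat_dim='time',
    data_vars='minimal',   # only concatenate variables that actually have a time dimension
    coords='minimal',
    compat='override',     # assume vel, coh, lat, lon match; take them from the first file
    parallel=True,         # open the files in parallel via dask
    chunks={'time': 1},
)

# Compress the output so the merged file stays close to the combined size of the inputs.
encoding = {var: {'zlib': True, 'complevel': 4} for var in ds.data_vars}
ds.to_netcdf('../data/all-nc-files.nc', encoding=encoding)

Without data_vars='minimal', xarray stacks vel and coh along time for every input file, which can be one reason the output grows far beyond the size of the inputs.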

Comments (1)

乖乖兔^ω^ 2025-01-19 13:59:39

You might want to try this with nctoolkit, which uses CDO as its backend. It will probably be faster:

import nctoolkit as nc

ds = nc.open_data('../data/all-nc/*.nc')        # open all files as one dataset
ds.merge("time")                                # join them along the time dimension
ds.to_nc('../data/all-nc-files.nc', zip=True)   # write a single compressed NetCDF file

Note: I am not sure why you want to merge files that are already very large into an even larger one, though. If these are compressed netCDF files, you will end up with a single file of over 300 GB. I have worked with a lot of netCDF data, but I have never seen anyone produce a file that large. It is almost certainly more efficient to simply leave the files as they are instead of merging them.
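
(Not from the original answer.) A quick way to check whether the inputs really are compressed NetCDF-4 files, which is what would make the merged output balloon to that size; the file name below is a placeholder:

import netCDF4

# Print the per-variable compression settings of one input file.
with netCDF4.Dataset('../data/all-nc/example.nc') as src:   # placeholder path
    for name, var in src.variables.items():
        print(name, var.filters())   # e.g. {'zlib': True, 'complevel': 4, ...}

If the variables report zlib compression, the on-disk sizes understate the raw data volume, which supports keeping the files separate, or at least writing any merged output with compression enabled.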
