I am looking for a good storage format for large, gridded datasets. The application is meteorology, and we would prefer a format that is common within this field (to help exchange data with others). I don't need to deal with special data structures, and there should be a Fortran API. I am currently considering HDF5, GRIB2 and NetCDF4.
How do these formats compare in terms of data compression? What are their main limitations? How steep is the learning curve? Are there any other storage formats worth investigating?
I have not found a great deal of material outlining the differences and pros/cons of these formats (there is one relevant SO thread, and a presentation comparing GRIB and NetCDF).
Comments (2)
Sorry, I'm not in meteorology, but it looks to me like the scientific community is moving towards HDF5; see, for example, the NERSC page:
http://www.nersc.gov/users/training/online-tutorials/introduction-to-scientific-i-o/
I had to make the same choice for astrophysics data, where we have historically used FITS, and I found it quite easy to start using HDF5: there are APIs not only for Fortran and C but also for C++, as well as a Python package (h5py).
I would certainly consider HDF5, as it seems to be the trend in the scientific community.
Also, HDF5 has built-in filters (including compression filters), and you can also write your own.
Finally, take a look at HDF5 "chunked" datasets, as they might prove really useful if you have gridded datasets.
http://www.hdfgroup.org/
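To make the chunking and filter idea a bit more concrete, here is a rough toy sketch in plain Python. It is not the HDF5 API and not how the library is implemented internally; it only mimics the concept. The grid is cut into fixed-size tiles, each tile is passed through a compression filter (zlib here) on its own, and a read of a small window only decompresses the tiles that window overlaps. The names `make_chunked_store` and `read_window` are made up for this illustration. In h5py the corresponding knobs are the `chunks` and `compression` keyword arguments of `create_dataset`.

```python
import struct
import zlib


def make_chunked_store(grid, chunk_rows, chunk_cols):
    """Split a 2-D grid (list of equal-length rows) into compressed tiles.

    Hypothetical helper for illustration only. Values are stored as
    little-endian float32, so they are rounded to float32 precision.
    """
    n_rows, n_cols = len(grid), len(grid[0])
    chunks = {}
    for r0 in range(0, n_rows, chunk_rows):
        for c0 in range(0, n_cols, chunk_cols):
            # Collect one tile (edge tiles may be smaller) in row-major order.
            flat = [
                value
                for r in range(r0, min(r0 + chunk_rows, n_rows))
                for value in grid[r][c0:c0 + chunk_cols]
            ]
            raw = struct.pack(f"<{len(flat)}f", *flat)
            # The "filter": each tile is compressed independently.
            chunks[(r0 // chunk_rows, c0 // chunk_cols)] = zlib.compress(raw)
    return {
        "shape": (n_rows, n_cols),
        "chunk": (chunk_rows, chunk_cols),
        "chunks": chunks,
    }


def read_window(store, r_start, r_stop, c_start, c_stop):
    """Read rows r_start..r_stop-1 and columns c_start..c_stop-1.

    Returns the window and the number of tiles that had to be decompressed.
    """
    n_rows, n_cols = store["shape"]
    ch_r, ch_c = store["chunk"]
    out = [[0.0] * (c_stop - c_start) for _ in range(r_stop - r_start)]
    touched = 0
    # Visit only the tiles that overlap the requested window.
    for ci in range(r_start // ch_r, (r_stop - 1) // ch_r + 1):
        for cj in range(c_start // ch_c, (c_stop - 1) // ch_c + 1):
            r0, c0 = ci * ch_r, cj * ch_c
            tile_rows = min(ch_r, n_rows - r0)
            tile_cols = min(ch_c, n_cols - c0)
            raw = zlib.decompress(store["chunks"][(ci, cj)])
            flat = struct.unpack(f"<{tile_rows * tile_cols}f", raw)
            touched += 1
            # Copy the overlapping part of this tile into the output window.
            for r in range(max(r0, r_start), min(r0 + tile_rows, r_stop)):
                for c in range(max(c0, c_start), min(c0 + tile_cols, c_stop)):
                    out[r - r_start][c - c_start] = flat[(r - r0) * tile_cols + (c - c0)]
    return out, touched


if __name__ == "__main__":
    # An 8 x 12 grid of made-up values, stored as 4 x 4 tiles (6 tiles total).
    grid = [[float(r * 100 + c) for c in range(12)] for r in range(8)]
    store = make_chunked_store(grid, 4, 4)
    window, touched = read_window(store, 1, 3, 5, 7)
    print(window, "tiles decompressed:", touched)
```

The point for gridded data is that the tile shape should match how you usually read: if you mostly pull out one region at a time, tiles that are compact in space keep the number of decompressed tiles small, whereas a badly chosen shape can force almost every tile to be read for a small request.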