Opinions on NetCDF vs. HDF5 for storing scientific data?
Anyone out there have enough experience with NetCDF and HDF5 to give some pros and cons of them as a way of storing scientific data?
I've used HDF5 and would like to read/write it via Java, but the interface is essentially a wrapper around the C library, which I have found confusing. NetCDF seems intriguing, but I know almost nothing about it.
edit: my application is "only" for datalogging; I want a file with a self-describing format. Important features for me are being able to add arbitrary metadata, having fast write access for appending to byte arrays, and having single-writer / multiple-reader (SWMR) concurrency. SWMR is strongly preferred but not a must-have: the NetCDF docs say they support it, but they don't say whether there is any mechanism to ensure that two writers can't open the same file at once, with disastrous results.

I like the hierarchical aspect of HDF5 (in particular I love the directed-acyclic-graph hierarchy, which is much more flexible than a "regular" filesystem-like hierarchy). I'm reading the NetCDF docs now... if it only allows one dataset per file, then it probably won't work for me. :(
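Since the docs don't say how (or whether) concurrent writers are excluded, one generic workaround is an advisory lock file created atomically with `O_CREAT | O_EXCL`. Here is a minimal Python sketch under that assumption. The `SingleWriterLock` class and the `.lock` naming scheme are hypothetical, not part of NetCDF or HDF5. The lock only protects against writers that also use it, a crashed writer leaves a stale lock file behind, and `O_EXCL` may not be atomic on some network filesystems.

```python
import os
import tempfile
from typing import Optional


class SingleWriterLock:
    """Advisory single-writer guard: a lock file created atomically next to the data file.

    Hypothetical helper, not part of NetCDF or HDF5. It only excludes writers
    that also use it, and a crashed writer leaves a stale lock file behind.
    """

    def __init__(self, data_path: str) -> None:
        # Hypothetical naming scheme: "<data file>.lock".
        self.lock_path = data_path + ".lock"
        self._fd: Optional[int] = None

    def acquire(self) -> bool:
        """Return True if this instance became the writer, False if another holds the lock."""
        try:
            # O_CREAT | O_EXCL fails if the lock file already exists.
            self._fd = os.open(self.lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            return False
        # Record the owner's PID to help diagnose stale locks.
        os.write(self._fd, str(os.getpid()).encode())
        return True

    def release(self) -> None:
        """Drop the lock so another writer can open the data file."""
        if self._fd is not None:
            os.close(self._fd)
            os.remove(self.lock_path)
            self._fd = None


if __name__ == "__main__":
    data_file = os.path.join(tempfile.mkdtemp(), "datalog.h5")
    writer_a, writer_b = SingleWriterLock(data_file), SingleWriterLock(data_file)
    print(writer_a.acquire())  # True: first writer wins
    print(writer_b.acquire())  # False: second writer is refused
    writer_a.release()
    print(writer_b.acquire())  # True: the lock is free again
    writer_b.release()
```

Readers never touch the lock file, so any number of them can open the data file while one writer holds it.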
update: looks like NetCDF-Java reads netCDF-4 files but only writes netCDF-3 files, which don't support hierarchical groups. darn.
update 2009-Jul-14: I am starting to get really upset with HDF5 in Java. The available library isn't that great, and it has some major stumbling blocks related to Java's abstraction layers (compound data types). A great file format for C, but it looks like I just lose. >:(
Comments (7)
I strongly suggest you use HDF5 instead of NetCDF. NetCDF is flat, and it gets very messy after a while if you are not able to classify things. Of course, how to classify is also a matter of debate, but at least you have that flexibility.
We performed a thorough evaluation of HDF5 vs. NetCDF when I wrote Q5Cost, and HDF5 won hands down.
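To make the "flat gets messy" point concrete: in a flat namespace (as in classic netCDF-3), one common workaround is to emulate groups with name prefixes, so regrouping means renaming every affected variable, whereas in a hierarchy it is a single link change. This is a toy Python model using plain dicts and hypothetical variable names, not either library's API.

```python
from typing import Dict, List


def regroup_flat(variables: Dict[str, List[float]], old: str, new: str) -> int:
    """Flat namespace: a 'group' is just a name prefix, so every member must be renamed.

    Returns the number of variable names that had to be rewritten.
    """
    members = [name for name in variables if name.startswith(old + "_")]
    for name in members:
        variables[new + name[len(old):]] = variables.pop(name)
    return len(members)


def regroup_tree(root: Dict[str, dict], old: str, new: str) -> int:
    """Hierarchy: the group is one node, so a single link is renamed."""
    root[new] = root.pop(old)
    return 1


flat = {"run1_temperature": [20.1], "run1_pressure": [1.01], "run2_temperature": [19.8]}
tree = {"run1": {"temperature": [20.1], "pressure": [1.01]}, "run2": {"temperature": [19.8]}}
print(regroup_flat(flat, "run1", "calibration"))  # 2 names rewritten
print(regroup_tree(tree, "run1", "calibration"))  # 1 link renamed
```

The flat cost grows with the number of members, and every reader that hard-codes the old prefix breaks as well.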
I'll have to admit using HDF5 is very much easier in the long run. It's not hard to get simple data structures into NetCDF format, but manipulating them down the road is kind of a pain.
The "H" in HDF5 stands for "hierarchical", which translated (for me anyway) into a REALLY easy way to manipulate data, by just moving nodes around and referencing nodes from other places.
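A toy model of what "referencing nodes from other places" buys you: the same dataset object is linked under two different groups, so the hierarchy is a DAG rather than a tree (HDF5 calls these extra names hard links). The `Group` and `Dataset` classes below are hypothetical stand-ins written for illustration, not the HDF5 C or Java API.

```python
from typing import Dict, Union


class Dataset:
    """Leaf node holding data (a stand-in for an HDF5 dataset)."""

    def __init__(self, data: list) -> None:
        self.data = data


class Group:
    """Interior node: a set of named links to groups or datasets."""

    def __init__(self) -> None:
        self.links: Dict[str, Union["Group", Dataset]] = {}

    def link(self, name: str, obj: Union["Group", Dataset]) -> None:
        """Give an existing object one more name (analogous to an HDF5 hard link)."""
        self.links[name] = obj

    def resolve(self, path: str) -> Union["Group", Dataset]:
        """Follow a '/'-separated path starting at this group."""
        node: Union["Group", Dataset] = self
        for part in filter(None, path.split("/")):
            if not isinstance(node, Group):
                raise KeyError(f"cannot descend into dataset at {part!r}")
            node = node.links[part]
        return node


root = Group()
root.link("raw", Group())
root.link("by_sensor", Group())
readings = Dataset([1.0, 2.0, 3.0])
root.resolve("raw").link("run1", readings)
root.resolve("by_sensor").link("thermocouple_A", readings)  # second parent: a DAG, not a tree
root.resolve("raw/run1").data.append(4.0)  # write through one path...
print(root.resolve("by_sensor/thermocouple_A").data)  # ...read through the other: [1.0, 2.0, 3.0, 4.0]
```

Because both names point at one object, there is no copy to keep in sync; reorganizing is just adding or removing links.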
Can I ask what kind of project this is? I use these both for a lot of HPC scientific modeling tasks. Can I assume you're doing the same? If so, the trend I'm seeing is people moving to HDF5, but that might be different in your particular domain.
However you end up going, best of luck!
NetCDF, starting with version 4.0 (2008), can read and write most HDF5 files and provides access to the hierarchical features of HDF5 via the enhanced data model.
HDF5 is extremely feature-rich, and has some great performance features.
NetCDF has a simpler API, and a much wider tool base. There are many tools that handle netCDF data.
I know this is an older post, and the original poster has indicated they've moved on, but for anyone who ends up here: the netCDF-Java library (as of 4.3.13) has netCDF-4 write support via the netCDF C library. It's still in beta, but it does work, and feedback is certainly appreciated!
Please see the netCDF-Java reference docs for more details.
1) The netCDF-4 C library is a layer on top of the HDF5 C library. Its API is considered simpler than HDF5's, but in the end you have pretty much the same functionality. NetCDF does not support graphs, but HDF5 does. In fact, I think HDF5 does not even prevent cycles in your graph.
2) The HDF Group has a Java API on top of the HDF5 C library.
3) Unidata has the netCDF-Java library, which is pure Java but can only read HDF5.
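If the hierarchy can contain cycles (point 1 above is unsure whether HDF5 prevents them), code that walks the graph has to guard against infinite loops itself. A minimal Python sketch of depth-first cycle detection over a toy adjacency map; the map and group names are hypothetical, and building the map from a real file (keyed by object identity, since one object may have several names) is left as an assumption.

```python
from typing import Dict, List


def find_cycle(links: Dict[str, List[str]], root: str) -> List[str]:
    """Return one cycle reachable from `root` as a list of group names, or [] if none.

    `links` maps each group to the groups it links to (a toy stand-in for
    group links in a file; the names are hypothetical).
    """
    path: List[str] = []  # groups on the current depth-first path
    on_path = set()       # same groups, for O(1) membership checks
    done = set()          # groups fully explored with no cycle below them

    def visit(node: str) -> List[str]:
        if node in on_path:
            # Back edge to an ancestor: the cycle is the path from that ancestor.
            return path[path.index(node):] + [node]
        if node in done:
            # Shared node reached again by another route: a DAG, not a cycle.
            return []
        path.append(node)
        on_path.add(node)
        for child in links.get(node, []):
            cycle = visit(child)
            if cycle:
                return cycle
        path.pop()
        on_path.discard(node)
        done.add(node)
        return []

    return visit(root)


print(find_cycle({"/": ["a", "b"], "a": ["shared"], "b": ["shared"], "shared": []}, "/"))  # []
print(find_cycle({"/": ["a"], "a": ["b"], "b": ["a"]}, "/"))  # ['a', 'b', 'a']
```

The `done` set is what separates a legitimate DAG (a node with two parents) from a true cycle (a node that links back to one of its own ancestors).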
Try writing some small sample application in each, and compare the experience. If future scalability of your code to parallel execution (via MPI or the like) is important to you, I know that HDF has a parallel implementation, which people are constantly working to improve. I'm not sure about NetCDF.
Late edit: For NetCDF, there is now Parallel NetCDF from Argonne. It works quite well, and the development team is quite active in improving it further.
NetCDF, which translates HDF5 into its own data model, looks and works great... until you find out that NetCDF doesn't support unsigned values! See also my question on how to detect unsigned values in existing HDF5 files using NetCDF.
Update: Actually, it turns out that although NetCDF-3 doesn't support unsigned values, NetCDF-4 does, even though the NetCDF API in Java for determining signedness is a little convoluted.
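For data stored in a signed type (for example netCDF-3 bytes), a common workaround is to flag the variable and reinterpret the bits on read. Here is a minimal Python sketch, assuming the producer marked such variables with an `_Unsigned = "true"` attribute (a convention netCDF-Java recognizes; treat the exact handling as an assumption). The function name `to_unsigned` is hypothetical. In Java itself the equivalent is masking, e.g. `b & 0xFF`, since Java has no unsigned primitive types, which is one reason unsigned data is awkward there.

```python
from typing import Dict, List


def to_unsigned(values: List[int], attrs: Dict[str, str], bits: int = 8) -> List[int]:
    """Reinterpret signed integers as unsigned when the variable is flagged.

    Assumes the `_Unsigned = "true"` attribute convention; values of
    variables without that flag are returned unchanged.
    """
    if attrs.get("_Unsigned", "false").lower() != "true":
        return list(values)
    mask = (1 << bits) - 1
    # Two's-complement reinterpretation: for 8 bits, -1 -> 255 and -128 -> 128.
    return [v & mask for v in values]


print(to_unsigned([-1, 0, 127, -128], {"_Unsigned": "true"}))  # [255, 0, 127, 128]
print(to_unsigned([-1, 0, 127, -128], {}))  # [-1, 0, 127, -128]
```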