Best solution for lossy compression of a large (>10 TB) volumetric dataset (astrophysics simulations)
I work on large astrophysics simulations (galaxy formation) and I have a data-management problem: these simulations produce a very large amount of volumetric data (physical quantities on 3D cells, i.e. voxels). My question is simple: in your opinion, what is the best solution for compressing such data (lossy compression)?
What I need is:
- Adjustable lossy 3D compression
- Not a "ready-to-use" solution, but an open-source library/code that I can adapt to my simulation code
- The ability to work on a large amount of data
(The solution may come from image/volumetric-image compression libraries.)
Thank you very much.
EDIT: This is not for plotting/displaying these data; it is for actually reducing their size on disk (if I can reduce the size, I can write more simulation time steps to disk, and thus better resolve the dynamics of galaxies in post-processing).
Answers (4)
I am not quite sure whether this is what you are looking for, as it is not exactly compression and will not reduce the amount of data on your disk. But it can be used to simplify presentation and computation.
A solution for presenting large datasets is to use a level-of-detail (LOD) implementation. LOD schemes are by definition lossy, and some are adjustable. Some continuous (and adjustable) LOD algorithms are implemented, for example, in GTS (http://gts.sourceforge.net/).
EDIT: You could actually use LOD as a compression method if you store the output of the algorithm, but it would certainly be far from the most efficient compression strategy.
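For intuition, here is a minimal sketch of what one LOD level of a regular volume could look like: averaging 2×2×2 cells halves each dimension and cuts the size by a factor of eight. This is a toy illustration, not code from the linked libraries; the function name, the x-fastest memory layout, and even dimensions are all assumptions.

```cpp
// Minimal sketch of one LOD "level": downsample a float volume by
// averaging 2x2x2 cells. Repeating this k times gives level k, an
// 8^k reduction in size. Assumes x-fastest layout and even dimensions.
#include <cstddef>
#include <vector>

std::vector<float> downsample2(const std::vector<float>& v,
                               std::size_t nx, std::size_t ny, std::size_t nz) {
    const std::size_t mx = nx / 2, my = ny / 2, mz = nz / 2;
    std::vector<float> out(mx * my * mz);
    for (std::size_t z = 0; z < mz; ++z)
        for (std::size_t y = 0; y < my; ++y)
            for (std::size_t x = 0; x < mx; ++x) {
                float sum = 0.0f;
                for (std::size_t dz = 0; dz < 2; ++dz)      // average the
                    for (std::size_t dy = 0; dy < 2; ++dy)  // 2x2x2 block
                        for (std::size_t dx = 0; dx < 2; ++dx)
                            sum += v[(2*z + dz) * nx * ny + (2*y + dy) * nx + (2*x + dx)];
                out[z * mx * my + y * mx + x] = sum / 8.0f;
            }
    return out;
}
```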
Well, it's difficult to answer without knowing the actual format and build rules of your data.
Chances are that, with such raw quantities, the format is quite compressible (when I hear "3D pixels", I expect that).
So your best bet would be to "cut" the source data into blocks whose size is natural to the format you're analyzing, and compress each block independently. You then decompress each block only when you need it.
If the raw data proves very compressible (lots of zeros, for example), you can get very good results with this simple method.
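As a hedged sketch of that block idea (and of the "adjustable lossy" requirement from the question), one could quantize each block to a uniform step and then deflate the integer codes with zlib; the step size is the loss knob. The use of zlib and the int32 quantization are assumptions for illustration, not something this answer prescribes.

```cpp
// Sketch of block-wise adjustable lossy compression: quantize a block
// of floats to a uniform step, then deflate the codes with zlib
// (link with -lz). Larger step -> coarser values -> better ratio.
#include <cstdint>
#include <cmath>
#include <vector>
#include <zlib.h>

std::vector<unsigned char> compress_block(const std::vector<float>& block, float step) {
    std::vector<std::int32_t> q(block.size());
    for (std::size_t i = 0; i < block.size(); ++i)
        q[i] = static_cast<std::int32_t>(std::lround(block[i] / step)); // quantize
    uLongf dstLen = compressBound(q.size() * sizeof(std::int32_t));
    std::vector<unsigned char> dst(dstLen);
    if (compress2(dst.data(), &dstLen,
                  reinterpret_cast<const Bytef*>(q.data()),
                  q.size() * sizeof(std::int32_t), Z_BEST_COMPRESSION) != Z_OK)
        dst.clear();  // sketch-level error handling
    else
        dst.resize(dstLen);
    return dst;
}
```

A matching decompressor would inflate the bytes and multiply the codes by `step`; the quantization error is bounded by `step / 2` per cell, which makes the accuracy/ratio trade-off easy to tune.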
I have a similar problem, and the "solution" I've come up with is to use off-the-shelf video compression by converting one of the spatial dimensions to time. Not pretty, but so much work has been done on video compression (not to mention hardware support) that it is hard to beat.
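A hedged sketch of that trick, assuming ffmpeg is installed (POSIX popen) and the field tolerates 8-bit quantization: scale each z-slice to grayscale and pipe it to ffmpeg as one raw video frame. The dimensions, value range, x264 codec, and CRF setting are all assumptions to adapt.

```cpp
// Sketch of the video trick: each z-slice of the volume becomes one
// 8-bit grayscale frame piped into ffmpeg; -crf is the loss knob.
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <vector>

int main() {
    const int nx = 256, ny = 256, nz = 256;
    std::vector<float> vol(static_cast<std::size_t>(nx) * ny * nz, 0.0f); // your data here
    const float vmin = 0.0f, vmax = 1.0f;                                 // assumed value range

    FILE* ff = popen("ffmpeg -y -f rawvideo -pix_fmt gray -s 256x256 -r 25 "
                     "-i - -c:v libx264 -crf 23 volume.mkv", "w");
    if (!ff) return 1;
    std::vector<std::uint8_t> frame(static_cast<std::size_t>(nx) * ny);
    for (int z = 0; z < nz; ++z) {
        for (int i = 0; i < nx * ny; ++i) {
            // Map [vmin, vmax] to [0, 255] with clamping (the lossy step).
            float t = (vol[static_cast<std::size_t>(z) * nx * ny + i] - vmin) / (vmax - vmin);
            if (t < 0.0f) t = 0.0f;
            if (t > 1.0f) t = 1.0f;
            frame[i] = static_cast<std::uint8_t>(t * 255.0f + 0.5f);
        }
        fwrite(frame.data(), 1, frame.size(), ff);  // one slice = one frame
    }
    pclose(ff);
    return 0;
}
```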
Currently I also work on 3D point reduction/compression (but with terrestrial data), and my solution was to use the OctoMap framework and extend it with geometric point-distribution models. Using the octree properties and the hierarchical distribution approximation from Scattered Data Approximation, it is possible to implement an LOD approach with high accuracy at lower detail levels.
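For reference, here is a minimal sketch of plain OctoMap usage, where the octree resolution acts as the adjustable loss parameter; the distribution-model extension described above is the answerer's own work and is not shown here.

```cpp
// Minimal OctoMap sketch: insert points into an octree whose leaf
// resolution is the loss knob, prune merged nodes, and write a compact
// binary file. Build against OctoMap (https://octomap.github.io/).
#include <octomap/octomap.h>

int main() {
    octomap::OcTree tree(0.1);  // leaf size 0.1 (coarser -> lossier, smaller)
    // Insert sample points; real code would loop over the dataset.
    tree.updateNode(octomap::point3d(1.0f, 2.0f, 3.0f), true);
    tree.updateNode(octomap::point3d(1.05f, 2.0f, 3.0f), true);  // merges at this resolution
    tree.prune();                  // collapse identical children
    tree.writeBinary("cloud.bt");  // compact maximum-likelihood encoding
    return 0;
}
```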