Self-describing file format for gigapixel images?
In medical imaging, there appear to be two ways of storing huge gigapixel images:
- Use lots of JPEG images (either packed into files or stored individually) and cook up some bizarre index format to describe what goes where. Tack on some metadata in some other format.
- Use TIFF's tile and multi-image support to cleanly store the images as a single file, and provide downsampled versions for zooming speed. Then abuse various TIFF tags to store metadata in non-standard ways. Also, store tiles with overlapping boundaries that must be individually translated later.
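Translating overlapping tiles amounts to pasting each one at its recorded position on a shared canvas. Here is a minimal sketch. It assumes each tile comes with a hypothetical `(x, y)` offset in full-resolution pixels, and that a later tile simply overwrites any overlap. Real scanners may blend or register tiles instead.

```python
from typing import Dict, List, Tuple

Tile = List[List[int]]  # rows of grayscale pixel values (illustrative only)

def composite(tiles: Dict[Tuple[int, int], Tile], width: int, height: int,
              background: int = 0) -> List[List[int]]:
    """Paste each tile at its (x, y) offset; later tiles overwrite overlaps."""
    canvas = [[background] * width for _ in range(height)]
    for (ox, oy), tile in tiles.items():  # dicts keep insertion order
        for ty, row in enumerate(tile):
            y = oy + ty
            if not 0 <= y < height:
                continue  # clip rows outside the canvas
            for tx, value in enumerate(row):
                x = ox + tx
                if 0 <= x < width:  # clip columns outside the canvas
                    canvas[y][x] = value
    return canvas
```

For example, two 2x2 tiles at offsets `(0, 0)` and `(1, 0)` share one column, and the second tile wins there. Empty regions of a sparse scan keep the background value.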
In both cases, the reader must understand the format well enough to understand how to draw things and read the metadata.
Is there a better way to store these images? Is TIFF (or BigTIFF) still the right format for this? Does XMP solve the problem of metadata?
The main issues are:
- Storing images in a way that allows for rapid random access (tiling)
- Storing downsampled images for rapid zooming (pyramid)
- Handling cases where tiles are overlapping or sparse (scanners often work by moving a camera over a slide in 2D and capturing only where there is something to image)
- Storing important metadata, including associated images like a slide's label and thumbnail
- Support for lossy storage
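The tiling and pyramid requirements come down to a little arithmetic on the reader's side. This is a minimal sketch, not how any particular format defines it. It assumes square tiles of a fixed size and a factor-of-2 pyramid that stops once one tile covers the image. Both are common choices but hypothetical here.

```python
from typing import List, Tuple

def pyramid_levels(width: int, height: int, tile: int) -> List[Tuple[int, int]]:
    """Level sizes of a factor-2 pyramid; level 0 is full resolution."""
    levels = [(width, height)]
    while width > tile or height > tile:
        width, height = (width + 1) // 2, (height + 1) // 2  # ceiling halving
        levels.append((width, height))
    return levels

def tiles_for_viewport(level_size: Tuple[int, int], tile: int,
                       x0: int, y0: int, x1: int, y1: int) -> List[Tuple[int, int]]:
    """(col, row) indices of tiles intersecting [x0, x1) x [y0, y1) at one level."""
    w, h = level_size
    x0, y0, x1, y1 = max(0, x0), max(0, y0), min(w, x1), min(h, y1)
    if x0 >= x1 or y0 >= y1:
        return []  # viewport lies entirely outside this level
    return [(col, row)
            for row in range(y0 // tile, (y1 - 1) // tile + 1)
            for col in range(x0 // tile, (x1 - 1) // tile + 1)]
```

A viewer maps the screen rectangle into the chosen level's pixel space and fetches only the returned tiles. The work per screen stays roughly constant no matter how large the full image is.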
What kind of (hopefully non-proprietary) formats do people use to store large aerial photographs or maps? These images have similar properties.
Comments (6)
It seems like starting with TIFF or BigTIFF and defining a useful subset of tags + XMP metadata might be the way to go. FITS is no good since it is basically for lossless data and doesn't have a very appropriate metadata mechanism.
The problem with TIFF is that it just allows too much flexibility, but a subset of TIFF should be acceptable.
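One way to pin down "a subset of TIFF" is as a check a reader can run before doing anything else. The sketch below uses only the standard library to parse the first IFD of a classic TIFF. Tag numbers 322/323 (TileWidth/TileLength) and 700 (XMP) are the standard ones. The profile itself ("first image must be tiled and carry XMP") is made up for illustration. BigTIFF is recognized by its magic number 43 but not parsed.

```python
import struct
from typing import Dict, List, Tuple

TILE_WIDTH, TILE_LENGTH, XMP = 322, 323, 700  # standard TIFF tag numbers

def read_first_ifd(data: bytes) -> Dict[int, Tuple[int, int, bytes]]:
    """Map tag -> (field type, count, raw 4-byte value/offset) for the first IFD."""
    order = {b"II": "<", b"MM": ">"}.get(data[:2])
    if order is None:
        raise ValueError("not a TIFF file (byte-order mark must be II or MM)")
    (magic,) = struct.unpack(order + "H", data[2:4])
    if magic == 43:
        raise ValueError("BigTIFF (magic 43) uses 8-byte offsets; not handled here")
    if magic != 42:
        raise ValueError(f"unexpected TIFF magic number {magic}")
    (ifd,) = struct.unpack(order + "I", data[4:8])
    (count,) = struct.unpack(order + "H", data[ifd:ifd + 2])
    tags = {}
    for i in range(count):
        entry = data[ifd + 2 + 12 * i: ifd + 14 + 12 * i]  # IFD entries are 12 bytes
        tag, ftype, n = struct.unpack(order + "HHI", entry[:8])
        tags[tag] = (ftype, n, entry[8:12])
    return tags

def check_profile(data: bytes) -> List[str]:
    """List violations of a hypothetical 'tiled + XMP' TIFF profile."""
    tags = read_first_ifd(data)
    problems = []
    if TILE_WIDTH not in tags or TILE_LENGTH not in tags:
        problems.append("first image is not tiled")
    if XMP not in tags:
        problems.append("no XMP packet (tag 700)")
    return problems
```

A real profile would also constrain compression, the pyramid layout (for example via SubIFDs or a chain of IFDs), and how associated images such as the slide label are marked.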
The solution may very well be http://ome-xml.org/ and http://ome-xml.org/wiki/OmeTiff.
It looks like DICOM now has support:
ftp://medical.nema.org/MEDICAL/Dicom/Final/sup145_ft.pdf
You probably want FITS.
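For context on the metadata point raised above: a FITS header is a flat list of 80-character ASCII "cards" (`KEYWORD = value / comment`). The cards are padded out to 2880-byte blocks and terminated by an `END` card. Here is a minimal sketch of reading one. It does no type conversion and ignores conventions such as CONTINUE and HIERARCH. It shows the kind of flat key/value store that would have to hold things like slide labels.

```python
from typing import Dict

CARD, BLOCK = 80, 2880  # FITS card and header-block sizes in bytes

def parse_fits_header(data: bytes) -> Dict[str, str]:
    """Map KEYWORD -> raw value text for cards up to END (no type conversion)."""
    if len(data) % BLOCK:
        raise ValueError("header is not a whole number of 2880-byte blocks")
    header = {}
    for start in range(0, len(data), CARD):
        card = data[start:start + CARD].decode("ascii")
        keyword = card[:8].strip()  # keyword occupies columns 1-8
        if keyword == "END":
            return header
        if card[8:10] == "= ":  # value indicator in columns 9-10
            # Naive: drops everything after the first "/", even inside quoted strings.
            header[keyword] = card[10:].split("/", 1)[0].strip()
    raise ValueError("no END card found")
```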
I'm a pathologist (and hobbyist programmer), so virtual slides and digital pathology are a huge interest of mine. You may be interested in the OpenSlide project. They have characterized a number of the proprietary formats from the large vendors (Aperio, BioImagene, etc.). Most seem to consist of large TIFF files holding a zoom pyramid (scanned at different microscope objectives, of course), with the levels stored as multiple tiled TIFF images or as compressed (JPEG or JPEG2000) images.
The industry standard is DICOM Sup 145. Vendors have been slow to adopt it, but inventing yet another format would probably not help.
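Part of the appeal of building on DICOM is that the container is already self-describing. This small illustration relies only on the general DICOM Part 10 file layout: a 128-byte preamble, the bytes `DICM`, then group 0002 meta elements in explicit VR little endian. From those it reads the Transfer Syntax UID, which tells a reader whether the pixel data is, for example, JPEG baseline (`1.2.840.10008.1.2.4.50`) or JPEG 2000 (`1.2.840.10008.1.2.4.90` / `.91`). It is a sketch and does not touch the whole-slide-specific parts of Sup 145.

```python
import struct
from typing import Dict, Tuple

# VRs that use 2 reserved bytes plus a 4-byte length in explicit VR encoding.
LONG_VRS = {b"OB", b"OD", b"OF", b"OL", b"OV", b"OW", b"SQ", b"SV",
            b"UC", b"UN", b"UR", b"UT", b"UV"}

def read_file_meta(data: bytes) -> Dict[Tuple[int, int], bytes]:
    """Return {(group, element): raw value} for the group-0002 file meta elements."""
    if data[128:132] != b"DICM":
        raise ValueError("missing 128-byte preamble + 'DICM' prefix")
    pos, meta = 132, {}
    while pos + 8 <= len(data):
        group, element = struct.unpack("<HH", data[pos:pos + 4])
        if group != 0x0002:
            break  # file meta elements come first; stop at the dataset proper
        vr = data[pos + 4:pos + 6]
        if vr in LONG_VRS:
            (length,) = struct.unpack("<I", data[pos + 8:pos + 12])
            pos += 12
        else:
            (length,) = struct.unpack("<H", data[pos + 6:pos + 8])
            pos += 8
        meta[(group, element)] = data[pos:pos + length]
        pos += length
    return meta

def transfer_syntax(data: bytes) -> str:
    """Transfer Syntax UID (0002,0010) with its even-length padding removed."""
    return read_file_meta(data)[(0x0002, 0x0010)].rstrip(b"\x00 ").decode("ascii")
```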
PNG might work for you. It can handle large images and metadata, and the PNG format supports interlacing, so you can get an n/8 x n/8 downsampled image pretty easily.
I'm not sure if PNG can do rapid random access. It is chunked, but that might not be enough.
You could represent sparse data with the transparency channel.
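PNG's interlacing is Adam7, which splits the image into seven reduced sub-images. These are stored in pass order, so the first pass (every 8th pixel in each direction) gives the 1/8 x 1/8 preview. The random-access worry is justified, though. The IDAT chunks together form a single zlib stream, so decoding an arbitrary region still means inflating from the start of the stream. This sketch computes each pass's dimensions from the Adam7 start and step table in the PNG specification.

```python
from typing import List, Tuple

# Adam7 passes as (x_start, y_start, x_step, y_step), per the PNG specification.
ADAM7 = [(0, 0, 8, 8), (4, 0, 8, 8), (0, 4, 4, 8), (2, 0, 4, 4),
         (0, 2, 2, 4), (1, 0, 2, 2), (0, 1, 1, 2)]

def adam7_pass_sizes(width: int, height: int) -> List[Tuple[int, int]]:
    """Width and height of the reduced image carried by each interlace pass."""
    sizes = []
    for x0, y0, dx, dy in ADAM7:
        w = (width - x0 + dx - 1) // dx if width > x0 else 0
        h = (height - y0 + dy - 1) // dy if height > y0 else 0
        sizes.append((w, h))
    return sizes
```

For a 1000 x 800 image the first pass is 125 x 100. The seven passes together cover every pixel exactly once.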
JPEG2000 might be worth a look; national libraries have made some interesting efforts in this space.
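One reason JPEG2000 suits this job is that its wavelet transform builds the zoom pyramid into the codestream itself. N decomposition levels give N + 1 resolutions, each roughly half the size of the previous one. This is a minimal sketch under stated assumptions. It reads a raw codestream only (not one still wrapped in a .jp2 container's boxes), takes the image grid from the SIZ marker and the level count from the main-header COD marker, and ignores tiling, precincts, and per-component subsampling.

```python
import struct
from typing import List, Tuple

JP2_SIGNATURE = b"\x00\x00\x00\x0cjP  \r\n\x87\n"  # JPEG 2000 Signature box
SIZ, COD, SOT = 0xFF51, 0xFF52, 0xFF90

def sniff(data: bytes) -> str:
    """Distinguish a .jp2 container from a raw codestream (which starts with SOC)."""
    if data.startswith(JP2_SIGNATURE):
        return "jp2"
    if data[:2] == b"\xff\x4f":  # SOC marker
        return "j2k"
    return "unknown"

def codestream_info(cs: bytes) -> Tuple[int, int, int, int, int]:
    """(Xsiz, Ysiz, XOsiz, YOsiz, decomposition levels) from the main header."""
    if cs[:2] != b"\xff\x4f":
        raise ValueError("not a raw JPEG 2000 codestream")
    pos, grid, levels = 2, None, None
    while pos + 4 <= len(cs):
        marker, length = struct.unpack(">HH", cs[pos:pos + 4])
        if marker == SOT:
            break  # first tile-part: the main header is over
        seg = cs[pos + 4:pos + 2 + length]  # the length field counts itself
        if marker == SIZ:
            grid = struct.unpack(">IIII", seg[2:18])  # after Rsiz: Xsiz Ysiz XOsiz YOsiz
        elif marker == COD:
            levels = seg[5]  # Scod, progression, layers (2 bytes), MCT, then levels
        pos += 2 + length
    if grid is None or levels is None:
        raise ValueError("SIZ or COD missing from main header")
    return (*grid, levels)

def resolution_sizes(xsiz: int, ysiz: int, xo: int, yo: int,
                     levels: int) -> List[Tuple[int, int]]:
    """Image size at each reduction r = 0..levels (r = 0 is full resolution)."""
    def ceil_div(a: int, b: int) -> int:
        return -(-a // b)
    return [(ceil_div(xsiz, 2 ** r) - ceil_div(xo, 2 ** r),
             ceil_div(ysiz, 2 ** r) - ceil_div(yo, 2 ** r))
            for r in range(levels + 1)]
```

With 5 decomposition levels, a 1000 x 800 image yields six resolutions, down to 32 x 25. A viewer can decode only the resolution it needs.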