Question about EXIF in hex form
I am trying to understand the EXIF header portion of a jpeg file (in hex) and how to understand it so I can extract data, specifically GPS information. For better or worse, I am using VB.Net 2008 (sorry, it is what I can grasp right now). I have extracted the first 64K of a jpg to a byte array and have a vague idea of how the data is arranged. Using the EXIF specification documents, version 2.2 and 2.3, I see that there are tags, that are supposed to correspond to actual byte sequences in the file. I see that there is a “GPS IFD” that has a value of 8825 (in hex). I search for the hex string 8825 in the file (which I understand to be two bytes 88 and 25) and then I believe that there is a sequence of bytes following the 8825. I suspect that those subsequent bytes denote where in the file, by way of an offset, the GPS data would be located. For example, I have the following hex bytes, starting with 88 25: 88 25 00 04 00 00 00 01 00 00 05 9A 00 00 07 14. Is the string that I am looking for longer than 16 bytes? I get the impression that in this string of data, it should be telling me where to find the actual GPS data in the file.
Looking at http://search.cpan.org/~bettelli/Image-MetaData-JPEG-0.153/lib/Image/MetaData/JPEG/Structures.pod#Exif_and_DCT, halfway down the page, it talks about “Each IFD block is a structured sequence of records, called, in the Exif jargon, Interoperability arrays. The beginning of the 0th IFD is given by the 'IFD0_Pointer' value. The structure of an IFD is the following:”
So, what is an IFD0_Pointer? Does it have to do with an offset? I presume an offset is so many bytes from a beginning point. If that is true, where is that beginning point?
Thanks for any responses.
Dale
Comments (2)
I suggest you read the Exif specification (PDF); it is clear and quite easy to follow. For a short primer, here is a summary of an article I wrote:
A JPEG/Exif file starts with the start of image marker (SOI). The SOI consists of two magic bytes, 0xFF 0xD8, identifying the file as a JPEG file. Following the SOI, there are a number of application marker sections (APP0, APP1, APP2, APP3, ...), typically including metadata.
Application Marker Sections
Each APPn section starts with a marker. For the APP0 section the marker is 0xFF 0xE0, for the APP1 section 0xFF 0xE1, and so on. The marker bytes are followed by two bytes giving the size of the section (excluding the marker, including the size bytes). The length field is followed by variable-size application data. APPn sections are sequential, so you can skip entire sections (by using the section size) until you reach the one you are interested in. The contents of APPn sections vary; the following applies to the Exif APP1 section only.
The Exif APP1 Section
Exif metadata is stored in an APP1 section (there may be more than one APP1 section). The application data in an Exif APP1 section consists of the Exif marker 0x45 0x78 0x69 0x66 0x00 0x00 ("Exif\0\0"), the TIFF header, and a number of Image File Directory (IFD) sections.
The TIFF Header
The TIFF header contains the byte order of the IFD sections and a pointer to the 0th IFD. The first two bytes are 0x49 0x49 (II, for Intel) if the byte order is little-endian, or 0x4D 0x4D (MM, for Motorola) if it is big-endian. The following two bytes hold the magic number 42 ;) which appears as 0x00 0x2A in a big-endian file and as 0x2A 0x00 in a little-endian one. The following four bytes give the offset to the 0th IFD, counted from the start of the TIFF header.
Important: The JPEG file itself (what you have been reading until now) is always in big-endian format. However, the byte order of the IFD sections may be different and may need to be converted (you know the byte order from the TIFF header above).
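The steps so far (skip APPn sections until the Exif APP1 section, then read the TIFF header) could be sketched roughly as follows. This is Python rather than VB.NET and is not taken from the article; the function names and the SAMPLE bytes are made up for illustration, and the loop assumes every section before the image data carries a length field.

```python
import struct

def find_tiff_header(data: bytes) -> int:
    """Return the index in data where the TIFF header starts (hypothetical helper)."""
    if data[:2] != b"\xFF\xD8":
        raise ValueError("not a JPEG: missing SOI marker")
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError("expected a marker at offset %d" % pos)
        marker = data[pos + 1]
        # JPEG-level fields such as the section size are always big-endian.
        (size,) = struct.unpack(">H", data[pos + 2:pos + 4])
        payload = pos + 4
        if marker == 0xE1 and data[payload:payload + 6] == b"Exif\x00\x00":
            return payload + 6          # TIFF header follows "Exif\0\0"
        if marker == 0xDA:              # start of scan: image data begins
            break
        pos += 2 + size                 # marker (2 bytes) + size (includes itself)
    raise ValueError("no Exif APP1 section found")

def read_tiff_header(data: bytes, tiff: int):
    """Return (struct byte-order prefix, offset of the 0th IFD from the TIFF header)."""
    order = data[tiff:tiff + 2]
    if order == b"II":
        bo = "<"                        # little-endian
    elif order == b"MM":
        bo = ">"                        # big-endian
    else:
        raise ValueError("bad byte-order mark")
    magic, ifd0_offset = struct.unpack(bo + "HI", data[tiff + 2:tiff + 8])
    if magic != 42:
        raise ValueError("bad TIFF magic number")
    return bo, ifd0_offset

# Made-up minimal file: SOI, a 4-byte APP0 stub, then an Exif APP1 section.
SAMPLE = (b"\xFF\xD8"
          b"\xFF\xE0\x00\x04JF"
          b"\xFF\xE1\x00\x10" b"Exif\x00\x00"
          b"MM\x00\x2A\x00\x00\x00\x08")

tiff_start = find_tiff_header(SAMPLE)
print(tiff_start, read_tiff_header(SAMPLE, tiff_start))
```

Every offset found inside the IFDs is relative to tiff_start, not to the start of the file, which is the "beginning point" asked about in the question.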
Image File Directories
Once you get this far, you have your pointer to the 0th IFD section and you are ready to read actual metadata. The remaining IFDs are referenced in different places. The offsets to the Exif IFD and the GPS IFD are given in fields of the 0th IFD. The offset to the 1st IFD is given after the 0th IFD fields. The offset to the Interoperability IFD is given in the Exif IFD.
IFDs are simply sequential records of metadata fields. The field count is given in the first two bytes of the IFD. Following the field count are the 12-byte fields. Following the fields, there is a 4-byte offset from the start of the TIFF header to the start of the 1st IFD. This value is meaningful only for the 0th IFD. Following this, there is the IFD data section.
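To make the layout concrete, here is a small Python sketch that reads one IFD and looks for the GPS IFD pointer (tag 0x8825). The helper names are hypothetical, and the sample IFD is an assumption built around the bytes quoted in the question: it pretends the GPS tag is the only field in the 0th IFD, so that the trailing 00 00 07 14 is the offset to the next IFD. In a real file those four bytes could just as well be the start of the next 12-byte field.

```python
import struct

def read_ifd(tiff: bytes, offset: int, bo: str):
    """Read one IFD; tiff must start at the TIFF header.

    Returns (fields, next_ifd_offset), where each field is
    (tag, type, count, raw 4-byte value-or-offset).
    """
    (count,) = struct.unpack(bo + "H", tiff[offset:offset + 2])
    fields = []
    pos = offset + 2
    for _ in range(count):
        tag, typ, n = struct.unpack(bo + "HHI", tiff[pos:pos + 8])
        fields.append((tag, typ, n, tiff[pos + 8:pos + 12]))
        pos += 12
    # The 4 bytes after the last field point to the next IFD (0 if none).
    (next_ifd,) = struct.unpack(bo + "I", tiff[pos:pos + 4])
    return fields, next_ifd

def find_gps_ifd_offset(tiff: bytes, ifd0_offset: int, bo: str):
    """Return the GPS IFD offset (from the TIFF header) or None if absent."""
    fields, _ = read_ifd(tiff, ifd0_offset, bo)
    for tag, typ, n, raw in fields:
        if tag == 0x8825 and typ == 4 and n == 1:   # one long: the pointer itself
            return struct.unpack(bo + "I", raw)[0]
    return None

# Assumed sample: big-endian TIFF header, then an IFD holding only the
# question's GPS field, then a next-IFD offset.
SAMPLE_TIFF = (b"MM\x00\x2A\x00\x00\x00\x08"
               b"\x00\x01"
               b"\x88\x25\x00\x04\x00\x00\x00\x01\x00\x00\x05\x9A"
               b"\x00\x00\x07\x14")

print(hex(find_gps_ifd_offset(SAMPLE_TIFF, 8, ">")))
```

Under these assumptions the GPS IFD would start 0x59A bytes after the beginning of the TIFF header, and it can be read with the same read_ifd call.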
IFD Fields
Fields are 12-byte subsections of IFD sections. The first two bytes of each field give the tag ID as defined in the Exif standard. The next two bytes give the type of the field data: 1 for byte, 2 for ascii, 3 for short (uint16), 4 for long (uint32), etc. Check the Exif specification for the complete list.
The following four bytes may be a little confusing. For byte arrays (ascii and undefined types), the byte length of the array is given. For example, for the ASCII string "Exif", the count will be 5, including the null terminator. For other types, this is the number of field components (e.g. 4 shorts, 3 rationals).
Following the count, we have the 4-byte field value. However, if the length of the field data exceeds 4 bytes, it is stored in the IFD data section instead. In this case, the value is the offset from the start of the TIFF header to the start of the field data. For example, for a long (uint32, 4 bytes), this will be the field value. For a rational (2 x uint32, 8 bytes), this will be an offset to the 8-byte field data.
This is basically how metadata is arranged in a JPEG/Exif file. There are a few caveats to keep in mind (remember to convert the byte order as needed, offsets are from the start of the TIFF header, jump to data sections to read long fields, ...) but the format is quite easy to read. Following is the color-coded hex view of a JPEG/Exif file. The blue block represents the SOI, orange is the TIFF header, green is the IFD size and offset bytes, light purple blocks are IFD fields and dark purple blocks are field data.
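The inline-versus-offset rule is the part that usually trips people up, so here is a hedged Python sketch of decoding a single field. The function name, the type-size table (limited to the common Exif types) and the sample GPS values are my own illustration, not something from the article or from a real photo.

```python
import struct

# Size in bytes of one component, keyed by Exif type number.
TYPE_SIZES = {1: 1, 2: 1, 3: 2, 4: 4, 5: 8, 7: 1, 9: 4, 10: 8}

def field_value(tiff: bytes, entry_pos: int, bo: str):
    """Decode the 12-byte field at entry_pos; tiff starts at the TIFF header."""
    tag, typ, count = struct.unpack(bo + "HHI", tiff[entry_pos:entry_pos + 8])
    size = TYPE_SIZES[typ] * count
    if size <= 4:
        start = entry_pos + 8           # data fits in the value slot itself
    else:
        # Otherwise the slot holds an offset from the start of the TIFF header.
        (start,) = struct.unpack(bo + "I", tiff[entry_pos + 8:entry_pos + 12])
    raw = tiff[start:start + size]
    if typ == 2:                        # ascii, count includes the trailing NUL
        return tag, raw.rstrip(b"\x00").decode("ascii")
    if typ == 3:                        # short (uint16)
        return tag, list(struct.unpack(bo + "%dH" % count, raw))
    if typ == 4:                        # long (uint32)
        return tag, list(struct.unpack(bo + "%dI" % count, raw))
    if typ == 5:                        # rational: (numerator, denominator) pairs
        nums = struct.unpack(bo + "%dI" % (2 * count), raw)
        return tag, [(nums[i], nums[i + 1]) for i in range(0, len(nums), 2)]
    return tag, raw                     # other types: hand back the raw bytes

# Invented GPS-style IFD: tag 0x0001 (ascii "N") and tag 0x0002 (3 rationals).
GPS_SAMPLE = (b"MM\x00\x2A\x00\x00\x00\x08"
              b"\x00\x02"
              b"\x00\x01\x00\x02\x00\x00\x00\x02N\x00\x00\x00"
              b"\x00\x02\x00\x05\x00\x00\x00\x03\x00\x00\x00\x26"
              b"\x00\x00\x00\x00"
              + struct.pack(">6I", 40, 1, 26, 1, 4614, 100))

print(field_value(GPS_SAMPLE, 10, ">"))   # stored inline
print(field_value(GPS_SAMPLE, 22, ">"))   # stored at offset 0x26

# The 12 bytes quoted in the question decode as one long, stored inline.
print(field_value(bytes.fromhex("88250004000000010000059A"), 0, ">"))
```

Applied to the question's bytes, this reads 88 25 as the tag, 00 04 as the type (long), 00 00 00 01 as the count and 00 00 05 9A as the value, so the field itself is 12 bytes long, not 16.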
Here is a PHP script I wrote to modify Exif headers.