How is Exif information encoded?
Greetings,
I'm going to get Exif info from some images using Android. I know there are some standard Java libs out there that I could use with the device. I'm sure I will end up using one.
But in the meantime can someone explain to me how this information is encoded inside a JPG? Where / how would you usually get the info from the document? When I open the document up with a text editor it's all binary.
Curious as to how it works and how I could potentially read the data in question.
Comments (5)
I'm a bit late to the party, but having written a Java library for processing Exif (among other types of metadata) I thought I'd chime in.
Exif
Exif is built upon TIFF, the Tagged Image File Format. So we first have to examine TIFF:
Think of the structure as a tree with primitive values at the leaves. TIFF is self-describing about its structure, but it doesn't dictate anything about what the values at the leaves actually mean.
Really you can store any kind of data in TIFF; it's not coupled to images.
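The TIFF file has a generic header:
- 2 bytes for the byte order: either MM or II in ASCII. This tells you what order to consider all the future bytes in: II means LSB first, MM means MSB first.
- 2 bytes for the TIFF marker: 0x002A
- 4 bytes for the pointer to the first IFD
IFDs have equally simple structure:
- 2 bytes for the number of tags to follow
- 12 bytes for each tag
- 4 bytes for the pointer to the next IFD (or zero if there is none)
Tags have a simple representation in 12 bytes:
- 2 bytes for the tag ID
- 2 bytes for the data type
- 4 bytes for the number of components (values of that type)
- 4 bytes for the value itself if it fits, otherwise for a pointer to where the value is stored
The data types are predefined. For example: 1 represents 8-bit unsigned integers, and 12 represents 64-bit floating point numbers.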
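A minimal Python sketch of reading that layout, not taken from any particular library. It assumes the standard TIFF arrangement (8-byte header, 2-byte tag count, 12-byte tags, 4-byte pointer to the next IFD); the function names and the sample bytes are made up for illustration.

```python
import struct

def read_tiff_header(tiff: bytes):
    """Return (struct byte-order prefix, TIFF marker, offset of first IFD)."""
    # "MM" means most significant byte first, "II" least significant first.
    order = {b"MM": ">", b"II": "<"}[tiff[:2]]
    marker, first_ifd = struct.unpack_from(order + "HI", tiff, 2)
    return order, marker, first_ifd

def read_ifd(tiff: bytes, offset: int, order: str):
    """Return (list of raw tags, offset of next IFD) for the IFD at `offset`."""
    (count,) = struct.unpack_from(order + "H", tiff, offset)
    tags = []
    for i in range(count):
        # Each tag is 12 bytes: ID, data type, component count, value-or-pointer.
        tag_id, type_id, n, raw = struct.unpack_from(
            order + "HHI4s", tiff, offset + 2 + 12 * i
        )
        tags.append((tag_id, type_id, n, raw))
    (next_ifd,) = struct.unpack_from(order + "I", tiff, offset + 2 + 12 * count)
    return tags, next_ifd

# Hypothetical hand-built TIFF block: one IFD holding a single tag
# (ID 0x0112, type 3 = 16-bit unsigned, 1 component, value 6 stored inline).
sample = bytes.fromhex(
    "4D4D 002A 00000008"          # header: MM, marker, first IFD at offset 8
    "0001"                        # IFD: one tag follows
    "0112 0003 00000001 00060000" # the 12-byte tag
    "00000000"                    # no further IFD
)
order, marker, first_ifd = read_tiff_header(sample)
tags, next_ifd = read_ifd(sample, first_ifd, order)
```

Note that the last four bytes of each tag are kept raw here: whether they hold the value or a pointer depends on whether the data type times the component count fits in four bytes.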
So with all that you can go ahead and follow the data file. Some observations:
- For example, a reader can report that tag 0x1234 has 4 integers: {1,2,3,4}
To decode TIFF into Exif, you need to apply the dictionary that defines what each IFD represents, and what each tag ID within those IFDs represents.
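As a rough illustration of what applying such a dictionary looks like, here is a small Python sketch. The table is a hand-picked subset of the Exif tags for the first IFD, and `describe` is a hypothetical helper, not any library's API.

```python
# A tiny subset of the Exif tag dictionary for the first IFD (IFD0).
IFD0_TAG_NAMES = {
    0x010F: "Make",
    0x0110: "Model",
    0x0112: "Orientation",
    0x8769: "Exif SubIFD pointer",
}

def describe(tag_id: int, values) -> str:
    """Attach a human-readable name to a decoded TIFF tag, if we know one."""
    name = IFD0_TAG_NAMES.get(tag_id, f"Unknown tag 0x{tag_id:04X}")
    return f"{name}: {values}"

known = describe(0x010F, "EASTMAN KODAK COMPANY")
unknown = describe(0x1234, [1, 2, 3, 4])
```

Without an entry in the dictionary, the TIFF layer still yields the values; it just cannot say what they mean.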
JPEG
Most users of my library are processing JPEG files. JPEGs have a completely different structure, comprising a sequence of segments. Each segment has an identifier and a block of bytes. Exif is found in the APP1 (numeric value 0xe1) segment of a JPEG file. Once you have that, you must skip past a few leading bytes (Exif\0\0) before seeing the MM or II that denote the start of the TIFF formatted Exif data.
Bringing it all together with an example
Here's a binary dump of one of my library's sample images:
In order:
JPEG Starts
- FF D8 is the JPEG 'magic number'.
- FF marks a JPEG segment start.
- E1 indicates the JPEG segment type (this is APP1, where Exif lives).
- 18 B3 (6,323 decimal) gives the length of the segment (including the size bytes), so we know that all the Exif data for this JPG file will sit within the next 6,321 bytes. Note that in JPG, multi-byte values are encoded with Motorola ordering, although nested Exif data may use Intel ordering.
- 45 78 69 66 00 00, or in ASCII Exif\0\0, is the Exif preamble. APP1 is not exclusively reserved for Exif, so this discriminates.
TIFF/Exif Starts
- 4D 4D, or MM, indicates we have Motorola byte order in this Exif block.
- 00 2A is our standard TIFF marker, as discussed above.
- 00 00 00 08 is the offset (8 bytes) to the first IFD, relative to the TIFF header (MM in this case). This points directly to the next byte in the sequence in this case, though it doesn't have to.
IFD Starts
- 00 08 opens our first IFD and tells us we'll have 8 tags coming up.
Tag Starts
- 01 0F is the ID for the first tag in the first IFD, in this case the manufacturer of the camera.
- 00 02 is the type of the value (2 means it's an ASCII string).
- 00 00 00 16 is the number of components, meaning we'll have a 22-byte string.
- 00 00 01 B2 (434 decimal) is a pointer to the location of that string, relative to the TIFF header (MM). You can't see it in this screenshot, but it points to 45 41 53 54 4D 41 4E 20 4B 4F 44 41 4B 20 43 4F 4D 50 41 4E 59 00, which is EASTMAN KODAK COMPANY in ASCII.
RAW
Camera raw files (CR2/NEF/ORF...) generally use TIFF, however they mostly use different tags to those for Exif. The second pair of bytes in some of these files may be different to 00 2A as well, indicating the type of TIFF dictionary that ought to be applied.
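The byte values quoted in the JPEG walkthrough earlier in this answer can be checked mechanically. The Python sketch below retypes those bytes by hand (it does not read the actual sample image) and decodes them field by field; the variable names are only illustrative.

```python
import struct

# The first 34 bytes of the dump, as quoted in the walkthrough.
dump = bytes.fromhex(
    "FF D8"              # JPEG magic number
    "FF E1"              # segment start, segment type APP1
    "18 B3"              # segment length (JPEG always stores this big-endian)
    "45 78 69 66 00 00"  # Exif preamble
    "4D 4D 00 2A"        # TIFF byte order (MM) and TIFF marker
    "00 00 00 08"        # offset of the first IFD from the TIFF header
    "00 08"              # number of tags in the first IFD
    "01 0F 00 02"        # first tag: ID, data type
    "00 00 00 16"        # first tag: number of components
    "00 00 01 B2"        # first tag: pointer to the value
)

(segment_length,) = struct.unpack_from(">H", dump, 4)
preamble = dump[6:12]
tiff = dump[12:]                      # all TIFF offsets are relative to here
order = ">" if tiff[:2] == b"MM" else "<"
marker, ifd_offset = struct.unpack_from(order + "HI", tiff, 2)
(tag_count,) = struct.unpack_from(order + "H", tiff, ifd_offset)
tag_id, type_id, components, pointer = struct.unpack_from(
    order + "HHII", tiff, ifd_offset + 2
)

# The string the pointer refers to lies outside these 34 bytes, so the
# bytes quoted in the text are decoded separately.
maker = bytes.fromhex(
    "45 41 53 54 4D 41 4E 20 4B 4F 44 41 4B 20 43 4F 4D 50 41 4E 59 00"
)
maker_text = maker.rstrip(b"\x00").decode("ascii")
```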
If you search for the string "Exif" you will find the start of the Exif data. It's quite complicated, and I would recommend using a library (e.g. my company's DotImage if you were using .NET).
Here's a high level description though:
The Exif itself is inside of an AppMarker: the three bytes before the string will be E1 (AppMarker 1) and the size of the marker's data, which JPEG always stores as a 2-byte big-endian number. Two bytes after the Exif string you will see the endianness marker (e.g. 49 49 means II, which means Intel, little endian; that is, 2-byte numbers have the low byte first in the file).
The rest of the data uses offsets extensively. Each offset is measured from the location of the first endianness byte (the first 49 in the above case).
Typically, 8 bytes from this location is a 2-byte number which is the number of Exif tags (the exact position is given by the 4-byte offset stored at bytes 4 to 7). If you are in II byte order, reverse the bytes to read the count.
Then there will be this number of 12-byte records. Each one is:
- 2 bytes: tag ID
- 2 bytes: data type
- 4 bytes: number of components
- 4 bytes: the value itself, or an offset to it if it does not fit in 4 bytes
After the N 12-byte records, you will have the data pointed to by each offset used in the above N records. You need to look up ids and types to see what they mean and how they are represented.
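Searching for the string "Exif" works in practice, but walking the JPEG segments is a little more robust. Here is a minimal Python sketch of that walk, under simplifying assumptions: it ignores fill bytes and stops at the start of the image scan. `find_exif_tiff` is a hypothetical name and the test file is hand-built.

```python
import struct

def find_exif_tiff(jpeg: bytes):
    """Return the TIFF-formatted Exif payload of a JPEG, or None if absent."""
    if jpeg[:2] != b"\xFF\xD8":
        raise ValueError("not a JPEG: missing FF D8 start marker")
    pos = 2
    while pos + 4 <= len(jpeg):
        if jpeg[pos] != 0xFF:
            raise ValueError(f"expected a segment marker at byte {pos}")
        marker = jpeg[pos + 1]
        if marker in (0xD9, 0xDA):  # end of image, or start of scan data
            return None
        # The length is big-endian and counts its own two bytes.
        (length,) = struct.unpack_from(">H", jpeg, pos + 2)
        payload = jpeg[pos + 4 : pos + 2 + length]
        if marker == 0xE1 and payload[:6] == b"Exif\x00\x00":
            return payload[6:]  # starts at the II / MM endianness marker
        pos += 2 + length
    return None

# Hand-built test file: an APP0 segment, then APP1 holding an empty
# little-endian TIFF block (II, marker, first IFD at 8, zero tags).
tiff = b"II*\x00\x08\x00\x00\x00" + b"\x00\x00" + b"\x00\x00\x00\x00"
app1 = b"Exif\x00\x00" + tiff
jpeg = (
    b"\xFF\xD8"
    + b"\xFF\xE0" + struct.pack(">H", 7) + b"JFIF\x00"
    + b"\xFF\xE1" + struct.pack(">H", 2 + len(app1)) + app1
    + b"\xFF\xD9"
)
found = find_exif_tiff(jpeg)
```

Every offset inside the returned bytes is then relative to index 0 of that block, which matches the "first endianness byte" rule above.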
Wikipedia has a few pointers on how and where exactly EXIF data is stored in a file. Of course, there's always the standard itself to read up.
This is one of the good libraries for Java and EXIF: http://www.drewnoakes.com/code/exif/
It's pretty tedious to parse EXIF data but you can find many libraries to parse it. My favorite one for Java is:
http://www.java2s.com/Open-Source/Java-Document/Web-Server/Jigsaw/org/w3c/tools/jpeg/Exif.java.htm
http://jigsaw.w3.org/