PDF font mapping error
While rendering a PDF file generated by PDFCreator 0.9.x, I noticed that it contains an error in the character mapping. An error in a PDF file is nothing unusual: Acrobat does wonders in rendering faulty PDF files, so a lot of PDF generators create PDFs that do not fully adhere to the PDF standard.
I tried to create a small example file: http://test.continuit.nl/temp/Document.pdf
The single page renders a single glyph (a capital A) using a Tj command (see stream 5 0 obj). The selected font (7 0 obj) is an embedded subset containing that single glyph. So far so good. The glyph is referenced by character code 1. The font's Encoding contains a Differences array, [ 1 /A ], so code 1 maps to the glyph name /A. However, the cmap in the embedded subset font has no entry for character 65 (capital A). Instead, the cmap defines its characters in exactly the order of the PDF's Font -> Encoding -> Differences array.
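To make the first mapping step concrete, here is a minimal sketch of how a Differences array such as [ 1 /A ] expands into a code-to-glyph-name table. The function name and the list representation (integers for codes, strings for names) are assumptions made for illustration, not part of any particular PDF library.

```python
def expand_differences(differences):
    """Expand a PDF /Differences array into {character_code: glyph_name}.

    An integer sets the next character code; each glyph name that follows
    is assigned to consecutive codes starting from that value.
    """
    mapping = {}
    code = None
    for item in differences:
        if isinstance(item, int):
            code = item                      # start of a new run of codes
        else:
            if code is None:
                raise ValueError("glyph name appears before any starting code")
            mapping[code] = item.lstrip("/")  # store the name without the slash
            code += 1
    return mapping


print(expand_differences([1, "/A"]))  # {1: 'A'}
```

With the sample file's array, character code 1 resolves to the glyph name A, which is the name the embedded font's cmap then fails to match at character 65.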
It looks like the character mapping / encoding is applied twice. Only files from PDFCreator 0.9.x seem to be affected.
My question is: is this correct (or did I make a mistake, and is the PDF correct), and what would you do to detect this situation in order to solve the rendering problem?
Note: I do need to be able to render these PDFs.
Solution
ISO 32000 notes that for symbolic TrueType fonts (flag bit 3 is set in the font descriptor) an Encoding is not allowed and you should ignore it, always using a simple one-to-one encoding. So, all in all, if the font is symbolic, I ignore the Encoding object altogether, and this solves the problem.
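The check described above could be sketched roughly as follows. The function and parameter names are hypothetical. The only detail taken from the specification is that the Symbolic flag is bit position 3 of the font descriptor's Flags value, counted from 1 at the low-order bit, which gives the mask 4.

```python
SYMBOLIC = 1 << 2  # bit position 3 of /Flags (positions are counted from 1)


def is_symbolic(flags):
    """Return True if the font descriptor /Flags value marks the font as symbolic."""
    return bool(flags & SYMBOLIC)


def effective_encoding(font_flags, is_truetype, encoding):
    """Return the Encoding to apply, or None to use character codes directly.

    For a symbolic TrueType font the Encoding entry is ignored, following
    the approach described in the solution text.
    """
    if is_truetype and is_symbolic(font_flags):
        return None
    return encoding


# A symbolic TrueType font: the Differences-based encoding is dropped.
print(effective_encoding(4, True, {1: "A"}))   # None
# A nonsymbolic TrueType font (flag bit 6 set, value 32): the encoding is kept.
print(effective_encoding(32, True, {1: "A"}))  # {1: 'A'}
```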
Comments (1)
The first point is that the file opens and renders correctly in Acrobat, so it's almost certain that the file is correct. In fact, it opens and renders correctly in a wide range of PDF consumers, so it is in fact correct.
The font in question is a TrueType font, so actually yes, there are two kinds of 'encoding'. First there is PDF/PostScript Encoding. This maps a character code into a glyph name. In your case it maps character code 1 to glyph name /A.
In a PostScript font we would then look up the name /A in the CharStrings dictionary, and that would give us the character description, which we would then execute. Things are different with a TrueType font though.
You can find this on page 430 of the 1.7 PDF Reference Manual, where it states that:
"A TrueType font program’s built-in encoding maps directly from character codes to glyph descriptions by means of an internal data structure called a “cmap” (not to be confused with the CMap described in Section 5.6.4, “CMaps”)."
I believe that in your case you simply need to use the character code (0x01) directly in the cmap subtable. This will give you a GID of 36.
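A rough sketch of that direct lookup, with the cmap subtable modelled as a plain dictionary from character code to glyph ID. The dictionary contents are hypothetical and simply reproduce the GID 36 mentioned above. The fallback to 0xF000 + code is an assumption based on the convention that symbol-encoded (3,0) subtables often place their codes in the 0xF000 range.

```python
def glyph_id_for_code(code, cmap_subtable):
    """Look up a character code directly in a TrueType cmap subtable.

    cmap_subtable is a dict {character_code: glyph_id}. The code is tried
    as-is first, then offset into the 0xF000 range (assumed convention for
    symbol-encoded subtables). GID 0 (.notdef) is returned if nothing matches.
    """
    for candidate in (code, 0xF000 + code):
        gid = cmap_subtable.get(candidate)
        if gid is not None:
            return gid
    return 0


# Hypothetical subtable for the one-glyph subset font discussed above.
hypothetical_cmap = {0x01: 36}
print(glyph_id_for_code(0x01, hypothetical_cmap))  # 36
```

Note that the Encoding's glyph name /A is never consulted here, which is why the lookup succeeds even though the cmap has no entry for character 65.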