How can I "manually" edit annotations in a PDF without corrupting the file?
I need to insert a hyperlink into a few thousand existing PDFs. I'm working with zend_pdf, which apparently cannot set an invisible border. The only way I found to make the link borders invisible (I found it elsewhere on this site) is to search for each link "element" in the PDF and add a /Border entry to it, like this:
echo str_replace('/Annot /Subtype /Link', '/Annot /Subtype /Link /Border[0 0 0]', $pdf->render());
Since I need to work on files that reside on my filesystem, I'm using the sed command for the search & replace operation.
At first sight this works: the documents are displayed correctly by Acrobat 8, osx 10.6's Viewer and Ubuntu's document viewer. However, tools such as pdftk (1.41) and pdfinfo (0.12.1) report that the structure is corrupted. This is annoying, because pdftk refuses to work on a file that contains errors, so no further manipulation of the PDF with pdftk is possible.
I looked into the files with a binary editor and found that the file gets corrupted as soon as I add more than two bytes after "/Link". This confuses me a lot: according to the PDF specification (I'm using 1.4) there is no checksum except for streams, which should mean that one can add as many bytes as one wants, as long as they are not added inside a stream and the inserted bytes are valid PDF syntax.
What am I missing here?
Here is an example:
the original pdf
the processed pdf
Comments (1)
Inserting the extra "/Border" entry into the file actually corrupts the PDF's cross-reference (xref) table. The xref table references every object by its position, measured in bytes from the beginning of the file. Inserting the extra bytes shifts the position (offset) of everything that follows them, so the offsets recorded in the table, and the startxref pointer to the table itself, no longer match the file.
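The shift is easy to reproduce on a toy file. The following Python sketch is only an illustration and not part of the original workflow: `build_pdf` and `check_offsets` are hypothetical helpers, the five-object document is invented for the demo, and the checker assumes a single classic xref table (no xref streams, no incremental updates).

```python
import re

def build_pdf(objects):
    """Assemble a minimal PDF from object bodies and write a matching xref table."""
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))                 # byte offset of "N 0 obj"
        out += b"%d 0 obj\n" % num + body + b"\nendobj\n"
    xref_pos = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"              # entry 0 is always the free-list head
    for off in offsets:
        out += b"%010d 00000 n \n" % off
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out)

def check_offsets(pdf):
    """Return (startxref_ok, stale): whether startxref still points at 'xref',
    and which in-use objects are no longer found at their recorded offset."""
    startxref = int(re.search(rb"startxref\s+(\d+)", pdf).group(1))
    startxref_ok = pdf.startswith(b"xref", startxref)
    table = pdf.rindex(b"\nxref\n") + 1          # locate the table by scanning, not by trusting startxref
    lines = pdf[table:].split(b"\n")
    first, count = map(int, lines[1].split())
    stale = []
    for i in range(count):
        offset, gen, kind = lines[2 + i].split()
        expected = b"%d %d obj" % (first + i, int(gen))
        if kind == b"n" and not pdf.startswith(expected, int(offset)):
            stale.append(first + i)
    return startxref_ok, stale

# Invented demo document: catalog, page tree, page, link annotation, info dictionary.
OBJECTS = [
    b"<< /Type /Catalog /Pages 2 0 R >>",
    b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
    b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] /Annots [4 0 R] >>",
    b"<< /Type /Annot /Subtype /Link /Rect [10 10 100 30] >>",
    b"<< /Producer (demo) >>",
]

original = build_pdf(OBJECTS)
# The same search and replace as in the question, applied to raw bytes.
edited = original.replace(b"/Annot /Subtype /Link",
                          b"/Annot /Subtype /Link /Border[0 0 0]")

print(check_offsets(original))
print(check_offsets(edited))
```

In this toy file the annotation is object 4, so only object 5 and the xref table itself come after the inserted bytes. The checker reports the original as consistent and, for the edited copy, a stale startxref plus a stale offset for object 5. Lenient viewers can recover by scanning the file for objects, while stricter tools may simply report the file as damaged.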
To fix the xref table after the edit, I can use pdftk from PDF Labs (http://www.pdftk.com).
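The exact command is not preserved in this copy of the answer. As a hedged sketch: pdftk's documentation describes `pdftk broken.pdf output fixed.pdf` as the way to repair a corrupted xref table, and the Python wrapper below runs that form over a file. The names `repair_command` and `repair` and the file names are hypothetical, and whether pdftk accepts a given damaged file may depend on its version.

```python
import shutil
import subprocess

def repair_command(src, dst):
    """Build the pdftk call that rewrites src into dst with a regenerated xref table."""
    return ["pdftk", src, "output", dst]

def repair(src, dst):
    """Run pdftk on one file; raises if pdftk is missing or exits with an error."""
    if shutil.which("pdftk") is None:
        raise RuntimeError("pdftk is not installed or not on PATH")
    subprocess.run(repair_command(src, dst), check=True)

# Hypothetical usage after the sed step:
# repair("edited.pdf", "fixed.pdf")
```

Writing to a separate output file keeps the edited input around in case the repair fails.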
As a matter of fact, I was not able to find a comprehensive PDF solution for PHP, so I had to combine several different tools (zend_pdf, pdftk, sed) to implement my workflow.