How do I open a PDF as a raw file?

Posted 2024-11-18 09:25:48

I've been wanting to see the insides of a PDF for a while, like, the raw source code of it so I can look at it. Any way of doing that?

Comments (5)

月亮坠入山谷 2024-11-25 09:25:48

Looking at the raw code of a PDF will not help you much unless you also have an idea of its internal structure. Get yourself a copy of the official PDF reference (download PDF), and read an introductory article such as this [gone] or this to begin with.

Even after such preparation, you will not discover much that is useful by staring at the raw code, because PDFs usually contain parts that are "filtered" (that is, compressed).
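To make the "filtered" point concrete: the most common filter, /FlateDecode, is ordinary zlib (deflate) compression. The following is a minimal sketch, not taken from any tool mentioned in this thread; the sample content stream and the helper name `inflate_stream` are made up for illustration.

```python
import zlib


def inflate_stream(raw: bytes) -> bytes:
    """Undo /FlateDecode: the stream body is ordinary zlib-compressed data."""
    return zlib.decompress(raw)


# A made-up page content stream: show the text "Hello" in font /F1 at 24 pt.
content = b"BT /F1 24 Tf 72 720 Td (Hello) Tj ET"

# Roughly what a PDF writer stores between "stream" and "endstream" ...
stored = zlib.compress(content)

# ... and what comes back once the filter is undone.
print(inflate_stream(stored).decode("ascii"))
```

Streams that also carry /DecodeParms (predictors) or that chain several filters need further steps, which is one reason a dedicated tool is more convenient.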

How to look at the real PDF source behind the 'raw' binary parts

Jay Berkenbilt's qpdf is a very useful command-line tool (available for Linux, Mac OS X, and Windows, and as source code under the open source Artistic License). It can unpack most filtered content and reorganize the internal structure in a way that gives you much more insight into it (all objects are numerically ordered, etc.). The command line to achieve this is:

 qpdf  --qdf  original.pdf  unpacked.pdf

Another useful and free tool for looking into PDFs (GPL licensed, but Linux-only AFAIK) is PDFEdit. It even comes with a GUI, if you prefer that, while still giving you access to the internal structure and the "raw" PDF code.

鲜血染红嫁衣 2024-11-25 09:25:48

If the purpose is just to look into the file, then any simple text editor will do, e.g. Notepad. PDF is largely a text-based format with embedded binary content streams. Raw PDF looks like this:

>>
/Border [0 0 0]
/Rect [121.02 332.48 363.24 343.64]
/StructParent 1321
/Subtype /Link
/Type /Annot
>>
endobj
64579 0 obj
<<
/Filter /FlateDecode
/Length 5771
>>
stream
Ũn0x/�+�}�ǹ����\֛ bYO�5[��X��W��L��(�������V�A3�C���������u큋_�a��ךm2N�6�    ��A��8
�d���NQ⺢GI��G�[��)�̉Y��R�y{R����&�&�;��g�k1���ҋeTC�(W��`���*��(;�AEc<=  mnZ+��|T��v
�.��зe�aޞ��V4�b���L����k�Oj.ֿ�y�����kc|I��  ��C�0��Hf�7d�/�z���m��o��A��B��IJ�%�. 
!�%f�б���&�ޒ�4Ύ7�l�3���3`�
endstream
endobj
64580 0 obj
<<
/Border [0 0 0]
/Dest <E4AE7DD2769553EF1668>
/Rect [219 648.5 256.8 659.66]
/StructParent 1323
/Subtype /Link
/Type /Annot
>>

What you see are basic COS objects such as names, dictionaries, and streams. All object types are described in the PDF 32000 standard; see section 7.3, Objects.
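Since every indirect object starts with an "N G obj" header, as in the listing above, a few lines of script are enough to get an inventory of a file. This is only a rough sketch: the function name `list_objects` and the sample bytes are invented for the example, and a plain pattern scan like this can produce false hits inside binary stream data.

```python
import re

# "N G obj" opens an indirect object: object number, generation number, keyword.
OBJ_HEADER = re.compile(rb"(?<!\d)(\d+)\s+(\d+)\s+obj\b")


def list_objects(data: bytes) -> list[tuple[int, int, int]]:
    """Return (object number, generation, byte offset) for each object header found."""
    return [
        (int(m.group(1)), int(m.group(2)), m.start())
        for m in OBJ_HEADER.finditer(data)
    ]


# Made-up sample modelled on the excerpt above (the stream body is dummy data).
sample = (
    b"64579 0 obj\n<<\n/Filter /FlateDecode\n/Length 4\n>>\n"
    b"stream\n\x00\x01\x02\x03\nendstream\nendobj\n"
    b"64580 0 obj\n<<\n/Subtype /Link\n/Type /Annot\n>>\nendobj\n"
)

for number, generation, offset in list_objects(sample):
    print(f"object {number} {generation} at byte {offset}")
```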

另类 2024-11-25 09:25:48

Use a hex editor. Of course, unless you know the PDF specification (PDF, 8.6 MB), you won't recognize much.
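If no hex editor is at hand, a short script gives a comparable view. The sketch below is an illustration only; the `hexdump` helper is hypothetical and the header bytes are a typical example, not read from a real file.

```python
def hexdump(data: bytes, width: int = 16) -> str:
    """Format bytes as offset, hex values, and printable ASCII, like a hex editor."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_part:<{width * 3 - 1}}  {text_part}")
    return "\n".join(lines)


# Typical first bytes of a PDF: the version header, then a comment line holding
# four bytes above 127 so that transfer tools treat the file as binary.
header = b"%PDF-1.3\n%\xe2\xe3\xcf\xd3\n"
print(hexdump(header))
```

To inspect a real file, pass it the first bytes read in binary mode, for example `hexdump(open("original.pdf", "rb").read(64))`, where `original.pdf` is a placeholder name.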

夏日落 2024-11-25 09:25:48

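In addition to the qpdf tool, conversion into PostScript might be helpful.
PDF's page-description operators are closely related to those of PS (although PDF is not strictly a subset of it). Usually it's quite easy to figure out, e.g., where the labels of a graph are. You can either use pdf2ps or invoke Ghostscript directly (recent Ghostscript releases provide the ps2write device in place of the older pswrite):

gs -dNOPAUSE -dBATCH -sDEVICE=ps2write -sOutputFile=some.ps some.pdf

When you generate your PDFs using pdflatex, you can disable compression (for example with \pdfcompresslevel=0 and \pdfobjcompresslevel=0 in the preamble). This makes the PDF more readable.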

勿忘初心 2024-11-25 09:25:48

Some more recent observations on the other answers.

Adobe keeps moving its open-sourced copy of the 2008 standard around; currently it is here https://opensource.adobe.com/dc-acrobat-sdk-docs/standards/pdfstandards/pdf/PDF32000_2008.pdf
The Web Archive currently has a copy here https://ia601003.us.archive.org/5/items/pdf320002008/PDF32000_2008.pdf

Both should be the same 22,491,828 bytes, so beware that neither includes any errata.

A PDF CAN be perfectly plain, annotated text (informally, MIME "text/pdf"), typed at a console keyboard or command line (too slow) or generated from a batch file. I won't bore you with the whole file, but it starts like this:

REM Start with File "Magic" Signatures for a PDF
echo %%PDF-1.0>!Fname!
echo %%âãÏÓ>>!Fname!

echo %%01) Prepare file references>>!Fname!
REM Record the current file size (presumably the byte offset of object 1, for the xref table)
for %%Z in (!Fname!) do set "FZ1=%%~zZ"
echo 1 0 obj>>!Fname!
echo ^<^</Names^<^</Dests 2 0 R^>^>/Outlines 3 0 R/PageLayout/OneColumn/PageMode/UseOutlines>>!Fname!

REM ToDo add files
REM /Lang (ga-IE)/MarkInfo^<^</Marked true^>^>/Names ^<^<^/EmbeddedFiles [(file.ext) 3 0 R]^>^>>>!Fname!

echo /Pages 4 0 R/Type/Catalog/ViewerPreferences^<^</DisplayDocTitle true^>^>^>^>>>!Fname!
echo endobj>>!Fname!

echo %%02) Prepare Named Destinations>>!Fname!

Thus the annotated RAW PDF could look like the following (note that I had edited the order in the cmd file in preparation for an XMP data section, so the two are not identical):

%PDF-1.3 
%âãÏÓ
%01) Prepare file references
1 0 obj
<</Lang(ga-IE)/Names<</Dests 3 0 R>>/Outlines 4 0 R/PageLayout/OneColumn/PageMode/UseOutlines
/PageLabels<</Nums[0<</S/A>>]>>/Pages 5 0 R/Type/Catalog/ViewerPreferences<</DisplayDocTitle true>>>>
endobj
%02) Reserved for big meta data
2 0 obj
<< >>
endobj
%03) Prepare Named Destinations
3 0 obj
<</Names [(Page1) [6 0 R /XYZ 0 792 null] (QRCode) [6 0 R /XYZ 25.0 317.0 1]]>>
endobj
%04) Prepare Outline / Bookmarks
...
...
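The same idea, writing the objects as plain text while noting the file size before each one, can be sketched in a scripting language. This is a hedged illustration rather than a port of the batch file above: the helper `build_pdf` and the three objects are invented for the example, and the result is just an empty one-page document.

```python
def build_pdf(objects: list[bytes]) -> bytes:
    """Assemble numbered objects into a PDF, recording byte offsets for the xref table."""
    out = bytearray(b"%PDF-1.3\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset where "N 0 obj" begins
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"
    xref_at = len(out)  # byte offset of the xref keyword
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # entry 0 heads the free list
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset  # each entry is exactly 20 bytes
    out += b"trailer\n<</Size %d/Root 1 0 R>>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_at
    return bytes(out)


# Catalog, page tree, and a single blank US Letter page (illustrative objects).
pdf = build_pdf([
    b"<</Type/Catalog/Pages 2 0 R>>",
    b"<</Type/Pages/Kids[3 0 R]/Count 1>>",
    b"<</Type/Page/Parent 2 0 R/MediaBox[0 0 612 792]/Resources<<>>>>",
])
print(pdf.decode("ascii"))
```

The byte offsets matter because the xref table lets a reader jump straight to each object, which is also why hand edits that change the length of a file can leave the table out of date.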

Many of the suggestions by others are for decompressing a binary application/pdf file into a textual one, and some produce a hybrid that still contains binary data.

The 3 most common tools designed for the task are qpdf (already mentioned, but it uses a hybrid QDF format), PDFtk (uncompress), and Mutool (different CLI options). Mutool is the one I play with most, as it's easy in the GL GUI to change the output settings. The output can be modified in MS Notepad whilst previewing the result.

So any text editing script can write or edit a PDF, even one with graphics, and several applications can convert a RAW "binary" PDF into a RAW "textual" PDF. However, never attempt to edit a PDF whilst it is temporarily in a textual base64 RePrEx (possible, but totally impractical).
