Dynamically generated PDF files work in most readers except Adobe Reader

Posted 08-31 17:04

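I'm trying to generate PDFs dynamically from user input. Basically, I print the user input and overlay it on an existing PDF that I did not create.

It works, with one major exception: Adobe Reader does not read the result properly, on Windows or on Linux. QuickOffice on my phone cannot read it either. So I thought I would trace the path by which I create the files: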

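1 - Original PDF of the background
PDF 1.2 made with Adobe Distiller, using LZW encoding. I did not make this.

2 - PDF of the background
PDF 1.4 made with Ghostscript. I ran pdf2ps and then ps2pdf on the file above to strip the LZW encoding so that the reportlab and pyPDF libraries would recognize it. Note that this file looks "fuzzy", like a bad scan, in Adobe Reader, but looks fine in other readers.

3 - PDF of the user-input text, formatted to be combined with the background
PDF 1.3 made with Reportlab from the user input. It opens properly and looks good in every reader I have tried.

4 - Finished PDF
PDF 1.3 produced by PyPDF's mergePage() function on 2 and 3.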

Will not open in:
Adobe Reader for Windows
Adobe Reader for Linux
QuickOffice for Android

Opens perfectly in:
Google Docs' online PDF viewer
Evince for Linux
Ghostscript viewer for Linux
Foxit Reader for Windows
Preview for Mac

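Are there any known issues I should be aware of? I don't know exactly what "flate" is, but from what I have read online it seems to be some kind of open-source alternative to LZW for PDF compression. Could that be causing my problem? If so, are there any libraries I can use to fix the cause in code?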

活泼老夫, 2024-09-07 17:04:16

First remark:

Your 2nd step has many, many drawbacks. If you convert PDF back to PostScript and then back to PDF again, you are going to lose quality. This process is called "re-frying PDFs", and it is generally frowned upon by PDF professionals. (The reasons are: resulting files may look "fuzzy", like bad scans; files may have lost their embedded fonts; files may have had their original fonts replaced; files will certainly have lost their transparency; images may have changed resolution; colors may have changed....)

Sometimes you have no choice but to "re-fry"... but here you DO.

If you use Ghostscript, you can do a direct PDF-to-PDF conversion of PDF files, and there will be no internal, hidden PostScript conversion happening. (This is a little-known feature of Ghostscript, and therefore this answer would normally deserve lots of upvotes ;-P ).

Since you do want to get rid of internal LZW compression, here is how to do it in Ghostscript:

Download a small utility program, written in the PostScript language, from the Ghostscript source code repository: pdfinflt.ps
Run the following command line:
gswin32c.exe -- [c:/path/to/]pdfinflt.ps input.pdf output.pdf

Update: The link above points to the last version of pdfinflt.ps. The file has since been removed from the Ghostscript repository, with this commit message:

Remove pdfinflt.ps and pdfwrite.ps
-----------------------------------
pdfwrite is only (as far as I can see) used by pdfinflt.ps which says:

% It is not yet ready for prime time, but it is available for anyone wants
% to fix it.
%
% The main problem is:
%
% 1. Sometimes the PDF files that are written are broken. When they are
%    broken, GS gets an xref problem.
%
%    This problem is actually due to lib/pdfwrite.ps since even
%    when no conversion is done, the file is may be bad.

Since it doesn't work, and we can use MuPDF (which does work) for the
same task, I've chosen to delete both these files.
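
The commit message above names MuPDF as the tool that does work for this task. What follows is only a minimal sketch, assuming MuPDF's `mutool` command-line tool is installed and on the PATH; its `clean -d` mode rewrites a PDF with its streams decompressed. The helper names `mutool_decompress_cmd` and `decompress_pdf` are made up for illustration, and the file names are placeholders.

```python
import shutil
import subprocess
from pathlib import Path


def mutool_decompress_cmd(src, dst, exe="mutool"):
    """Build the argv for `mutool clean -d SRC DST` (hypothetical helper).

    `clean` rewrites the PDF; `-d` asks for all streams to be decompressed.
    """
    return [exe, "clean", "-d", str(src), str(dst)]


def decompress_pdf(src, dst, exe="mutool"):
    """Run the command. Raises if mutool is missing or exits non-zero."""
    if shutil.which(exe) is None:
        raise FileNotFoundError(f"{exe} not found on PATH")
    subprocess.run(mutool_decompress_cmd(src, dst, exe), check=True)
    return Path(dst)


if __name__ == "__main__":
    # Only print the command here; call decompress_pdf() to actually run it.
    print(" ".join(mutool_decompress_cmd("input.pdf", "output.pdf")))
```

Keeping the command builder separate from the function that runs it makes it possible to inspect or log the exact command before touching any files.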

The resulting PDF will have decompressed all its internal data streams, without losing quality through your PDF ==> PS ==> PDF re-frying.
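
One rough way to check a given file is to count the stream filter names it declares, for example to see whether any `/LZWDecode` entries remain. (`/FlateDecode` is the PDF filter name for zlib/deflate compression.) The sketch below is a plain byte scan, not a PDF parser: it assumes the stream dictionaries are stored uncompressed, so it can miss filters declared inside compressed object streams. The function name `count_filters` and the sample bytes are made up for illustration.

```python
import re
from collections import Counter

# Common standard stream filter names from the PDF specification.
FILTER_RE = re.compile(
    rb"/(LZWDecode|FlateDecode|ASCIIHexDecode|ASCII85Decode|RunLengthDecode|"
    rb"CCITTFaxDecode|DCTDecode|JBIG2Decode|JPXDecode)\b"
)


def count_filters(pdf_bytes):
    """Count filter names found in the raw bytes (hypothetical helper).

    This is a heuristic: it does not parse the file, so names hidden in
    compressed object streams are not seen.
    """
    return Counter(
        match.group(1).decode("ascii") for match in FILTER_RE.finditer(pdf_bytes)
    )


if __name__ == "__main__":
    # Made-up fragment standing in for the dictionaries of a real PDF.
    sample = (
        b"<< /Length 10 /Filter /LZWDecode >>\n"
        b"<< /Filter [/ASCII85Decode /FlateDecode] >>\n"
    )
    print(count_filters(sample))
```

For a real file, read it in binary mode first, for instance `count_filters(open("output.pdf", "rb").read())`.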

Second remark:

I think you should do your 4th step with a different tool, namely pdftk. This has the advantage of saving you from going through steps 1. and 2. altogether.

pdftk (PDF ToolKit) is a command-line utility, available on Linux, Unix (pdftk) and Windows (pdftk.exe), which can do a lot of things with PDFs, including overlaying the pages of two PDFs on top of each other. This is what I'd recommend you use. pdftk can overlay the PDF from your step "3." onto your original PDF (or vice versa) in one go, without first needing to de-flate or de-LZW either one.

Here are commands for you to test:

pdftk.exe ^
    original.pdf ^
    background pdf-from-userinput-step3.pdf ^
    output merged.pdf

pdftk.exe ^
    pdf-from-userinput-step3.pdf ^
    background original.pdf ^
    output merged.pdf

pdftk.exe ^
    original.pdf ^
    stamp pdf-from-userinput-step3.pdf ^
    output merged.pdf

pdftk.exe ^
    pdf-from-userinput-step3.pdf ^
    stamp original.pdf ^
    output merged.pdf

You'll probably wonder about the difference between the stamp and background commands. The commands do what their names suggest: they place the second PDF's page in the foreground layer or in the background layer of the first. Should both PDFs have transparent backgrounds (instead of solid, opaque white), the result will in many cases look the same.
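
Since the question generates its PDFs from Python, the same pdftk calls could be driven from a script. This is only a sketch, assuming pdftk is installed and on the PATH; the helper name `pdftk_overlay_cmd` is made up for illustration, and the file names are the placeholders used in the commands above.

```python
import subprocess


def pdftk_overlay_cmd(base, overlay, out, mode="background", exe="pdftk"):
    """Build a pdftk argv (hypothetical helper).

    mode="background": `overlay` is placed behind the pages of `base`.
    mode="stamp":      `overlay` is placed on top of the pages of `base`.
    """
    if mode not in ("background", "stamp"):
        raise ValueError("mode must be 'background' or 'stamp'")
    return [exe, str(base), mode, str(overlay), "output", str(out)]


if __name__ == "__main__":
    cmd = pdftk_overlay_cmd(
        "original.pdf", "pdf-from-userinput-step3.pdf", "merged.pdf", mode="stamp"
    )
    print(" ".join(cmd))
    # Uncomment to run it; check=True raises if pdftk exits with an error.
    # subprocess.run(cmd, check=True)
```

Note that pdftk's background and stamp operations apply only the first page of the second PDF to every page of the first; for page-by-page overlays pdftk offers multibackground and multistamp.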
