Rendering the whole media box of a PDF page into a PNG file using Ghostscript
I'm trying to render PDF pages into PNG files using Ghostscript v9.02. For that purpose I'm using the following command line:
gswin32c.exe -sDEVICE=png16m -o outputFile%d.png mypdf.pdf
This is working fine when the pdf crop box is the same as the media box, but if the crop box is smaller than the media box, only the media box is displayed and the border of the pdf page is lost.
I know usually pdf viewers only display the crop box but I need to be able to see the whole media page in my png file.
Ghostscript documentation says that per default the media box of a document is rendered, but this does not work in my case.
Does anyone have an idea how I could render the whole media box using Ghostscript?
Could it be that for png file device, only the crop box is rendered? Am I maybe forgetting a specific command?
For example, this pdf contains some registration marks outside of the crop box, which are not present in the output png file. Some more information about this pdf:
- media box:
  - width: 667 pts
  - height: 908 pts
- crop box:
  - width: 640 pts
  - height: 851 pts
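One quick way to tell which box actually ended up in the PNG is to compare the image's pixel size with the size each box would produce. The sketch below assumes Ghostscript's default resolution of 72 dpi (no -r option on the command line) and plain rounding, which may not match Ghostscript's own rounding exactly; the helper name expected_pixels is made up for illustration.

```python
def expected_pixels(width_pts, height_pts, dpi=72):
    """Convert a page box size in PDF points (1/72 inch) to pixels at the given resolution."""
    scale = dpi / 72.0
    # Plain rounding is an assumption; Ghostscript may round differently at odd resolutions.
    return round(width_pts * scale), round(height_pts * scale)


# Box sizes taken from the question.
media_box_px = expected_pixels(667, 908)
crop_box_px = expected_pixels(640, 851)

print("PNG size if the MediaBox was rendered:", media_box_px)
print("PNG size if the CropBox was rendered: ", crop_box_px)
```

If the PNG dimensions match the second line, the crop box is what got rendered.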
Comments (2)
OK, now that revers has restated his problem as looking for "generic code", let me try again.
The problem with a "generic code" is that there are many "legal" formal representations of "CropBox" statements which could appear in a PDF. All of the following are possible and correct and set the same values for the page's CropBox:
/CropBox[10 20 500 700]
/CropBox[ 10 20 500 700 ]
/CropBox[10 20 500 700 ]
/CropBox [10 20 500 700]
/CropBox [ 10 20 500 700 ]
/CropBox [ 10.00 20.0000 500.0 700 ]
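To illustrate why these spellings are equivalent, here is a minimal sketch of a tolerant pattern that reads all six of them as the same four numbers. It only handles a direct array (not an indirect reference such as "/CropBox 12 0 R"), and the names CROPBOX_RE and parse_cropbox are invented for this example.

```python
import re

# One PDF number: optional sign, then digits with an optional fractional part.
NUM = rb"[-+]?(?:\d+\.?\d*|\.\d+)"

# Optional whitespace after the key and inside the brackets, whitespace between numbers.
CROPBOX_RE = re.compile(
    rb"/CropBox\s*\[\s*(" + NUM + rb")\s+(" + NUM + rb")\s+(" + NUM + rb")\s+(" + NUM + rb")\s*\]"
)


def parse_cropbox(data: bytes):
    """Return the first direct /CropBox array in data as four floats, or None if absent."""
    match = CROPBOX_RE.search(data)
    if match is None:
        return None
    return tuple(float(group.decode("ascii")) for group in match.groups())


VARIANTS = [
    b"/CropBox[10 20 500 700]",
    b"/CropBox[ 10 20 500 700 ]",
    b"/CropBox[10 20 500 700 ]",
    b"/CropBox [10 20 500 700]",
    b"/CropBox [ 10 20 500 700 ]",
    b"/CropBox [ 10.00 20.0000 500.0 700 ]",
]

for variant in VARIANTS:
    print(variant.decode("ascii"), "->", parse_cropbox(variant))
```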
The same is true for ArtBox, TrimBox, BleedBox, CropBox and MediaBox. Therefore you need to "normalize" the *Box representation inside the PDF source code if you want to edit it.
First Step: "Normalize" the PDF source code
Here is how you do that:
- Download qpdf for your OS platform.
- Run this command on your input PDF:
qpdf --qdf input.pdf output.pdf
The output.pdf now will have a kind of normalized structure (similar to the last example given above), and it will be easier to edit, even with a stream editor like sed.
Second Step: Remove all superfluous *Box statements
Next, you need to know that the only essential *Box is MediaBox. This one MUST be present, the others are optional (in a certain prioritized way). If the others are missing, they default to the same values as MediaBox. Therefore, in order to achieve your goal, we can simply delete all code that is related to them. We'll do it with the help of sed.
That tool is normally installed on all Linux systems; on Windows, download and install it from gnuwin32.sf.net. (Don't forget to install the named "dependencies" should you decide to use the .zip file instead of the Setup .exe.)
Now run this command:
sed.exe -i.bak -e "/CropBox/,/]/s#.# #g" output.pdf
Here is what this command is supposed to do:
- -i.bak tells sed to edit the original file inline, but to also create a backup file with a .bak suffix (in case something goes wrong).
- /CropBox/ states the first address line to be processed by sed.
- /]/ states the last address line to be processed by sed.
- s tells sed to do substitutions for all lines from first to last addressed line.
- #.# #g tells sed which kind of substitution to do: replace each arbitrary character ('.') in the address space by a blank (' '), globally ('g').
We substitute all characters by blanks (instead of by 'nothing', i.e. deleting them) because otherwise we'd get complaints about "PDF file corruption", since the object offsets recorded in the cross-reference table and the stream lengths would have changed.
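For readers without sed at hand, the same blanking idea can be sketched in Python. Note the differences: this version blanks only the /CropBox entry itself rather than every whole line in the address range, and it assumes the entry is a direct array stored as plain text (as in qpdf's QDF output). The name blank_cropbox and the sample bytes are made up for illustration.

```python
import re

# A direct /CropBox array, possibly spread over several lines.
CROPBOX_ENTRY = re.compile(rb"/CropBox\s*\[[^\]]*\]")


def blank_cropbox(data: bytes) -> bytes:
    """Overwrite every /CropBox entry with spaces, keeping line breaks and total length."""
    def to_blanks(match):
        # Every byte except CR/LF becomes a space, so all byte offsets stay the same.
        return re.sub(rb"[^\r\n]", b" ", match.group(0))
    return CROPBOX_ENTRY.sub(to_blanks, data)


# Hypothetical fragment of a page dictionary, with arrays split across lines.
sample = (
    b"<<\n"
    b"  /CropBox [\n    10\n    20\n    500\n    700\n  ]\n"
    b"  /MediaBox [\n    0\n    0\n    667\n    908\n  ]\n"
    b">>\n"
)

cleaned = blank_cropbox(sample)
print(len(sample) == len(cleaned))  # the file length is unchanged
print(cleaned.decode("ascii"))
```

Because the byte count never changes, the cross-reference offsets that follow the edited dictionary remain valid.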
Third step: run your Ghostscript command
You know that already well enough: it is the same gswin32c.exe command line as in your question, this time run against output.pdf.
All three steps above can easily be scripted, which I'll leave to you for your own pleasure.
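As a rough idea of what such a script could look like, here is a sketch in Python. It assumes qpdf and the Ghostscript executable are on the PATH, reuses the file names from this thread, and swaps the sed step for an equivalent in-Python blanking of direct /CropBox arrays; all function names are invented for this example.

```python
import re
import subprocess

# A direct /CropBox array, possibly spread over several lines.
CROPBOX_ENTRY = re.compile(rb"/CropBox\s*\[[^\]]*\]")


def qpdf_command(src, dst):
    """Step 1: normalize the PDF with qpdf's QDF mode."""
    return ["qpdf", "--qdf", src, dst]


def ghostscript_command(pdf, png_pattern, gs_exe="gswin32c.exe"):
    """Step 3: the render command from the question, pointed at the edited PDF."""
    return [gs_exe, "-sDEVICE=png16m", "-o", png_pattern, pdf]


def blank_cropbox_in_file(path):
    """Step 2: overwrite each /CropBox entry with spaces, keeping the file length."""
    with open(path, "rb") as handle:
        data = handle.read()
    data = CROPBOX_ENTRY.sub(
        lambda m: re.sub(rb"[^\r\n]", b" ", m.group(0)), data
    )
    with open(path, "wb") as handle:
        handle.write(data)


def render_full_media_box(src, work_pdf="output.pdf",
                          png_pattern="outputFile%d.png", gs_exe="gswin32c.exe"):
    """Run the three steps in order; raises if qpdf or Ghostscript reports failure."""
    subprocess.run(qpdf_command(src, work_pdf), check=True)
    blank_cropbox_in_file(work_pdf)
    subprocess.run(ghostscript_command(work_pdf, png_pattern, gs_exe), check=True)
```

A call such as render_full_media_box("mypdf.pdf") would then produce the PNG files; on Linux the executable is usually named gs, so pass gs_exe="gs" there.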
First, let's get rid of a misunderstanding. You wrote:
"if the crop box is smaller than the media box, only the media box is displayed and the border of the pdf page is lost."
That's not correct. If the CropBox is smaller than the MediaBox, then only the CropBox should be displayed (not the MediaBox). And that is exactly how it was designed to work. This is the whole idea behind the CropBox concept...
At the moment I cannot think of a solution that works automatically for each PDF and all possible values that can be there (unless you want to use payware).
To manually process the PDF you linked to:
- Search the file for all spots that contain the /CropBox keyword.
- Each one will read something like /CropBox [12.3456 78.9012 345.67 890.123456].
- Edit it to read /CropBox [0.00000 0.00000 667.00 908.000000], keeping the number of characters unchanged. (You can use spaces instead of my .0000.. parts, but if I do, the SO editor will eat them and you'll not see what I originally typed...)
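The manual edit can also be sketched in code. The snippet below rewrites each direct /CropBox array to new values and pads with spaces so the entry keeps its original byte length. It assumes the page dictionaries are stored as plain text (not inside compressed object streams), and the function name overwrite_cropbox is made up for illustration.

```python
import re

# A direct /CropBox array, possibly spread over several lines.
CROPBOX_ENTRY = re.compile(rb"/CropBox\s*\[[^\]]*\]")


def overwrite_cropbox(data: bytes, box) -> bytes:
    """Rewrite each /CropBox array to `box`, padded with spaces to the original byte length."""
    def same_length(match):
        old = match.group(0)
        numbers = " ".join(f"{value:g}" for value in box)
        new = f"/CropBox [{numbers}".encode("ascii")
        # The new text plus the closing bracket must fit in the old entry.
        if len(new) + 1 > len(old):
            raise ValueError("replacement does not fit in the original entry")
        # Pad with spaces, then close the array at the original end position.
        return new.ljust(len(old) - 1) + b"]"
    return CROPBOX_ENTRY.sub(same_length, data)


# The example entry from above, set to the media box size from the question.
before = b"/CropBox [12.3456 78.9012 345.67 890.123456]"
after = overwrite_cropbox(before, (0, 0, 667, 908))
print(after.decode("ascii"))
print(len(before) == len(after))
```

Padding with spaces inside the array plays the same role as the .0000 filler in the hand-edited version.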