Odd letter spacing when converting PDF to JPG with ImageMagick in PHP
I am trying to convert a PDF to a JPG with a PHP exec() call, which looks like this:
convert page.pdf -resize 716x716 page.jpg
For some reason, the JPG comes out with janky text, despite the PDF looking just fine in Acrobat and Mac Preview. Here is the original PDF:
http://whit.info/dev/conversion/page.pdf
and here is the janktastic output:
http://whit.info/dev/conversion/page.jpg
The server is a LAMP stack with PHP 5 and ImageMagick 6.2.8.
Can you help this stumped Geek?
Thanks in advance,
Whit
Comments (2)
ImageMagick is just going to call out to Ghostscript to convert this PDF to an image. If you run gs on the PDF, you get the same badly spaced output. I suspect Ghostscript isn't handling the PDF's embedded TrueType fonts very well. If you could change your output to either embed Type 1 fonts or use a "core" PostScript font, you'd get better results.
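To check whether the spacing problem comes from Ghostscript rather than ImageMagick, one option is to render the page with gs directly and compare the result. The sketch below only assembles a typical command line and runs it if gs is installed. The helper names, the file names, and the 150 dpi resolution are assumptions for illustration, not something taken from the question.

```python
import shutil
import subprocess


def build_gs_command(pdf_path, jpg_path, dpi=150):
    """Return a Ghostscript command line that renders a PDF to JPEG."""
    return [
        "gs",
        "-dNOPAUSE",                 # do not pause between pages
        "-dBATCH",                   # exit once the input has been processed
        "-sDEVICE=jpeg",             # use the JPEG output device
        f"-r{dpi}",                  # rendering resolution (assumed value)
        f"-sOutputFile={jpg_path}",  # where the image is written
        pdf_path,
    ]


def render(pdf_path, jpg_path, dpi=150):
    """Run Ghostscript if it is on PATH; return True when it exits cleanly."""
    if shutil.which("gs") is None:
        return False
    result = subprocess.run(build_gs_command(pdf_path, jpg_path, dpi))
    return result.returncode == 0
```

If the JPEG produced this way shows the same spacing, ImageMagick is only passing the problem along.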
I suspect it's an encoding/widths issue. Both are a tad off, though I can't put my finger on why.
Here are some suspects:
First
The text strings are encoded in UTF-16 LE (char, NULL, char, NULL) and written with the normal string drawing command syntax:
(some text) Tj
There's a way to escape any old character value into a () string. You can also define strings in hex thusly:
<203245> Tj
Neither method is used, just the questionable inline nulls. That could cause an issue in GS if it's trying to work with pointers to char without lengths associated with them, since a NULL byte would then look like the end of the string.
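To make the three spellings concrete, here is a minimal sketch that takes a short sample string, encodes it as UTF-16 LE as described above, and prints it as a hex string and as a literal string with octal escapes. The function names and the sample text "Hi" are made up for illustration. Whether UTF-16 LE is what the font actually expects is a separate question that this sketch does not address.

```python
def utf16le_bytes(text):
    """Encode text the way the answer describes: char, NULL, char, NULL."""
    return text.encode("utf-16-le")


def as_hex_string(data):
    """Format raw bytes as a PDF hex string, e.g. <203245>."""
    return "<" + data.hex().upper() + ">"


def as_literal_string(data):
    """Format raw bytes as a PDF literal string, escaping unsafe bytes."""
    parts = []
    for byte in data:
        char = chr(byte)
        if char in "()\\":
            parts.append("\\" + char)       # escape delimiters and backslash
        elif 0x20 <= byte <= 0x7E:
            parts.append(char)              # printable ASCII is written as-is
        else:
            parts.append("\\%03o" % byte)   # everything else as a \ddd octal escape
    return "(" + "".join(parts) + ")"


raw = utf16le_bytes("Hi")
print(as_hex_string(raw) + " Tj")      # <48006900> Tj
print(as_literal_string(raw) + " Tj")  # (H\000i\000) Tj
```

Either form keeps raw NULL bytes out of the content stream while still encoding the same string.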
Second
The widths array is dumb. You can define widths in groups thusly:
[ 32 [450 525 500] 37 [600 250] 40 [0] ]
This defines
32: 450
33: 525
34: 500
37: 600
38: 250
40: 0
These fonts define each consecutive width in its own individual array. Not illegal, but definitely wasteful/stupid, and if GS were coded to EXPECT gaps between the arrays, it could trigger a bug.
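The grouping rule can be sketched in a few lines. This is only an illustration: it handles just the "start code followed by an array of widths" form shown above, and the function name and the Python list representation of the array are assumptions.

```python
def expand_widths(w):
    """Expand [code [w1 w2 ...] code [w1 ...] ...] into {code: width}."""
    widths = {}
    i = 0
    while i < len(w):
        start, group = w[i], w[i + 1]
        if not isinstance(group, list):
            raise ValueError("only the 'code [w1 w2 ...]' form is handled here")
        for offset, width in enumerate(group):
            widths[start + offset] = width  # consecutive codes from start
        i += 2
    return widths


grouped = [32, [450, 525, 500], 37, [600, 250], 40, [0]]
# The same widths written one array per code, as these fonts do:
one_per_code = [32, [450], 33, [525], 34, [500], 37, [600], 38, [250], 40, [0]]

print(expand_widths(grouped))
# {32: 450, 33: 525, 34: 500, 37: 600, 38: 250, 40: 0}
print(expand_widths(grouped) == expand_widths(one_per_code))  # True
```

Both spellings expand to the same table, so a renderer that tripped over the second would have a parsing bug, not bad data.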
There are also some extremely fishy values in the array. 32 through 126 are defined consecutively, but then it starts jumping all over:
...126 [600] 8364 [500] 8216 [222] 402 [500] 8222 [389] 8230 [1000] 8224 [444]...
and then it goes back to being consecutive from 160 to 255. Just weird.
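One possible reading of the jumps, offered as a guess and not confirmed against the PDF: the out-of-order codes coincide with the Unicode code points that Windows-1252 assigns to bytes in the 0x80 to 0x9F range. That would fit a generator that walks an 8-bit code page and writes each character's Unicode value as its code. The sketch below checks the listed values against Python's cp1252 codec. The function name is made up for illustration.

```python
def cp1252_code_points(first=0x80, last=0x9F):
    """Map each assigned Windows-1252 byte in [first, last] to its code point."""
    mapping = {}
    for byte in range(first, last + 1):
        try:
            mapping[byte] = ord(bytes([byte]).decode("cp1252"))
        except UnicodeDecodeError:
            continue  # a few bytes in this range are unassigned in the codec
    return mapping


# Codes quoted from the widths array above.
listed = [8364, 8216, 402, 8222, 8230, 8224]

by_code_point = {cp: byte for byte, cp in cp1252_code_points().items()}
for code in listed:
    byte = by_code_point.get(code)
    where = hex(byte) if byte is not None else "not in 0x80-0x9F"
    print(code, where)
```

If that reading is right, the ordering is odd but explainable, and the widths themselves are not necessarily wrong.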
Third
I'm not even remotely sure about this one, but the CIDToGIDMap stream contains an awful lot of nulls.
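For context, a CIDToGIDMap stream stores one 2-byte, big-endian glyph index per CID, so a font that uses only a small part of the CID range will be mostly zero bytes even when nothing is wrong. The sketch below counts how many CIDs map to glyph 0 in a decoded stream. The sample data is made up for illustration and does not come from the PDF in question.

```python
import struct


def count_unmapped(cid_to_gid):
    """Return (CIDs mapped to glyph 0, total CIDs) for a decoded stream."""
    if len(cid_to_gid) % 2:
        raise ValueError("expected an even number of bytes")
    total = len(cid_to_gid) // 2
    gids = struct.unpack(">%dH" % total, cid_to_gid)  # big-endian 16-bit values
    unmapped = sum(1 for gid in gids if gid == 0)
    return unmapped, total


# Hypothetical map covering 256 CIDs where only CIDs 65 and 66 have glyphs.
sample = bytearray(256 * 2)
struct.pack_into(">H", sample, 65 * 2, 36)
struct.pack_into(">H", sample, 66 * 2, 37)

print(count_unmapped(bytes(sample)))  # (254, 256)
```

A high count is therefore not suspicious by itself. What matters is whether the CIDs that actually appear in the text map to non-zero glyphs.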
Bottom line
Those fonts are fishy, and I've never heard of "Bellflower Books" or "UFPDF 0.1".
That version number makes me cringe. It should make you cringe too.
Googling for "UFPDF", I found this note from the author:
UFPDF is a PHP library that sits on top of FPDF. 0.1. Just run away.