I'm trying to use the command line program convert to turn a PDF into an image (JPEG or PNG). Here is one of the PDFs that I'm trying to convert.
I want the program to trim off the excess white space and return an image of high enough quality that the superscripts can be read with ease.
This is my current best attempt. As you can see, the trimming works fine; I just need to sharpen up the resolution quite a bit. This is the command I'm using:
convert -trim 24.pdf -resize 500% -quality 100 -sharpen 0x1.0 24-11.jpg
I've tried to make the following conscious decisions:
- resize it larger (has no effect on the resolution)
- make the quality as high as possible
- use the -sharpen flag (I've tried a range of values)
Any suggestions on getting a higher resolution in the final PNG/JPEG image would be greatly appreciated!
ImageMagick provides the convert tool, which can be used for a wide range of complex image processing tasks.
Convert All Pages of PDF File to Images
Use convert to convert PDF pages to images with the following command:
Convert PDF to image with high resolution in PHP (Laravel) using Imagick.
This works for creating a single file from multiple PDFs and image files:
WHERE:
-density 300 = dpi
-trim = crops away uniform-colored borders around the edges
-quality 100 = quality vs. compression (100% quality)
-flatten = merges everything into one image; for multi-page output, do not use -flatten
It appears that the following works:
It results in the left image. Compare this to the result of my original command (the image on the right):
(To really see and appreciate the differences between the two, right-click on each and select "Open Image in New Tab...".)
Also keep the following facts in mind:
- The image on the right, from my original command, has a resolution of 3060x3960 pixels and uses 16-bit RGB color space.
- The image on the left has a resolution of 758x996 pixels and uses 8-bit Gray color space.
So, no need to resize; add the -density flag. The density value 150 is weird: trying a range of values results in a worse looking image in both directions!
Personally I like this.
It's a little over twice the file size, but it looks better to me.
-density 300 sets the dpi that the PDF is rendered at.
-trim removes any edge pixels that are the same color as the corner pixels.
-quality 100 sets the JPEG compression quality to the highest quality.
Things like -sharpen don't work well with text because they undo things your font rendering system did to make it more legible.
If you actually want it blown up, use resize here and possibly a larger dpi value of something like targetDPI * scalingFactor. That will render the PDF at the resolution/size you intend.
Descriptions of the parameters are available on imagemagick.org.
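The command itself did not survive in this answer, so the following is only a sketch of how the flags described above might be combined. The function name pdf_to_image and the 300 dpi default are illustrative assumptions. -density is placed before the input file because it has to be set before the PDF is rendered, and -trim after it so that it operates on the rendered image.

```shell
#!/bin/sh
# Sketch: render a PDF at a given density, trim the border, save at top JPEG quality.
# pdf_to_image is a hypothetical helper name, not part of ImageMagick.
# Usage: pdf_to_image INPUT.pdf OUTPUT.jpg [DPI]
pdf_to_image() {
    input=$1
    output=$2
    dpi=${3:-300}    # assumed default, matching "-density 300" above

    # -density before the input: sets the dpi the PDF is rasterized at.
    # -trim after the input: crops edge pixels matching the corner color.
    # -quality 100: least JPEG compression.
    convert -density "$dpi" "$input" -trim -quality 100 "$output"
}
```

For the file in the question this would be called as pdf_to_image 24.pdf 24.jpg, or with 150 as a third argument to try the density the asker mentions. On ImageMagick 7 the same options are passed to magick instead of convert.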
I really haven't had good success with convert [update May 2020: actually, it pretty much never works for me], but I've had EXCELLENT success with pdftoppm. Here are a couple of examples of producing high-quality images from a PDF:
[Produces ~25 MB-sized files per page] Output uncompressed .tif file format at 300 DPI into a folder called "images", with files being named pg-1.tif, pg-2.tif, pg-3.tif, etc.:
[Produces ~1 MB-sized files per page] Output in .jpg format at 300 DPI:
[Produces ~2 MB-sized files per page] Output in .jpg format at highest quality (least compression) and still at 300 DPI:
For more explanations, options, and examples, see my full answer here.
See also
pdf2searchablepdf
gscan2pdf
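The three pdftoppm commands are missing from this answer. As a rough sketch of what the described variants could look like, the helper below maps each one to pdftoppm options that exist in current Poppler releases (-tiff, -jpeg, -jpegopt, -r). The function name pdf_pages, the variant labels, and the "pg" file prefix are assumptions made for illustration, and the exact numbering of the output files depends on the pdftoppm version and page count.

```shell
#!/bin/sh
# Sketch: write one image per PDF page into OUTDIR using pdftoppm.
# pdf_pages is a hypothetical helper name, not part of Poppler.
# Usage: pdf_pages tiff|jpeg|jpeg-max INPUT.pdf OUTDIR [DPI]
pdf_pages() {
    format=$1
    input=$2
    outdir=$3
    dpi=${4:-300}    # 300 DPI, as in the answer above

    mkdir -p "$outdir"
    case $format in
        # Uncompressed TIFF: largest files.
        tiff)     pdftoppm -tiff -r "$dpi" "$input" "$outdir/pg" ;;
        # JPEG at pdftoppm's default quality.
        jpeg)     pdftoppm -jpeg -r "$dpi" "$input" "$outdir/pg" ;;
        # JPEG with the least compression.
        jpeg-max) pdftoppm -jpeg -jpegopt quality=100 -r "$dpi" "$input" "$outdir/pg" ;;
        *)        echo "unknown format: $format" >&2; return 1 ;;
    esac
}
```

A call such as pdf_pages tiff mypdf.pdf images would then fill the "images" folder with one file per page.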
I use pdftoppm on the command line to get the initial image, typically with a resolution of 300 dpi, so pdftoppm -r 300, then use convert to do the trimming and PNG conversion.
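The two-stage approach described here could be sketched as follows. The function name render_and_trim and the choice to delete the intermediate PPM files are assumptions for illustration; only pdftoppm -r 300 and the use of convert for trimming and PNG output come from the answer.

```shell
#!/bin/sh
# Sketch: rasterize with pdftoppm, then trim and convert each page with ImageMagick.
# render_and_trim is a hypothetical helper name.
# Usage: render_and_trim INPUT.pdf OUTPUT_PREFIX [DPI]
render_and_trim() {
    input=$1
    prefix=$2
    dpi=${3:-300}

    # Stage 1: pdftoppm writes one PPM file per page, named PREFIX-<page>.ppm.
    pdftoppm -r "$dpi" "$input" "$prefix" || return 1

    # Stage 2: trim the border of each page image and save it as PNG,
    # removing the intermediate PPM once the PNG has been written.
    for ppm in "$prefix"-*.ppm; do
        [ -e "$ppm" ] || continue    # no pages were produced
        convert "$ppm" -trim "${ppm%.ppm}.png" && rm -f "$ppm"
    done
}
```

Because the rendering resolution is fixed in the first stage, convert only has to crop and re-encode, so no -density or -resize is needed in the second stage.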
Normally I extract the embedded image with pdfimages at the native resolution, then use ImageMagick's convert to produce the needed format:
This generates the best and smallest result file.
Note: for lossy JPG embedded images, you have to use -j:
With a recent poppler-utils (0.50+, 2016) you can use -all, which saves lossy images as JPG and lossless ones as PNG, so a simple:
always extracts the best possible quality content from the PDF.
On Windows, where it is rarely provided, you have to download a recent (0.68, 2018) poppler-utils binary from:
http://blog.alivate.com.au/poppler-windows/
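The commands are missing from this answer, so here is a minimal sketch of the -all variant it describes. The helper names extract_embedded and list_embedded are assumptions. Keep in mind that pdfimages only extracts raster images stored in the PDF; pages made of text or vector drawing have nothing to extract and still need to be rendered.

```shell
#!/bin/sh
# Sketch: pull embedded raster images out of a PDF at their native resolution.
# extract_embedded and list_embedded are hypothetical helper names.

# Usage: extract_embedded INPUT.pdf OUTPUT_PREFIX
extract_embedded() {
    input=$1
    prefix=$2
    # -all keeps each image in its stored format where pdfimages supports it
    # (for example JPEG stays JPEG, losslessly stored images become PNG).
    pdfimages -all "$input" "$prefix"
}

# Usage: list_embedded INPUT.pdf
# Lists the embedded images and their properties without extracting anything.
list_embedded() {
    pdfimages -list "$1"
}
```

Running list_embedded first is a quick way to check whether the PDF contains any embedded images at all before choosing this route.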
In ImageMagick, you can do "supersampling". You specify a large density and then resize down as much as desired for the final output size. For example with your image:
Download the image to view at full resolution for comparison.
I do not recommend saving to JPG if you are expecting to do further processing.
If you want the output to be the same size as the input, then resize to the inverse of the ratio of your density to 72. For example, use -density 288 and -resize 25%, since 288 = 4 * 72 and 25% = 1/4.
The larger the density the better the resulting quality, but it will take longer to process.
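The density-to-resize arithmetic above can be written down as a small sketch. The helper names resize_percent and supersample are assumptions; the calculation is simply 100 * 72 / density, which gives 25 for a density of 288.

```shell
#!/bin/sh
# Sketch of supersampling: render at a high density, then scale back down.
# resize_percent and supersample are hypothetical helper names.

# Percentage that scales a page rendered at DENSITY back to its 72 dpi size.
# Usage: resize_percent DENSITY
resize_percent() {
    awk -v d="$1" 'BEGIN { printf "%g\n", 7200 / d }'
}

# Usage: supersample INPUT.pdf OUTPUT.png [DENSITY]
supersample() {
    input=$1
    output=$2
    density=${3:-288}                   # 288 = 4 * 72, the example above
    pct=$(resize_percent "$density")    # 25 for a density of 288
    convert -density "$density" "$input" -resize "${pct}%" "$output"
}
```

To end up with an image larger than the 72 dpi size, pass a bigger percentage to -resize than resize_percent returns; the rendering density stays the same.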
I have found it both faster and more stable, when batch-processing large PDFs into PNGs and JPGs, to use the underlying gs (aka Ghostscript) command that convert uses.
You can see the command in the output of convert -verbose, and there are a few more tweaks possible there (YMMV) that are difficult or impossible to access directly via convert.
However, it would be harder to do your trimming and sharpening using gs, so, as I said, YMMV!
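As a rough illustration of calling Ghostscript directly, the sketch below rasterizes every page to PNG. The function name gs_pdf_to_png and the particular option set are assumptions; they are standard Ghostscript switches, but not necessarily the ones convert passes on a given system, so compare with the output of convert -verbose.

```shell
#!/bin/sh
# Sketch: rasterize every page of a PDF to PNG by calling Ghostscript directly.
# gs_pdf_to_png is a hypothetical helper name.
# Usage: gs_pdf_to_png INPUT.pdf OUTPUT_PREFIX [DPI]
gs_pdf_to_png() {
    input=$1
    prefix=$2
    dpi=${3:-300}
    # -sDEVICE=png16m: 24-bit RGB PNG output.
    # -dTextAlphaBits / -dGraphicsAlphaBits: anti-aliasing for text and line art.
    # %03d in the output name is replaced by the page number.
    gs -dSAFER -dBATCH -dNOPAUSE -dQUIET \
       -sDEVICE=png16m -r"$dpi" \
       -dTextAlphaBits=4 -dGraphicsAlphaBits=4 \
       -sOutputFile="${prefix}-%03d.png" "$input"
}
```

Trimming is not part of this step; the resulting PNG files would still need a separate pass (for example with convert -trim) if the margins have to go.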
It also gives you good results:
Linux user here: I tried the convert command-line utility (for PDF to PNG) and I was not happy with the results. I found this to be easier, with a better result:
- extract the page with pdftk: pdftk file.pdf cat 3 output page3.pdf
- open (import) that PDF with GIMP
- change the Resolution from 100 to 300 or 600 pixel/in
- in GIMP, export as PNG (change file extension to .png)
Edit:
Added picture, as requested in the Comments. Convert command used: convert -density 300 -trim struct2vec.pdf -quality 100 struct2vec.png
GIMP: imported at 300 dpi (px/in); exported as PNG compression level 3.
I have not used GIMP on the command line (re: my comment, below).
For Windows (tested on W11):
You need to install:
ImageMagick https://imagemagick.org/index.php
Ghostscript https://www.ghostscript.com/releases/gsdnld.html
Additional info:
- Watch out when using the -flatten parameter, since it can produce only the first page as an image
- Use the -scene 1 parameter to start image names at index 1
- The convert command mentioned in the question has been deprecated in favor of magick
One more suggestion is that you can use GIMP.
Just load the PDF file in GIMP->save as .xcf and then you can do whatever you want to the image.
Looked perfect to me
I have used pdf2image, a simple Python library that works like a charm.
First install poppler on a non-Linux machine. You can just download the zip, unzip it in Program Files, and add its bin folder to the machine Path.
After that you can use pdf2image in a Python class like this:
I am not good with Python, but I was able to make an exe of it.
Later you can use the exe with file input and output parameters. I have used it from C# and things are working fine.
Image quality is good. OCR works fine.
Edited:
Here is another finding of mine: you don't need to install Poppler for the conversion.
Just build your converter.exe from Python and place it in the bin folder of Poppler for Windows.
I suppose it will work on Azure as well.
The PNG file you attached looks really blurred. If you need additional post-processing for each image you generate as a PDF preview, you will decrease the performance of your solution.
2JPEG can convert the PDF file you attached to a nicely sharpened JPG and crop empty margins in one call:
Use this command line:
This should correctly convert the file as you've asked for.
I use icepdf, an open source Java PDF engine. Check the office demo.
I've also tried ImageMagick and pdftoppm; both pdftoppm and icepdf give a higher resolution than ImageMagick.
Please take note before downvoting: this solution is for GIMP using a graphical interface, not for ImageMagick using a command line, but it worked perfectly fine for me as an alternative, which is why I thought it worth sharing here.
Follow these simple steps to extract images in any format from PDF documents
N.B.: If you need only the cover image, select only the first page.
That's all.
I hope this helps
Many answers here concentrate on using magick (or its dependency Ghostscript), as set by the OP's question, with a few suggesting GIMP as an alternative, without describing why some settings may work best for different cases.
Taking the OP's "sample", the requirement is a crisp, trimmed image that is as small as possible yet retains good readability. Here the result is 96 dpi in 58 KB (a very small increase on the 54 KB vector source), yet it retains a good image even when zoomed in. Compare that with 72 dpi (226 KB) in the accepted answer's image above.
The key point is that any image processor can be scripted to batch run from the command line using a profile as input, so here IrfanView (with or without GS) is set to auto-crop the PDF page(s) and output at a default 96 dpi to PNG, using only 4 bits per pixel for 16 shades of grey.
The size could be further reduced by dropping the resolution to 72, but 96 is an optimal setting for PNG screen display.
If you have a source RTF, here it would be about 3.52 KB (3,605 bytes) for the one page. You can programmatically export to PDF or image by reprinting.
Thus a high-quality result will be only 31.1 KB (31,918 bytes).
So, for the "SAME QUALITY" as the OP's preferred result:
The OP's result is 220 KB (226,220 bytes). The image above has exactly the same number of pixels but, at 31.1 KB (31,918 bytes), is 1/7th the size, or roughly 14% of the storage size.
The following Python script will work on any Mac (Snow Leopard and upward). It can be used on the command line with successive PDF files as arguments, or you can put it into a Run Shell Script action in Automator and make a Service (Quick Action in Mojave).
You can set the resolution of the output image in the script.
The script and a Quick Action can be downloaded from github.
Get an image from a PDF in iOS Swift (best solution)
//Usage
It's actually pretty easy to do with Preview on a Mac. All you have to do is open the file in Preview and save as (or export) a PNG or JPEG, but make sure that you set at least 300 dpi at the bottom of the window to get a high-quality image.
You can do it in LibreOffice Draw (which is usually preinstalled in Ubuntu):