Convert PDF to image with high resolution

Posted on 2024-11-19 02:33:44

I'm trying to use the command line program convert to take a PDF into an image (JPEG or PNG). Here is one of the PDFs that I'm trying to convert.

I want the program to trim off the excess white-space and return a high enough quality image that the superscripts can be read with ease.

This is my current best attempt. As you can see, the trimming works fine; I just need to sharpen up the resolution quite a bit. This is the command I'm using:

convert -trim 24.pdf -resize 500% -quality 100 -sharpen 0x1.0 24-11.jpg

I've tried to make the following conscious decisions:

  • resize it larger (has no effect on the resolution)
  • make the quality as high as possible
  • use the -sharpen (I've tried a range of values)

Any suggestions on getting a higher resolution in the final PNG/JPEG image would be greatly appreciated!

Comments (24)

君勿笑 2024-11-26 02:33:45

ImageMagick provides the convert tool, which can perform a wide range of image processing tasks.

Convert All Pages of PDF File to Images
Use convert to convert PDF pages to images with the following command:

convert -density 150 presentation.pdf -quality 90 output-%3d.jpg

Convert a PDF to high-resolution images in PHP (Laravel) using Imagick:

 $pdf_path = Storage::disk('public')->path($product_asset->pdf_path);
 $directory_create = Storage::disk('public')->path('products/'.$product->id.'/pdf_images');
 if (!file_exists($directory_create)) {
    mkdir($directory_create, 0777, true);
 }

 $output_images = $directory_create.'/';

 $im = new Imagick();
 // Set the rendering resolution (dpi) before reading the PDF.
 $im->setResolution(250, 250);
 $im->readImage($pdf_path);
 $im->setImageFormat('jpg');
 $im->setImageCompression(Imagick::COMPRESSION_JPEG);
 $im->setImageCompressionQuality(100);
 $im->setCompressionQuality(100);
 // Write one file per page. The "page.jpg" base name is an assumption;
 // the original snippet never saved its output.
 $im->writeImages($output_images.'page.jpg', false);
 $im->clear();
 $im->destroy();
戏舞 2024-11-26 02:33:45

This works for creating a single file from multiple PDFs and image files:

exec('convert -density 300 -trim "/path/to/input_filename_1.png" "/path/to/input_filename_2.pdf" "/path/to/input_filename_3.png" -quality 100 "/path/to/output_filename_0.pdf"');

Where:

-density 300 = rendering resolution in dpi

-trim = removes edge pixels that have the same color as the corner pixels

-quality 100 = quality vs. compression (100% quality)

-flatten ... for multi-page output, do not use "flatten"

夜灵血窟げ 2024-11-26 02:33:44

It appears that the following works:

convert           \
   -verbose       \
   -density 150   \
   -trim          \
    test.pdf      \
   -quality 100   \
   -flatten       \
   -sharpen 0x1.0 \
    24-18.jpg

It results in the left image. Compare this to the result of my original command (the image on the right):

  

(To really see and appreciate the differences between the two, right-click on each and select "Open Image in New Tab...".)

Also keep the following facts in mind:

  • The worse, blurry image on the right has a file size of 1.941.702 Bytes (1.85 MByte).
    Its resolution is 3060x3960 pixels, using 16-bit RGB color space.
  • The better, sharp image on the left has a file size of 337.879 Bytes (330 kByte).
    Its resolution is 758x996 pixels, using 8-bit Gray color space.

So, no need to resize; add the -density flag. The density value 150 is weird -- trying a range of values results in a worse looking image in both directions!
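
The pixel counts above are consistent with a US Letter page (612 x 792 pt). That page size is an assumption here, since the PDF itself is not shown. Under that assumption, a minimal Python sketch of the arithmetic suggests why the original command gave a large but blurry image: without -density the page is rasterized at ImageMagick's default of 72 dpi, and -resize 500% only interpolates those pixels.

```python
# Sketch: pixel dimensions of a rasterized PDF page.
# Assumption: the page is US Letter, 612 x 792 points (1 point = 1/72 inch).

def page_pixels(width_pt, height_pt, density_dpi):
    """Return (width, height) in pixels for a page rendered at density_dpi."""
    return (round(width_pt * density_dpi / 72),
            round(height_pt * density_dpi / 72))

LETTER = (612, 792)

# Default density (72 dpi), then enlarged by -resize 500%.
base = page_pixels(*LETTER, 72)
upscaled = (base[0] * 5, base[1] * 5)

# Rendering directly at 150 dpi (before -trim removes the margins).
direct = page_pixels(*LETTER, 150)

print(base)      # (612, 792)
print(upscaled)  # (3060, 3960)
print(direct)    # (1275, 1650)
```

The 758x996 result reported above is smaller than 1275x1650 because -trim cuts away the white margins after rendering.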

揽清风入怀 2024-11-26 02:33:44

Personally I like this.

convert -density 300 -trim test.pdf -quality 100 test.jpg

It's a little over twice the file size, but it looks better to me.

-density 300 sets the dpi that the PDF is rendered at.

-trim removes any edge pixels that are the same color as the corner pixels.

-quality 100 sets the JPEG compression quality to the highest quality.

Things like -sharpen don't work well with text because they undo things your font rendering system did to make it more legible.

If you actually want it blown up, use -resize here, and possibly a larger dpi value of something like targetDPI * scalingFactor. That will render the PDF at the resolution/size you intend.

Descriptions of the parameters are available on imagemagick.org.
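
As a rough illustration of the targetDPI * scalingFactor idea, the sketch below builds the argument list for the convert command shown in this answer. The function name, its parameters, and the defaults are hypothetical and only serve the example.

```python
# Sketch: build the convert arguments for rendering a PDF at an enlarged size.
# The function name and defaults are illustrative, not part of ImageMagick.

def enlarged_convert_args(pdf, out, target_dpi=300, scaling_factor=1.0):
    """Render at target_dpi * scaling_factor instead of upscaling afterwards."""
    density = round(target_dpi * scaling_factor)
    return ["convert", "-density", str(density), "-trim", pdf,
            "-quality", "100", out]

args = enlarged_convert_args("test.pdf", "test.jpg",
                             target_dpi=300, scaling_factor=2)
print(" ".join(args))
# convert -density 600 -trim test.pdf -quality 100 test.jpg
```

On a machine with ImageMagick and Ghostscript installed, the list could be passed to subprocess.run(args, check=True).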

风轻花落早 2024-11-26 02:33:44

I really haven't had good success with convert [update May 2020: actually: it pretty much never works for me], but I've had EXCELLENT success with pdftoppm. Here's a couple examples of producing high-quality images from a PDF:

  1. [Produces ~25 MB-sized files per pg] Output uncompressed .tif file format at 300 DPI into a folder called "images", with files being named pg-1.tif, pg-2.tif, pg-3.tif, etc:

    mkdir -p images && pdftoppm -tiff -r 300 mypdf.pdf images/pg
    
  2. [Produces ~1MB-sized files per pg] Output in .jpg format at 300 DPI:

    mkdir -p images && pdftoppm -jpeg -r 300 mypdf.pdf images/pg
    
  3. [Produces ~2MB-sized files per pg] Output in .jpg format at highest quality (least compression) and still at 300 DPI:

    mkdir -p images && pdftoppm -jpeg -jpegopt quality=100 -r 300 mypdf.pdf images/pg
    

For more explanations, options, and examples, see my full answer here.

See also

  1. My answer: Ask Ubuntu: How to turn a PDF into a searchable PDF w/pdf2searchablepdf
  2. My answer: How to convert a PDF into JPG with command line in Linux?
  3. My answer: Unix & Linux: pdf to jpg without quality loss; gscan2pdf
笑着哭最痛 2024-11-26 02:33:44

I use pdftoppm on the command line to get the initial image, typically with a resolution of 300dpi, so pdftoppm -r 300, then use convert to do the trimming and PNG conversion.
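
A minimal sketch of that two-step workflow for a single-page PDF might look like this. The helper names and the "-trimmed" output suffix are made up for the example, and it assumes pdftoppm (poppler-utils) and ImageMagick's convert are on the PATH.

```python
# Sketch: pdftoppm renders the page, convert trims the margins.
import subprocess

def render_then_trim_commands(pdf, prefix, dpi=300):
    """Return the two commands for a single-page PDF.

    -singlefile makes pdftoppm write exactly prefix.png (first page only),
    so the second command knows the file name in advance.
    """
    render = ["pdftoppm", "-png", "-r", str(dpi), "-singlefile", pdf, prefix]
    trim = ["convert", prefix + ".png", "-trim", "+repage",
            prefix + "-trimmed.png"]
    return render, trim

def run(pdf, prefix, dpi=300):
    """Execute both steps; raises CalledProcessError if either tool fails."""
    for cmd in render_then_trim_commands(pdf, prefix, dpi):
        subprocess.run(cmd, check=True)
```

Calling run("24.pdf", "24") would leave 24.png and 24-trimmed.png next to the input; +repage drops the canvas offset that -trim leaves behind.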

紫轩蝶泪 2024-11-26 02:33:44

Normally I extract the embedded images with pdfimages at their native resolution, then use ImageMagick's convert to get the needed format:

$ pdfimages -list fileName.pdf
$ pdfimages fileName.pdf fileName   # save in .ppm format
$ convert fileName-000.ppm fileName-000.png

This generates the best and smallest result file.

Note: For lossy JPG embedded images, you have to use -j:

$ pdfimages -j fileName.pdf fileName   # save in .jpg format

With a recent poppler-utils (0.50+, 2016) you can use -all, which saves lossy images as JPG and lossless ones as PNG, so a simple:

$ pdfimages -all fileName.pdf fileName

always extracts the best possible quality content from the PDF.

On Windows, where poppler is rarely provided, you have to download a recent (0.68, 2018) poppler-utils binary from:
http://blog.alivate.com.au/poppler-windows/

一生独一 2024-11-26 02:33:44

In ImageMagick, you can do "supersampling". You specify a large density and then resize down as much as desired for the final output size. For example with your image:

convert -density 600 test.pdf -background white -flatten -resize 25% test.png

Download the image to view it at full resolution for comparison.

I do not recommend saving to JPG if you are expecting to do further processing.

If you want the output to be the same size as the input, then resize to the inverse of the ratio of your density to 72. For example, -density 288 and -resize 25%. 288=4*72 and 25%=1/4

The larger the density the better the resulting quality, but it will take longer to process.
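
The density-to-resize relationship described above can be worked through in a few lines. This is only a sketch; the function names are invented for the example, and 72 is taken as the base density as stated in the answer.

```python
# Sketch: resize percentage that undoes a supersampling density.
# Assumption: base_dpi=72 is the density used when none is given.

def resize_percent(density, base_dpi=72):
    """Percentage for -resize so the output matches a base_dpi rendering in size."""
    return 100 * base_dpi / density

def effective_dpi(density, percent):
    """Resolution of the final image relative to the page's physical size."""
    return density * percent / 100

print(resize_percent(288))     # 25.0
print(effective_dpi(600, 25))  # 150.0
```

So -density 288 with -resize 25% gives an image the size of a plain 72 dpi rendering, while the -density 600 -resize 25% command above ends up at the equivalent of 150 dpi.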

冰葑 2024-11-26 02:33:44

I have found it both faster and more stable when batch-processing large PDFs into PNGs and JPGs to use the underlying gs (aka Ghostscript) command that convert uses.

You can see the command in the output of convert -verbose and there are a few more tweaks possible there (YMMV) that are difficult / impossible to access directly via convert.

However, it would be harder to do your trimming and sharpening using gs, so, as I said, YMMV!
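
For illustration, a direct Ghostscript call could be assembled as below. This is a sketch of a typical invocation, not necessarily the exact command convert delegates to (check the convert -verbose output for that); the function name and the output pattern are assumptions.

```python
# Sketch: a direct Ghostscript call that rasterizes a PDF to PNG files.
import subprocess

def ghostscript_png_args(pdf, out_pattern, dpi=300):
    """Build a gs command that writes one PNG per page.

    out_pattern should contain a %d-style placeholder, e.g. "page-%03d.png".
    """
    return [
        "gs", "-dSAFER", "-dBATCH", "-dNOPAUSE",
        "-sDEVICE=png16m",          # 24-bit RGB PNG output device
        "-r" + str(dpi),            # rendering resolution in dpi
        "-dTextAlphaBits=4",        # anti-aliased text
        "-dGraphicsAlphaBits=4",    # anti-aliased vector graphics
        "-sOutputFile=" + out_pattern,
        pdf,
    ]

args = ghostscript_png_args("input.pdf", "page-%03d.png")
print(" ".join(args))
# subprocess.run(args, check=True)  # uncomment where Ghostscript is installed
```

On Windows the executable is usually named gswin64c rather than gs, so the first element would need to change there.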

夏夜暖风 2024-11-26 02:33:44

This also gives good results:

exec("convert -geometry 1600x1600 -density 200x200 -quality 100 test.pdf test_image.jpg");
这个俗人 2024-11-26 02:33:44

Linux user here: I tried the convert command-line utility (for PDF to PNG) and I was not happy with the results. I found this to be easier, with a better result:

  • extract the pdf page(s) with pdftk
    • e.g.: pdftk file.pdf cat 3 output page3.pdf
  • open (import) that pdf with GIMP
    • important: change the import Resolution from 100 to 300 or 600 pixel/in
  • in GIMP export as PNG (change file extension to .png)

Edit:

Added picture, as requested in the Comments. Convert command used:

convert -density 300 -trim struct2vec.pdf -quality 100 struct2vec.png

GIMP : imported at 300 dpi (px/in); exported as PNG compression level 3.

I have not used GIMP on the command line (re: my comment, below).


笑,眼淚并存 2024-11-26 02:33:44

For Windows (tested on W11):

magick.exe -verbose -density 150 "input.pdf" -quality 100 -sharpen 0x1.0 output.jpg

You need to install:

ImageMagick https://imagemagick.org/index.php

Ghostscript
https://www.ghostscript.com/releases/gsdnld.html

Additional info:

  • Be careful with the -flatten parameter, since it merges all pages into a single image instead of producing one image per page

  • Use the -scene 1 parameter to start the image file names at index 1

  • The convert command mentioned in the question has been deprecated in favor of magick

兮子 2024-11-26 02:33:44

One more suggestion is that you can use GIMP.

Just load the PDF file in GIMP, save it as .xcf, and then you can do whatever you want with the image.

温柔嚣张 2024-11-26 02:33:44
convert -density 300 * airbnb.pdf

Looked perfect to me

又爬满兰若 2024-11-26 02:33:44

I have used pdf2image, a simple Python library that works like a charm.

First, install poppler if you are on a non-Linux machine. You can just download the zip, unzip it into Program Files, and add its bin folder to the machine PATH.

After that you can use pdf2image in Python like this:

from pdf2image import convert_from_path, convert_from_bytes
images_from_path = convert_from_path(
   inputfile,
   output_folder=outputpath,
   grayscale=True, fmt='jpeg')

I am not good with Python, but I was able to build an exe from it.
Later you can call the exe with input and output file parameters. I have used it from C# and things are working fine.

Image quality is good. OCR works fine.

Edit:
Another finding: you don't need to install Poppler for the conversion.
Just build your converter.exe from Python and place it in the bin folder of the Poppler Windows binaries.
I suppose it will work on Azure as well.

眼藏柔 2024-11-26 02:33:44

The PNG file you attached looks really blurred. If you need additional post-processing for each image you generate as a PDF preview, you will decrease the performance of your solution.

2JPEG can convert the PDF file you attached to a nicely sharpened JPG and crop the empty margins in one call:

2jpeg.exe -src "C:\In\*.*" -dst "C:\Out" -oper Crop method:autocrop
惟欲睡 2024-11-26 02:33:44

Use this command line:

convert -geometry 3600x3600 -density 300x300 -quality 100 TEAM\ 4.pdf team4.png

This should correctly convert the file as you've asked for.

闻呓 2024-11-26 02:33:44

I use ICEpdf, an open-source Java PDF engine. Check the office demo.

package image2pdf;

import org.icepdf.core.exceptions.PDFException;
import org.icepdf.core.exceptions.PDFSecurityException;
import org.icepdf.core.pobjects.Document;
import org.icepdf.core.pobjects.Page;
import org.icepdf.core.util.GraphicsRenderingHints;
import javax.imageio.ImageIO;
import java.awt.image.BufferedImage;
import java.awt.image.RenderedImage;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;

public class pdf2image {

   public static void main(String[] args) {

      Document document = new Document();
      try {
         document.setFile("C:\\Users\\Dell\\Desktop\\test.pdf");
      } catch (PDFException ex) {
         System.out.println("Error parsing PDF document " + ex);
      } catch (PDFSecurityException ex) {
         System.out.println("Error encryption not supported " + ex);
      } catch (FileNotFoundException ex) {
         System.out.println("Error file not found " + ex);
      } catch (IOException ex) {
         System.out.println("Error IOException " + ex);
      }

      // save page captures to file.
      float scale = 1.0f;
      float rotation = 0f;

      // Paint each pages content to an image and
      // write the image to file
      for (int i = 0; i < document.getNumberOfPages(); i++) {
         try {
         BufferedImage image = (BufferedImage) document.getPageImage(
             i, GraphicsRenderingHints.PRINT, Page.BOUNDARY_CROPBOX, rotation, scale);

         RenderedImage rendImage = image;
         try {
            System.out.println(" capturing page " + i);
            File file = new File("C:\\Users\\Dell\\Desktop\\test_imageCapture1_" + i + ".png");
            ImageIO.write(rendImage, "png", file);
         } catch (IOException e) {
            e.printStackTrace();
         }
         image.flush();
         }catch(Exception e){
             e.printStackTrace();
         }
      }

      // clean up resources
      document.dispose();
   }
}

I've also tried ImageMagick and pdftoppm; both pdftoppm and ICEpdf produce higher-resolution output than ImageMagick.

甜心 2024-11-26 02:33:44

Please take note before down voting, this solution is for Gimp using a graphical interface, and not for ImageMagick using a command line, but it worked perfectly fine for me as an alternative, and that is why I found it needful to share here.

Follow these simple steps to extract images in any format from PDF documents

  1. Download GIMP Image Manipulation Program
  2. Open the Program after installation
  3. Open the PDF document that you want to extract images from
  4. Select only the pages of the PDF document that you want to extract images from.
    NB: If you need only the cover image, select only the first page.
  5. Click Open after selecting the pages that you want to extract images from
  6. Click on the File menu once GIMP has opened the pages
  7. Select Export as in the File menu
  8. Select your preferred file type by extension (say png) below the dialog box that pops up.
  9. Click on Export to export your image to your desired location.
  10. You can then check your file explorer for the exported image.

That's all.

I hope this helps

仙女山的月亮 2024-11-26 02:33:44

Many answers here concentrate on using magick (or its dependency GhostScript) as set by the OP question, with a few suggesting Gimp as an alternative, without describing why some settings may work best for different cases.

Taking the OP's sample, the requirement is a crisp, trimmed image that is as small as possible yet remains easily readable. Here the result is 96 dpi in 58 KB (a very small increase on the 54 KB vector source), yet the image still looks good even when zoomed in. Compare that with the 72 dpi (226 KB) image in the accepted answer above.

The key point is that any image processor can be scripted to batch-run from the command line using a profile as input. Here IrfanView (with or without GS) is set to auto-crop the PDF page(s) and output PNG at a default 96 dpi, using only 4 bits per pixel for 16 shades of grey.
The size could be reduced further by dropping the resolution to 72 dpi, but 96 is an optimal setting for PNG screen display.

If you have the source RTF, it would be about 3.52 KB (3,605 bytes) for the one page. You can programmatically export it to PDF or to an image by reprinting.

Thus a high-quality result will be only 31.1 KB (31,918 bytes).

So, for the "same quality" as the OP's preferred result:

The OP's result is 220 KB (226,220 bytes). The image above has exactly the same number of pixels, but at 31.1 KB (31,918 bytes) it is about 1/7th of the size, or roughly 14% of the storage.

想挽留 2024-11-26 02:33:44

The following Python script will work on any Mac (Snow Leopard and upward). It can be used on the command line with successive PDF files as arguments, or you can put it into a Run Shell Script action in Automator and make a Service (Quick Action in Mojave).

You can set the resolution of the output image in the script.

The script and a Quick Action can be downloaded from github.

#!/usr/bin/python
# coding: utf-8

import os, sys
import Quartz as Quartz
from LaunchServices import (kUTTypeJPEG, kUTTypeTIFF, kUTTypePNG, kCFAllocatorDefault) 

resolution = 300.0 #dpi
scale = resolution/72.0

cs = Quartz.CGColorSpaceCreateWithName(Quartz.kCGColorSpaceSRGB)
whiteColor = Quartz.CGColorCreate(cs, (1, 1, 1, 1))
# Options: kCGImageAlphaNoneSkipLast (no trans), kCGImageAlphaPremultipliedLast 
transparency = Quartz.kCGImageAlphaNoneSkipLast

#Save image to file
def writeImage (image, url, type, options):
    destination = Quartz.CGImageDestinationCreateWithURL(url, type, 1, None)
    Quartz.CGImageDestinationAddImage(destination, image, options)
    Quartz.CGImageDestinationFinalize(destination)
    return

def getFilename(filepath):
    i=0
    newName = filepath
    while os.path.exists(newName):
        i += 1
        newName = filepath + " %02d"%i
    return newName

if __name__ == '__main__':

    for filename in sys.argv[1:]:
        pdf = Quartz.CGPDFDocumentCreateWithProvider(Quartz.CGDataProviderCreateWithFilename(filename))
        numPages = Quartz.CGPDFDocumentGetNumberOfPages(pdf)
        shortName = os.path.splitext(filename)[0]
        prefix = os.path.splitext(os.path.basename(filename))[0]
        folderName = getFilename(shortName)
        try:
            os.mkdir(folderName)
        except OSError:
            # Parenthesized print works on both Python 2 and Python 3.
            print("Can't create directory '%s'" % folderName)
            sys.exit(1)

        # For each page, create a file
        for i in range (1, numPages+1):
            page = Quartz.CGPDFDocumentGetPage(pdf, i)
            if page:
        #Get mediabox
                mediaBox = Quartz.CGPDFPageGetBoxRect(page, Quartz.kCGPDFMediaBox)
                x = Quartz.CGRectGetWidth(mediaBox)
                y = Quartz.CGRectGetHeight(mediaBox)
                x *= scale
                y *= scale
                r = Quartz.CGRectMake(0,0,x, y)
        # Create a Bitmap Context, draw a white background and add the PDF
                writeContext = Quartz.CGBitmapContextCreate(None, int(x), int(y), 8, 0, cs, transparency)
                Quartz.CGContextSaveGState (writeContext)
                Quartz.CGContextScaleCTM(writeContext, scale,scale)
                Quartz.CGContextSetFillColorWithColor(writeContext, whiteColor)
                Quartz.CGContextFillRect(writeContext, r)
                Quartz.CGContextDrawPDFPage(writeContext, page)
                Quartz.CGContextRestoreGState(writeContext)
        # Convert to an "Image"
                image = Quartz.CGBitmapContextCreateImage(writeContext) 
        # Create unique filename per page
                outFile = folderName +"/" + prefix + " %03d.png"%i
                url = Quartz.CFURLCreateFromFileSystemRepresentation(kCFAllocatorDefault, outFile, len(outFile), False)
        # kUTTypeJPEG, kUTTypeTIFF, kUTTypePNG
                type = kUTTypePNG
        # See the full range of image properties on Apple's developer pages.
                options = {
                    Quartz.kCGImagePropertyDPIHeight: resolution,
                    Quartz.kCGImagePropertyDPIWidth: resolution
                    }
                writeImage (image, url, type, options)
                del page
相思碎 2024-11-26 02:33:44

Get an image from a PDF in iOS (Swift):

func imageFromPdf(pdfUrl : URL,atIndex index : Int, closure:@escaping((UIImage)->Void)){
    
    autoreleasepool {
        
        // Instantiate a `PDFDocument` (PDFKit) from the PDF file's URL.
        guard let document = PDFDocument(url: pdfUrl) else { return }
        
        // Get the page at the requested index.
        guard let page = document.page(at: index) else { return }
        
        // Fetch the page rect for the page we want to render.
        let pageRect = page.bounds(for: .mediaBox)
        
        let renderer = UIGraphicsImageRenderer(size: pageRect.size)
        let img = renderer.image { ctx in
            // Set and fill the background color.
            UIColor.white.set()
            ctx.fill(CGRect(x: 0, y: 0, width: pageRect.width, height: pageRect.height))
            
            // Translate the context so that we only draw the `cropRect`.
            ctx.cgContext.translateBy(x: -pageRect.origin.x, y: pageRect.size.height - pageRect.origin.y)
            
            // Flip the context vertically because the Core Graphics coordinate system starts from the bottom.
            ctx.cgContext.scaleBy(x: 1.0, y: -1.0)
            
            // Draw the PDF page.
            page.draw(with: .mediaBox, to: ctx.cgContext)
        }
        closure(img)

    }
    
    
}

//Usage

    let pdfUrl = URL(fileURLWithPath: "PDF URL")
    self.imageFromPdf(pdfUrl: pdfUrl, atIndex: 0) { imageIS in
        
    }
猥琐帝 2024-11-26 02:33:44

It's actually pretty easy to do with Preview on a Mac. All you have to do is open the file in Preview and save as (or export to) PNG or JPEG, but make sure that you set at least 300 dpi at the bottom of the window to get a high-quality image.

少跟Wǒ拽 2024-11-26 02:33:44

You can do it in LibreOffice Draw (which is usually preinstalled in Ubuntu):

  1. Open PDF file in LibreOffice Draw.
  2. Scroll to the page you need.
  3. Make sure text/image elements are placed correctly. If not, you can adjust/edit them on the page.
  4. Top menu: File > Export...
  5. Select the image format you need in the bottom-right menu. I recommend PNG.
  6. Name your file and click Save.
  7. Options window will appear, so you can adjust resolution and size.
  8. Click OK, and you are done.