Using Ghostscript to create a TIFF containing only text and no images from a PostScript file

Posted on 2024-11-16 09:21:26

Is it possible to convert a PostScript file (created from a PDF document with readable text and images) into a TIFF file that contains only the text, without the images?

For example, by setting something like a maximum buffer size, so that images are dropped and only the text remains?

And if boxes and lines around the text could be removed as well, that would be awesome.

Best regards!

Comments (2)

勿忘初心 2024-11-23 09:21:26

You can redefine the various 'image' operators so that they don't do anything:

/image {
  type /dicttype eq not {   % 'type' consumes the top operand: the dict, or the data source
    pop pop pop pop         % non-dictionary form: drop width, height, bits/sample, matrix
  } if
} bind def

/imagemask {
  type /dicttype eq not {   % 'type' consumes the top operand: the dict, or the data source
    pop pop pop pop         % non-dictionary form: drop width, height, polarity, matrix
  } if
} bind def

/colorimage {
  exch {                    % stack is now: ... ncomp multi; 'ifelse' tests multi
    { pop } repeat          % multi true: drop ncomp separate data sources
  } {
    pop pop                 % multi false: drop ncomp and the single data source
  } ifelse
  pop pop pop pop           % drop width, height, bits/component, matrix
} bind def

Save that as a file, and add the file to your GS invocation.
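
A minimal sketch of what "add the file to your GS invocation" could look like when scripted. The file names `noimages.ps`, `input.ps` and `out.tif`, the choice of the `tiffg4` device and the 300 dpi resolution are assumptions for illustration, not part of the original answer. The one point that matters is the order: the file with the redefinitions is listed before the document, so the redefinitions are already in effect when the document is interpreted.

```python
import shlex


def build_tiff_command(prolog, source, output="out.tif", device="tiffg4", dpi=300):
    """Return a Ghostscript argument list that renders `source` to TIFF.

    `prolog` is the file holding the operator redefinitions. It is placed
    before `source` so that it is interpreted first.
    All file names and the device/resolution defaults are illustrative.
    """
    return [
        "gs",
        f"-sDEVICE={device}",  # TIFF output device (assumed: tiffg4, 1-bit G4)
        f"-r{dpi}",            # rendering resolution in dpi
        "-o", output,          # output file; -o also implies batch, no-pause mode
        prolog,                # redefinitions first
        source,                # then the document itself
    ]


if __name__ == "__main__":
    cmd = build_tiff_command("noimages.ps", "input.ps")
    print(shlex.join(cmd))
    # To actually run it (requires Ghostscript on PATH):
    #   import subprocess; subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids quoting problems if a file name contains spaces.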

You can remove linework similarly by redefining the stroke operator:

/stroke {
  newpath
} bind def

rectstroke is harder; I suggest you read the PLRM (PostScript Language Reference Manual) if you need that one.

Possibly also the fill operator:

/fill {
  newpath
} bind def

/eofill {
  newpath
} bind def

Beware! Some text is not drawn using the text 'show' operators, but is constructed from linework or drawn as images. Such text will disappear as well if you redefine the operators as shown above.

Note that the PDF interpreter often doesn't allow re-definition of operators, so you may first have to convert your PDF file to PostScript, using the ps2write device, then run the resulting file through GS to get a TIFF file.
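
The two-step route described above (PDF to PostScript with ps2write, then PostScript to TIFF with the redefinitions loaded) might be scripted roughly as follows. This is a sketch under assumptions: the intermediate file name `converted.ps`, the prolog name `noimages.ps`, the `tiffg4` device and the 300 dpi resolution are all hypothetical choices, and the helper names are made up for this example.

```python
import subprocess


def build_pipeline(pdf, prolog, tiff, intermediate="converted.ps"):
    """Return the two Ghostscript commands for the PDF -> PS -> TIFF route.

    Step 1 converts the PDF to PostScript with the ps2write device.
    Step 2 renders that PostScript to TIFF, with the redefinition file
    (`prolog`) listed before it so the redefinitions take effect.
    File names, device and resolution are illustrative assumptions.
    """
    to_ps = ["gs", "-sDEVICE=ps2write", "-o", intermediate, pdf]
    to_tiff = ["gs", "-sDEVICE=tiffg4", "-r300", "-o", tiff, prolog, intermediate]
    return [to_ps, to_tiff]


def run_pipeline(commands):
    """Run each command in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    steps = build_pipeline("input.pdf", "noimages.ps", "out.tif")
    for step in steps:
        print(" ".join(step))
    # run_pipeline(steps)  # uncomment to execute; requires Ghostscript on PATH
```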

安静 2024-11-23 09:21:26
gs -sDEVICE=bitrgbtags -o out.tags <myfile>

will create a PPM file with tags; the tags label each pixel as text, vector, image, etc.

Then you can use the C programs in ghostpdl/tools/GOT to process the image. It sounds like you want to write a new C program that sets each non-text pixel to the background color, or maybe just white. This is fairly straightforward with the example C programs in the GOT subdirectory as a guide (if you are a programmer). Then you would convert the PPM to TIFF. Ken provided a different way of doing this that doesn't require pixel processing.
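
The pixel-processing step could look roughly like the sketch below. It is written in Python rather than C, and it is not taken from the GOT tools. It rests on loudly stated assumptions: each pixel is stored as four bytes in the order tag, R, G, B, and the text tag is the bit `0x01`. Both should be checked against the Ghostscript sources and the GOT examples before use. Parsing the header of the tags file is left out for the same reason; the function works on the raw pixel bytes only.

```python
def text_only_rgb(pixels, text_tag=0x01, background=(255, 255, 255)):
    """Return packed RGB bytes in which every non-text pixel is the background.

    ASSUMPTIONS (verify against your Ghostscript version):
      * `pixels` holds 4 bytes per pixel in the order tag, R, G, B;
      * a pixel is text when `tag & text_tag` is non-zero.
    """
    if len(pixels) % 4 != 0:
        raise ValueError("pixel data length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(pixels), 4):
        tag = pixels[i]
        if tag & text_tag:
            out += pixels[i + 1:i + 4]   # keep the text pixel's colour
        else:
            out += bytes(background)     # blank out image/vector/untouched pixels
    return bytes(out)


def to_ppm(rgb, width, height):
    """Wrap packed RGB bytes in a binary PPM (P6) header."""
    if len(rgb) != width * height * 3:
        raise ValueError("RGB data does not match the given dimensions")
    return b"P6\n%d %d\n255\n" % (width, height) + rgb


if __name__ == "__main__":
    # Three synthetic pixels: one tagged as text, two tagged as something else.
    sample = bytes([0x01, 10, 20, 30, 0x02, 40, 50, 60, 0x04, 1, 2, 3])
    ppm = to_ppm(text_only_rgb(sample), width=3, height=1)
    print(len(ppm), "bytes of PPM data")
```

The resulting PPM can then be converted to TIFF with any image tool that reads PPM.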
