Count black-and-white and color pages of a PDF with PHP
Does anyone know a workable solution for the following?
A PDF file needs to be checked for colored pages. I need to know the total number of black-and-white pages and the total number of pages with some color on them (images or colored text).
Thanks for any ideas!
More info #1:
We expect mainly plain, "Word"-like PDFs with some images and some colored text elements/boxes. Fully scanned pages are not expected in this process.
Comments (2)
See this answer for a Ghostscript-based tool:
It uses the new inkcov device to determine the distribution of the C (cyan), M (magenta), Y (yellow) and K (black) components (ink coverage) of each page. You'll need Ghostscript 9.05 or newer. Example command line:
Each page that shows zeros for C, M and Y is black/white only.
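The command line itself did not survive on this page. As a hedged sketch, the usual way to call the inkcov device is `gs -q -o - -sDEVICE=inkcov file.pdf`, which prints one line per page in the form `C M Y K CMYK OK`. The PHP below assumes that output format and that a `gs` executable is on the PATH (on Windows it is typically named differently). The function names `parseInkcov`, `countPages` and `inkcovForFile` are made up for this illustration.

```php
<?php
// Sketch: count color vs. black/white pages from Ghostscript inkcov output.

/**
 * Parse inkcov output into one [c, m, y, k] array per page.
 * Assumes lines shaped like " 0.00000  0.00000  0.00000  0.02230 CMYK OK".
 */
function parseInkcov(string $output): array
{
    $pages = [];
    foreach (preg_split('/\R/', $output) as $line) {
        if (preg_match('/^\s*([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+CMYK/', $line, $m)) {
            $pages[] = [(float)$m[1], (float)$m[2], (float)$m[3], (float)$m[4]];
        }
    }
    return $pages;
}

/** A page counts as color if any of C, M or Y is non-zero. */
function countPages(array $pages): array
{
    $color = 0;
    foreach ($pages as [$c, $m, $y, $k]) {
        if ($c > 0.0 || $m > 0.0 || $y > 0.0) {
            $color++;
        }
    }
    return ['color' => $color, 'bw' => count($pages) - $color];
}

/** Run Ghostscript on a PDF and return the parsed per-page coverage. */
function inkcovForFile(string $pdfPath): array
{
    $cmd = 'gs -q -o - -sDEVICE=inkcov ' . escapeshellarg($pdfPath);
    $output = shell_exec($cmd);
    if (!is_string($output)) {
        throw new RuntimeException('Ghostscript call failed: ' . $cmd);
    }
    return parseInkcov($output);
}

// Example with made-up sample text in the inkcov format (no Ghostscript call).
$sample = " 0.00000  0.00000  0.00000  0.02230 CMYK OK\n"
        . " 0.02360  0.02360  0.02360  0.02360 CMYK OK\n";
$counts = countPages(parseInkcov($sample));
printf("color: %d, bw: %d\n", $counts['color'], $counts['bw']); // color: 1, bw: 1
```

For a real file, `countPages(inkcovForFile('/path/to/file.pdf'))` would give the two totals. Note that a gray built from equal amounts of C, M and Y would be counted as color by this rule.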
Probably the easiest way to do that is to use a tool to render the PDF to a set of images and then use a small program to determine whether the colors used in those images are grayscale only.
The second step can be performed by loading each image and scanning its pixels. For scanned pages, determining whether something is grayscale is not trivial, since you need to consider the white point and black point of each page, and possibly color casts from lighting at the edges, and so on. I once created a tool to determine whether something is just text or b/w line art by obtaining the 2D histogram of Abs(R - G) and Abs(R - B), plotting a straight line and checking whether that line and the regression constant were within some predefined ranges.
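A minimal sketch of the render-and-scan idea in PHP, assuming the GD extension is available and that Ghostscript's `png16m` device is used for rendering (the answer does not name a specific tool). The function names, the tolerance of 8 and the 72 dpi resolution are illustrative choices, not values from the answer. This is the simple per-pixel check only, not the histogram and regression method described for scanned pages.

```php
<?php
// Sketch: render PDF pages to PNG and check each image for non-gray pixels.

/** A pixel is treated as gray if its channels differ by at most $tolerance. */
function isGrayPixel(int $r, int $g, int $b, int $tolerance): bool
{
    return abs($r - $g) <= $tolerance
        && abs($r - $b) <= $tolerance
        && abs($g - $b) <= $tolerance;
}

/** Scan a GD image; $step > 1 samples every n-th pixel to save time. */
function imageHasColor($img, int $tolerance = 8, int $step = 1): bool
{
    $w = imagesx($img);
    $h = imagesy($img);
    for ($y = 0; $y < $h; $y += $step) {
        for ($x = 0; $x < $w; $x += $step) {
            // imagecolorsforindex works for palette and truecolor images.
            $rgb = imagecolorsforindex($img, imagecolorat($img, $x, $y));
            if (!isGrayPixel($rgb['red'], $rgb['green'], $rgb['blue'], $tolerance)) {
                return true;
            }
        }
    }
    return false;
}

/** Render every page with Ghostscript, then count color and b/w pages. */
function countColorPages(string $pdfPath, string $tmpDir): array
{
    $pattern = rtrim($tmpDir, '/') . '/page-%03d.png';
    $cmd = 'gs -q -r72 -sDEVICE=png16m -o ' . escapeshellarg($pattern)
         . ' ' . escapeshellarg($pdfPath);
    exec($cmd, $unused, $exitCode);
    if ($exitCode !== 0) {
        throw new RuntimeException('Ghostscript call failed: ' . $cmd);
    }
    $counts = ['color' => 0, 'bw' => 0];
    foreach (glob(rtrim($tmpDir, '/') . '/page-*.png') as $file) {
        $img = imagecreatefrompng($file);
        if ($img === false) {
            continue; // skip unreadable images in this sketch
        }
        $counts[imageHasColor($img) ? 'color' : 'bw']++;
        unlink($file);
    }
    return $counts;
}
```

A low resolution keeps the scan fast but can miss very small colored details, so both the resolution and the tolerance would need tuning against real documents.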