How to identify unfilled ovals in a PDF document using CAM::PDF?

Posted 2024-08-08 09:39:14


Comments (3)

陌伤浅笑 2024-08-15 09:39:14

$doc->traverse($dereference, $node, $callbackfunc, $callbackdata) looks promising. Check what type of object the oval is.

吾家有女初长成 2024-08-15 09:39:14

Looking at the PDF specification, I would say you have quite a challenge in front of you:

PDF provides five types of graphics objects:

  • A path object is an arbitrary shape made up of straight lines, rectangles, and cubic Bézier curves. A path may intersect itself and may have disconnected sections and holes. A path object ends with one or more painting operators that specify whether the path shall be stroked, filled, used as a clipping boundary, or some combination of these operations.

  • A text object ...

  • An external object (XObject) is an object defined outside the content stream and referenced as a named resource (see 7.8.3, "Resource Dictionaries"). The interpretation of an XObject depends on its type. ...

  • An inline image object uses a special syntax to express the data for a small image directly within the content stream.

  • A shading object describes a geometric shape whose colour is an arbitrary function of position within the shape.
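Because a path object ends with a painting operator, "unfilled" for an oval drawn as a path most plausibly means it is painted with a stroke-only operator (S or s) rather than one that fills (f, F, f*, B, B*, b, b*). Below is a minimal Perl sketch under that assumption. path_paint_kinds is a hypothetical helper that scans an already decompressed content stream and reports how each path is painted; it splits on whitespace only, so strings, inline-image data and other syntax a real tokenizer must handle are not treated specially.

```perl
use strict;
use warnings;

# Path-painting operators as listed in the PDF specification.
# "s", "b" and "b*" close the current subpath before painting;
# "n" ends a path without painting it (typically after W or W* for clipping).
my %PAINT = (
    'S'  => 'stroke',      's'  => 'stroke',
    'f'  => 'fill',        'F'  => 'fill',        'f*' => 'fill',
    'B'  => 'fill+stroke', 'B*' => 'fill+stroke',
    'b'  => 'fill+stroke', 'b*' => 'fill+stroke',
    'n'  => 'none',
);

# Hypothetical helper: return one painting kind per path object found
# in a decompressed content stream. Naive whitespace tokenizing only.
sub path_paint_kinds {
    my ($stream) = @_;
    my @kinds;
    my $in_path = 0;
    for my $tok (split /\s+/, $stream) {
        if ($tok eq 'm' || $tok eq 're') {
            $in_path = 1;                  # path construction has begun
        }
        elsif ($in_path && exists $PAINT{$tok}) {
            push @kinds, $PAINT{$tok};     # painting operator ends the path
            $in_path = 0;
        }
    }
    return @kinds;
}
```

As far as I know, CAM::PDF's getPageContent($pagenum) returns a page's content stream as a string, so something like path_paint_kinds($pdf->getPageContent(1)) would be the natural input. A path reported as "stroke" is only a candidate; whether it is an oval is a separate test.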

Therefore, at a minimum, one would need to know whether the ovals you are interested in are paths, external objects, inline image objects, or shading objects.
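One rough way to find out is to tally which kinds of graphics objects occur in a page's content stream. This is a hypothetical Perl sketch that assumes an already decompressed stream: it counts a path at each path-painting operator (every path object ends with one), text at BT, XObjects at Do, inline images at BI and shading objects at sh. If the ovals turn out to be XObjects, their own content streams would need the same inspection.

```perl
use strict;
use warnings;

# Operators that mark one graphics object of each kind from the spec's list.
# Paths are counted at their painting operator rather than at each "m",
# because one path object can contain several subpaths.
my %KIND_OF = (
    (map { ($_ => 'path') } qw(S s f F f* B B* b b* n)),
    BT => 'text',
    Do => 'xobject',
    BI => 'inline_image',
    sh => 'shading',
);

# Hypothetical helper: count graphics objects by kind in a decompressed
# content stream. Naive whitespace tokenizing; the binary image data
# between ID and EI of an inline image is skipped.
sub count_object_kinds {
    my ($stream) = @_;
    my %count;
    my $in_image_data = 0;
    for my $tok (split /\s+/, $stream) {
        if ($in_image_data) {
            $in_image_data = 0 if $tok eq 'EI';   # end of inline image data
            next;
        }
        $in_image_data = 1 if $tok eq 'ID';       # start of inline image data
        $count{ $KIND_OF{$tok} }++ if exists $KIND_OF{$tok};
    }
    return \%count;
}
```

The result is a hash reference such as { path => 2, text => 1 }; a page full of stroked paths and no images or shadings would point towards the path-based checks.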

Then you need an algorithm that can decide whether an object of that type is an oval. After that, you need to decide what "unfilled" means, and finally you need to figure out how to fill them.
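For the "is it an oval" step, a common construction (an assumption about how the PDF producer drew the shapes) is a moveto followed by four cubic Bézier quarter-arcs (c operators) whose end points sit at the midpoints of the oval's bounding box and whose control points lie on that box's edges. looks_like_oval is a hypothetical Perl helper that checks exactly that for an axis-aligned oval, with a fixed tolerance of 0.5 user-space units that may need tuning.

```perl
use strict;
use warnings;

# Hypothetical helper: does "x0 y0 m" followed by the given "c" segments
# look like an axis-aligned oval? Each curve is [x1, y1, x2, y2, x3, y3],
# the six operands of one "c" operator. Assumes the common construction
# of four cubic Bezier quarter-arcs.
sub looks_like_oval {
    my ($x0, $y0, @curves) = @_;
    my $tol = 0.5;    # tolerance in user-space units (assumption; tune it)
    return 0 unless @curves == 4;

    # On-curve points: the start point plus the end point of every curve.
    my @on  = ([$x0, $y0], map { [ $_->[4], $_->[5] ] } @curves);
    # Control points: two per curve.
    my @ctl = map { ([ $_->[0], $_->[1] ], [ $_->[2], $_->[3] ]) } @curves;

    # The outline must return to its starting point.
    return 0 if abs($on[4][0] - $x0) > $tol || abs($on[4][1] - $y0) > $tol;

    # Bounding box of all points and its centre.
    my @xs = sort { $a <=> $b } map { $_->[0] } @on, @ctl;
    my @ys = sort { $a <=> $b } map { $_->[1] } @on, @ctl;
    my ($xmin, $xmax, $ymin, $ymax) = ($xs[0], $xs[-1], $ys[0], $ys[-1]);
    my ($cx, $cy) = (($xmin + $xmax) / 2, ($ymin + $ymax) / 2);

    # The four on-curve points must hit the four side midpoints of the box.
    my @mids = ([$xmin, $cy], [$xmax, $cy], [$cx, $ymin], [$cx, $ymax]);
    my %hit;
    for my $p (@on[0 .. 3]) {
        for my $i (0 .. $#mids) {
            $hit{$i} = 1 if abs($p->[0] - $mids[$i][0]) <= $tol
                         && abs($p->[1] - $mids[$i][1]) <= $tol;
        }
    }
    return 0 unless scalar(keys %hit) == 4;

    # Every control point must lie on an edge of the bounding box.
    for my $c (@ctl) {
        return 0 unless abs($c->[0] - $xmin) <= $tol || abs($c->[0] - $xmax) <= $tol
                     || abs($c->[1] - $ymin) <= $tol || abs($c->[1] - $ymax) <= $tol;
    }
    return 1;
}
```

Rotated ovals, or ovals drawn with more than four segments, would slip past this test and need something more general, such as fitting an ellipse to points sampled along the path.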

It seems unlikely to me that anyone would put in that much effort to give you a ready-made solution.

や莫失莫忘 2024-08-15 09:39:14

It may actually be simpler to render the PDF to a grayscale bitmap and use simple shape recognition to distinguish filled from unfilled ovals. If you can reliably determine where the ovals are going to be (I'm assuming this is coming from a form, so the positions of the ovals would be fixed), you can apply a simple heuristic (e.g. an oval counts as filled if 70% of its pixels are 50% gray or darker) to determine what kind of oval it is.
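The rendering itself is best left to an external tool. Assuming the page is rendered with pdftoppm (from poppler) using its -gray option, which writes an 8-bit binary PGM, the bitmap can be read in plain Perl. parse_pgm is a hypothetical helper; it does not handle comment lines in the PGM header or 16-bit images.

```perl
use strict;
use warnings;

# Hypothetical helper: parse a binary (P5) PGM image held in a string.
# Returns a reference to an array of rows, each an array reference of
# gray values from 0 (black) to 255 (white).
sub parse_pgm {
    my ($bytes) = @_;
    # Header: magic, width, height, maxval, then exactly one whitespace byte.
    $bytes =~ /\AP5\s+(\d+)\s+(\d+)\s+(\d+)\s/g
        or die "not a binary PGM\n";
    my ($w, $h, $max) = ($1, $2, $3);
    die "only 8-bit PGM supported\n" if $max > 255;
    # pos() now points at the first pixel byte.
    my @pixels = unpack 'C*', substr($bytes, pos($bytes), $w * $h);
    die "truncated PGM\n" if @pixels < $w * $h;
    return [ map { [ @pixels[ $_ * $w .. ($_ + 1) * $w - 1 ] ] } 0 .. $h - 1 ];
}
```

The file produced by something like pdftoppm -gray -r 150 form.pdf page (form.pdf is a placeholder) should be read in raw mode, e.g. open my $fh, '<:raw', $path, and its whole contents passed to parse_pgm.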

For example, in this situation:

[ ]        [ ]         [ ]       [X]

[ ]        [X]         [ ]       [ ]

[ ]        [ ]         [X]       [ ]

You can split the ovals using a grid:

[ ]   |    [ ]    |    [ ]   |   [X]
------+-----------+----------+------
[ ]   |    [X]    |    [ ]   |   [ ]
------+-----------+----------+------
[ ]   |    [ ]    |    [X]   |   [ ]

Then from there you just loop over the grid, applying that simple heuristic to each cell.
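The grid loop with the 70%/50% heuristic might look like this Perl sketch. The grid origin, cell size and grid dimensions are hypothetical parameters you would measure once from a blank copy of the form; the image is an array of rows of gray values (0 = black, 255 = white).

```perl
use strict;
use warnings;

# Hypothetical helper: classify every cell of a grid laid over a grayscale
# page image. ($x0, $y0) is the grid's top-left corner in pixels, ($cw, $ch)
# the cell size, ($rows, $cols) the grid dimensions.
# Heuristic: a cell is "filled" when at least 70% of its pixels are
# 50% gray or darker (value <= 127).
sub classify_grid {
    my ($img, $x0, $y0, $cw, $ch, $rows, $cols) = @_;
    my @result;
    for my $r (0 .. $rows - 1) {
        for my $c (0 .. $cols - 1) {
            my ($dark, $total) = (0, 0);
            for my $y ($y0 + $r * $ch .. $y0 + ($r + 1) * $ch - 1) {
                for my $x ($x0 + $c * $cw .. $x0 + ($c + 1) * $cw - 1) {
                    $total++;
                    $dark++ if $img->[$y][$x] <= 127;   # 50% gray or darker
                }
            }
            $result[$r][$c] = $total && $dark / $total >= 0.70 ? 'filled' : 'empty';
        }
    }
    return \@result;
}
```

A filled ellipse covers only about 78% (π/4) of its bounding box, so the 70% threshold only works if each cell is cropped tightly around its oval; with looser cells the threshold would have to come down.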
