Looking at the PDF specification, I would say you have quite a challenge in front of you:
PDF provides five types of graphics objects:
A path object is an arbitrary shape made up of straight lines, rectangles, and cubic Bézier curves. A path may intersect itself and may have disconnected sections and holes. A path object ends with one or more painting operators that specify whether the path shall be stroked, filled, used as a clipping boundary, or some combination of these operations.
A text object ...
An external object (XObject) is an object defined outside the content stream and referenced as a named resource (see 7.8.3, "Resource Dictionaries"). The interpretation of an XObject depends on its type. ...
An inline image object uses a special syntax to express the data for a small image directly within the content stream.
A shading object describes a geometric shape whose colour is an arbitrary function of position within the shape.
Therefore, at a minimum, one would need to know whether the ovals you are interested in are paths or external objects or inline image objects or shading objects.
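As a first step, you can count which of those object types a page actually contains. Below is a minimal sketch that assumes you already have the page's content stream decoded into a Python string; getting and decompressing it is left out. It tokenizes the stream and counts the following operators, whose names come from the PDF specification:

- path-painting operators, each of which ends a path object
- `BT` text objects
- `Do` XObject invocations
- `BI` inline images
- `sh` shadings

`count_graphics_objects` is a hypothetical helper, not a library function, and it does not cover every edge case. For example, inline image data that happens to contain ` EI ` will confuse it.

```python
import re
from collections import Counter

# Path-painting operators; each one ends a path object (including "n",
# which ends a path without painting it, e.g. after a clip).
PATH_PAINTING = {"S", "s", "f", "F", "f*", "B", "B*", "b", "b*", "n"}

# Tokens other than literal strings, which need parenthesis balancing.
TOKEN_RE = re.compile(
    r"%[^\r\n]*"                   # comment (ignored)
    r"|<<|>>|\[|\]"                # dictionary and array delimiters
    r"|<[0-9A-Fa-f\s]*>"           # hex string
    r"|/[^\s/\[\]()<>{}%]*"        # name, e.g. /Im1
    r"|[+-]?(?:\d+\.?\d*|\.\d+)"   # number
    r"|[A-Za-z'\"][A-Za-z0-9*]*"   # operator, e.g. re, f*, BT, Do
)
END_INLINE_IMAGE = re.compile(r"\sEI(?=\s|$)")


def skip_literal_string(s: str, pos: int) -> int:
    """Return the index just past the (possibly nested) literal string at s[pos] == '('."""
    depth, i = 0, pos
    while i < len(s):
        ch = s[i]
        if ch == "\\":
            i += 2  # skip the escaped character
            continue
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0:
                return i + 1
        i += 1
    return len(s)


def count_graphics_objects(stream: str) -> Counter:
    """Count path, text, XObject, inline image and shading objects in a decoded content stream."""
    counts = Counter()
    pos = 0
    while pos < len(stream):
        if stream[pos] == "(":
            pos = skip_literal_string(stream, pos)  # text inside strings is not an operator
            continue
        m = TOKEN_RE.match(stream, pos)
        if m is None:
            pos += 1  # whitespace or a character we do not tokenize
            continue
        tok, pos = m.group(0), m.end()
        if tok in PATH_PAINTING:
            counts["path"] += 1
        elif tok == "BT":
            counts["text"] += 1
        elif tok == "Do":
            counts["XObject"] += 1
        elif tok == "sh":
            counts["shading"] += 1
        elif tok == "BI":
            counts["inline image"] += 1
            # Skip the image dictionary and raw data up to the closing EI.
            end = END_INLINE_IMAGE.search(stream, pos)
            pos = end.end() if end else len(stream)
    return counts
```

If the ovals turn out to be drawn inside a form XObject (reached through `Do`), you would have to run the same scan over that XObject's own content stream.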
Then, you need an appropriate algorithm that can decide whether an object of that type is an oval. Then, you need to figure out what "unfilled" means. Then, you need to figure out how to fill them.
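For path objects, here is one possible oval test. It is an assumption, not something the PDF specification defines. Ovals are commonly drawn as a closed run of cubic Bézier curves (`c` operators), often four quarter-arcs whose control points sit at about 0.5523 of each radius.

The sketch samples those curves and checks that every sample lies on the axis-aligned ellipse spanned by their bounding box. It makes several simplifying assumptions:

- the path has already been split into `(operator, operands)` pairs
- coordinates are the path's own (the current transformation matrix is ignored)
- rotated ellipses and paths built with `l`, `re`, `v` or `y` are rejected

For "unfilled" it uses one possible definition: stroked with `S`/`s` but never filled. An oval painted with `B` and a white fill would look empty on paper yet count as filled here, which is why you have to decide what unfilled means for your documents. `is_oval`, `is_unfilled` and the `ellipse_ops` demo helper are hypothetical names.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Op = Tuple[str, Sequence[float]]  # e.g. ("c", [x1, y1, x2, y2, x3, y3])

KAPPA = 4 * (math.sqrt(2) - 1) / 3  # control-point ratio for a quarter-ellipse arc


def cubic_point(p0: Point, p1: Point, p2: Point, p3: Point, t: float) -> Point:
    """Evaluate a cubic Bézier curve at parameter t in [0, 1]."""
    mt = 1.0 - t
    a, b, c, d = mt**3, 3 * mt**2 * t, 3 * mt * t**2, t**3
    return (a * p0[0] + b * p1[0] + c * p2[0] + d * p3[0],
            a * p0[1] + b * p1[1] + c * p2[1] + d * p3[1])


def is_oval(ops: List[Op], tol: float = 0.02, samples: int = 16) -> bool:
    """True if the path is one closed run of cubic curves lying on an axis-aligned ellipse."""
    start = current = None
    points: List[Point] = []
    for op, args in ops:
        if op == "m":
            if start is not None:
                return False  # more than one subpath
            start = current = (args[0], args[1])
            points.append(start)
        elif op == "c" and current is not None:
            p1, p2, p3 = (args[0], args[1]), (args[2], args[3]), (args[4], args[5])
            points += [cubic_point(current, p1, p2, p3, i / samples) for i in range(1, samples + 1)]
            current = p3
        elif op == "h":
            continue  # closepath; closure is checked geometrically below
        else:
            return False  # lines, rectangles, v/y curves, etc. are not handled here
    if len(points) < 2:
        return False
    xs, ys = [p[0] for p in points], [p[1] for p in points]
    cx, cy = (min(xs) + max(xs)) / 2, (min(ys) + max(ys)) / 2
    rx, ry = (max(xs) - min(xs)) / 2, (max(ys) - min(ys)) / 2
    if rx == 0 or ry == 0 or math.dist(start, current) > tol * max(rx, ry):
        return False  # degenerate, or the path does not return to its start
    # Every sample must satisfy ((x-cx)/rx)^2 + ((y-cy)/ry)^2 = 1 within tol.
    return all(abs(((x - cx) / rx) ** 2 + ((y - cy) / ry) ** 2 - 1) <= tol for x, y in points)


def is_unfilled(paint_op: str) -> bool:
    """One possible definition of 'unfilled': the path is stroked but never filled."""
    return paint_op in {"S", "s"}


def ellipse_ops(cx: float, cy: float, rx: float, ry: float) -> List[Op]:
    """Hypothetical demo helper: the usual four-curve ellipse approximation."""
    kx, ky = KAPPA * rx, KAPPA * ry
    return [
        ("m", [cx + rx, cy]),
        ("c", [cx + rx, cy + ky, cx + kx, cy + ry, cx, cy + ry]),
        ("c", [cx - kx, cy + ry, cx - rx, cy + ky, cx - rx, cy]),
        ("c", [cx - rx, cy - ky, cx - kx, cy - ry, cx, cy - ry]),
        ("c", [cx + kx, cy - ry, cx + rx, cy - ky, cx + rx, cy]),
        ("h", []),
    ]
```

For example, `is_oval(ellipse_ops(100, 100, 12, 8))` accepts the standard four-curve oval. A path such as `[("m", [0, 0]), ("l", [10, 0]), ("h", [])]` is rejected because it uses a straight line.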
It seems unlikely to me that anyone would put in that much effort to give you a ready-made solution.
It may actually be simpler to render the PDF to a grayscale bitmap and use simple shape recognition to distinguish filled from unfilled ovals. If you can reliably determine where the ovals are going to be (I'm assuming this is coming from a form, so the position of the ovals would be standard), you can apply a simple heuristic (e.g. if 70% of pixels are 50% gray or darker) to determine what kind of oval it is.
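Here is a minimal sketch of that heuristic. It makes these assumptions:

- the page has already been rendered to an 8-bit grayscale bitmap (for example with Poppler's `pdftoppm -gray`) and loaded as rows of values from 0 (black) to 255 (white)
- "50% gray or darker" means a value of 127 or less
- the ovals sit in a regular grid of equal-sized cells, whose origin and cell size you measure once from a sample form

`darkness_ratio` and `classify_grid` are hypothetical names.

```python
from typing import List, Sequence

Bitmap = Sequence[Sequence[int]]  # rows of 8-bit gray values, 0 = black, 255 = white


def darkness_ratio(bitmap: Bitmap, top: int, left: int, height: int, width: int) -> float:
    """Fraction of pixels in the region that are at least 50% gray (value <= 127)."""
    dark = total = 0
    for row in bitmap[top:top + height]:
        for value in row[left:left + width]:
            total += 1
            if value <= 127:
                dark += 1
    return dark / total if total else 0.0


def classify_grid(bitmap: Bitmap, top: int, left: int, cell_h: int, cell_w: int,
                  n_rows: int, n_cols: int, threshold: float = 0.70) -> List[List[bool]]:
    """Return True for each grid cell whose oval looks filled."""
    return [
        [darkness_ratio(bitmap, top + r * cell_h, left + c * cell_w, cell_h, cell_w) >= threshold
         for c in range(n_cols)]
        for r in range(n_rows)
    ]
```

The ratio is taken over the whole cell, and a filled ellipse covers only about π/4 (roughly 79%) of its bounding box. The 70% threshold therefore only works if the cells fit the ovals tightly; with looser cells you would lower it.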
For example, in this situation:
You can split the ovals using a grid:
Then from there you just loop over the grid, applying that simple heuristic to each cell.

Comment: The `$doc->traverse($dereference, $node, $callbackfunc, $callbackdata)` method seems pretty promising. Check and see what the oval's type is.