How do I detect horizontal black lines using ImageMagick?
So I have what is essentially a spreadsheet in TIFF format. There is some uniformity to it: for example, all the column widths are the same. I want to split this sheet along those known column widths and basically create lots of little graphic files, one for each cell, then run OCR on them and store the results in a database. The problem is that the rows are not all the same height, so I need to use some kind of graphics library command to check whether every pixel across a row is the same color (i.e. black). If so, I know I've reached the horizontal delimiter for a cell. How would I go about doing that? (I'm using RMagick.)
Comments (2)
Use image#get_pixel: http://www.simplesystems.org/RMagick/doc/image2.html#get_pixels
Warning: those docs are old, so this may have changed in newer versions. Look at your own rdocs using $ gem server, assuming they have rdocs.
image#rows gives you the height of the image, then you can do something like (untested):
Please keep in mind that I'm not sure about the API. I'm looking at older docs and can't test it right now, but this looks like the general approach you would take. BTW, it assumes the row borders are 1 pixel thick. If not, change the 1 to the actual thickness, and that might be enough to make it work like you expect.
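A minimal sketch of that row-by-row approach, not Ehsanul's original code. It assumes the image is read one 1-pixel-high strip at a time with get_pixels(x, y, width, 1) and that each returned pixel exposes red, green and blue. The helper names (dark_row?, separator_bands, image_rows) and the threshold value are hypothetical. Consecutive dark rows are grouped into bands, so borders thicker than 1 pixel are reported once.

```ruby
# Hypothetical helpers; the only RMagick calls assumed are
# Image#rows, Image#get_pixels and Pixel#red/#green/#blue.

# True when the row is non-empty and every channel of every pixel is at or
# below `threshold`. `row` is an array of [red, green, blue] triples.
def dark_row?(row, threshold)
  !row.empty? && row.all? { |r, g, b| r <= threshold && g <= threshold && b <= threshold }
end

# Groups consecutive dark row indices into [first_y, last_y] bands.
def separator_bands(rows, threshold)
  bands = []
  rows.each_with_index do |row, y|
    next unless dark_row?(row, threshold)

    if bands.any? && bands.last[1] == y - 1
      bands.last[1] = y   # extend the current band
    else
      bands << [y, y]     # start a new band
    end
  end
  bands
end

# Converts an RMagick image into plain arrays, one strip per image row,
# covering `width` pixels starting at column `x`.
def image_rows(image, x, width)
  (0...image.rows).map do |y|
    image.get_pixels(x, y, width, 1).map { |px| [px.red, px.green, px.blue] }
  end
end

# Usage (assumes the rmagick gem and a hypothetical file name):
#   require 'rmagick'
#   image = Magick::Image.read('sheet.tif').first
#   limit = Magick::QuantumRange / 10   # "black enough"; tune for your scan
#   bands = separator_bands(image_rows(image, 0, image.columns), limit)
```

Each band gives the top and bottom y coordinate of one horizontal border, so a cell's height is the gap between one band's last row and the next band's first row.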
Ehsanul had it almost right... the call is get_pixels, which takes x, y, w, h as arguments and returns an array of those pixels. If one dimension is 1 pixel thick, you'll get a nice one-dimensional array.
Since the black in a document can vary, I altered Ehsanul's method a little bit to detect whether consecutive pixels were roughly the same color. After 100 or so such pixels, it's probably a line:
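A rough sketch of that idea, not the answerer's original code. It walks one row of pixels and counts the longest run in which each pixel is within a tolerance of the previous one. The function names, the tolerance, and the extra dark_limit check are assumptions added here: without some darkness test, a blank white row would also count as a long run of similar pixels.

```ruby
# Hypothetical helpers operating on one row of [red, green, blue] triples,
# e.g. built from image.get_pixels(0, y, image.columns, 1).

# Length of the longest run of neighbouring pixels that are all darker than
# `dark_limit` and differ from the previous pixel by at most `tolerance`
# on every channel.
def longest_dark_run(row, tolerance, dark_limit)
  best = 0
  current = 0
  previous = nil
  row.each do |pixel|
    dark = pixel.all? { |v| v <= dark_limit }
    similar = previous && pixel.zip(previous).all? { |a, b| (a - b).abs <= tolerance }
    current = if !dark
                0                # light pixel: the run is broken
              elsif similar && current > 0
                current + 1      # same shade as the previous dark pixel
              else
                1                # first pixel of a new dark run
              end
    best = current if current > best
    previous = pixel
  end
  best
end

# A row is treated as a probable line once the run reaches `min_run` pixels.
def probable_line?(row, tolerance, dark_limit, min_run = 100)
  longest_dark_run(row, tolerance, dark_limit) >= min_run
end
```

The default of 100 pixels follows the figure mentioned above; the tolerance and dark_limit values depend on the scan and on the quantum depth RMagick was built with.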