How do I detect horizontal black lines with ImageMagick?

Posted on 2024-08-28 21:52:27

So I have what is essentially a spreadsheet in TIFF format. There is some uniformity to it: for example, all the column widths are the same. I want to split this sheet along those known column widths and create lots of little image files, one for each cell, then run OCR on them and store the results in a database. The problem is that the rows are not all the same height, so I need some kind of graphics library call to check whether every pixel across a row is the same color (i.e. black). If so, I know I've reached the horizontal line that separates one cell from the next. How would I go about doing that? (I'm using RMagick.)

Comments (2)

几度春秋 2024-09-04 21:52:27

Use Image#get_pixels: http://www.simplesystems.org/RMagick/doc/image2.html#get_pixels
Warning: those docs are old, so the API may have changed in newer versions. Check the rdocs for your installed version with $ gem server, assuming the gem ships with rdocs.

Image#rows gives you the height of the image, so you can do something like this (untested):

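# Assumes RMagick is loaded and image is a Magick::Image,
# e.g. image = Magick::Image.read(path).first

# True when every pixel in the array is pure black.
def black_line?(pixels)
  pixels.all? { |pixel| pixel.red == 0 && pixel.green == 0 && pixel.blue == 0 }
end

black_line_heights = []
height = image.rows
width = image.columns
height.times do |y|
  # get_pixels (plural) returns the width x 1 strip of pixels at row y
  pixels = image.get_pixels(0, y, width, 1)
  black_line_heights << y if black_line?(pixels)
end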

Please keep in mind that I'm not sure about the API: I'm going by the older docs and can't test this right now. But it looks like the general approach you would take. BTW, it assumes the row borders are 1 pixel thick. If not, change the 1 to the actual thickness and that might be enough to make it work like you expect.
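
If the borders are more than 1 pixel thick, a scan like the one above will report several consecutive row indices for each border. A minimal sketch of one way to handle that, in plain Ruby with no RMagick calls: group consecutive indices into bands, then treat the gaps between bands as the cell rows. The helper names group_into_bands and cell_row_ranges, and the sample indices, are made up for illustration.

```ruby
# Group consecutive row indices into [first, last] bands, one per border line.
def group_into_bands(rows)
  rows.sort.slice_when { |a, b| b != a + 1 }.map { |run| [run.first, run.last] }
end

# Return the [top, bottom] row range of each cell lying between two bands.
# Bands that touch each other leave no room for a cell and are skipped.
def cell_row_ranges(bands)
  bands.each_cons(2)
       .map { |upper, lower| [upper.last + 1, lower.first - 1] }
       .select { |top, bottom| bottom >= top }
end

black_line_heights = [0, 1, 40, 41, 42, 90]   # hypothetical scan result
bands = group_into_bands(black_line_heights)  # => [[0, 1], [40, 42], [90, 90]]
cells = cell_row_ranges(bands)                # => [[2, 39], [43, 89]]
```

Each [top, bottom] pair, combined with the known column widths, gives the rectangle to cut out for one cell.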

路弥 2024-09-04 21:52:27

Ehsanul had it almost right...the call is get_pixels, which takes x, y, w, h as arguments and returns an array of those pixels. If one dimension is 1 pixel thick, you'll get a nice one-dimensional array.

Since the black in a document can vary, I altered Ehsanul's method a little bit to detect whether consecutive pixels are roughly the same color. After 100 or so pixels, it's probably a line:

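  # True when each pixel is near-black and every channel stays within the
  # threshold of the previous pixel. Channel values are in RMagick's quantum
  # range, so black_val may need scaling on a 16-bit build.
  def solid_line?(pixels, opt = {}, black_val = 10)
    thresh = opt[:threshold] || 4  # plain-Ruby default; blank? needs ActiveSupport
    last_pixel = nil

    pixels.each do |pix|
      pixel = [pix.red, pix.green, pix.blue]
      if last_pixel
        # Compare channel by channel; pixel.index(p) would pick the wrong
        # channel whenever two channels hold the same value.
        similar = pixel.zip(last_pixel).all? do |cur, prev|
          (cur - prev).abs < thresh && cur < black_val
        end
        return false unless similar
      end
      last_pixel = pixel
    end
    true
  end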
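
The "100 or so pixels" idea can also be checked directly, instead of requiring the whole row to match. The following is a rough sketch of that variation, not the method above: it finds the longest run of dark pixels in a row and compares it with a minimum length. SketchPixel is a hypothetical stand-in for Magick::Pixel so the example runs without RMagick, and longest_dark_run, probable_line? and the default of 100 are illustrative assumptions.

```ruby
# Hypothetical stand-in for Magick::Pixel so the sketch runs without RMagick.
SketchPixel = Struct.new(:red, :green, :blue)

# Length of the longest run of consecutive pixels whose channels are all
# below black_val.
def longest_dark_run(pixels, black_val = 10)
  longest = current = 0
  pixels.each do |pix|
    if [pix.red, pix.green, pix.blue].all? { |c| c < black_val }
      current += 1
      longest = current if current > longest
    else
      current = 0
    end
  end
  longest
end

# A row counts as a line once it holds a dark run of at least min_run pixels.
def probable_line?(pixels, min_run = 100, black_val = 10)
  longest_dark_run(pixels, black_val) >= min_run
end

dark  = SketchPixel.new(2, 3, 1)
light = SketchPixel.new(250, 250, 250)
row   = [light] * 5 + [dark] * 120 + [light] * 5

probable_line?(row)          # => true  (dark run of 120)
probable_line?([dark] * 50)  # => false (dark run of only 50)
```

With real data, the array returned by get_pixels for a single row could be passed in place of the sample row, since the sketch only reads the red, green and blue values.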