Character matching in a grayscale image
I made patterns: images of the letter "A" at different sizes (from 12 to 72: 12, 14, ..., 72).
I tested a pattern-matching method on them and it gave good results.
One way to select text regions in an image would be to run that algorithm for every lowercase and uppercase letter and every digit, at every size. And for every font!
I don't like that. Instead, I want to build something like a universal pattern, or,
better said: scan the image with windows of different sizes and select the regions where some function (the probability that a character is present in that window) exceeds some fixed threshold.
Do you know any methods or ideas for building such a function?
It has to work on the original image (grayscale).
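To make the idea concrete, here is a minimal sketch of the kind of multi-scale window scan I have in mind, assuming the image is a 2-D NumPy array. The `character_score` function is only a placeholder (fraction of dark pixels); the real "probability of a character" function is exactly what I am asking about.

```python
import numpy as np

def character_score(window):
    """Placeholder 'character probability' for one grayscale window:
    fraction of pixels darker than a crude local threshold.
    A real detector would put a trained classifier here."""
    thresh = window.mean() - window.std()
    return (window < thresh).mean()

def scan(image, window_sizes=(12, 24, 48, 72), step=4, min_score=0.2):
    """Slide square windows of several sizes over a grayscale image
    (2-D uint8 NumPy array) and keep regions scoring above min_score."""
    h, w = image.shape
    hits = []
    for s in window_sizes:
        for y in range(0, h - s + 1, step):
            for x in range(0, w - s + 1, step):
                score = character_score(image[y:y + s, x:x + s])
                if score > min_score:
                    hits.append((x, y, s, score))
    return hits
```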
1 Answer
I suppose you are developing OCR, right?
You have chosen quite an unusual route, since everyone else does the matching on bi-tonal images. That makes everything much simpler: once you have degraded the image to bi-tonal properly (which is a very difficult task in itself), you no longer have to deal with different brightness levels, worry about uneven backgrounds, and so on. And of course it needs far less computation. However, if doing everything in grayscale is actually your goal and you want to show other OCR researchers that it really is doable - well, then I wish you good luck.
The letter-location approach you describe is very, very computation-intensive. You have to scan the whole image (image_size^2), then match it against a pattern ( * pattern_size^2), and then do that for each pattern ( * pattern_num ). That will be incredibly slow.
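To put rough, purely illustrative numbers on that (a 2000x3000 px page, 64x64 patterns, 31 pattern sizes, with W, H the image dimensions, w, h the pattern dimensions and P the number of patterns):

```latex
W H \cdot w h \cdot P \;=\; (2000 \cdot 3000)\,(64 \cdot 64)\,(31) \;\approx\; 7.6 \times 10^{11} \text{ multiply-adds}
```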
Instead, try to simplify your algorithm by breaking it into two stages. The first stage should look for features in the picture (connected dark regions, for example, or split the image into large squares and throw away all the light ones), and only then perform pattern matching on the small number of areas you found. That is still at least N^2, and you can try to reduce the complexity further by working on the rows or columns of the image first (by building a histogram). So there are plenty of simplification tricks you can try; a rough sketch of such a first stage follows below.
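As an illustration of such a first stage, here is a minimal sketch (assuming the grayscale image is a 2-D NumPy uint8 array; the block size and darkness thresholds are made-up values, not tuned ones):

```python
import numpy as np

def dark_blocks(image, block=32, dark_frac=0.05):
    """Stage 1: split the grayscale image (2-D uint8 array) into
    block x block squares and keep only those containing 'enough'
    dark pixels; everything else is discarded before any matching."""
    thresh = image.mean() - image.std()          # crude global darkness cut-off
    h, w = image.shape
    candidates = []
    for y in range(0, h - block + 1, block):
        for x in range(0, w - block + 1, block):
            tile = image[y:y + block, x:x + block]
            if (tile < thresh).mean() > dark_frac:
                candidates.append((x, y, block, block))  # (x, y, width, height)
    return candidates

def row_profile(image):
    """Row histogram: total darkness per image row, handy for finding
    text lines before doing any per-pixel pattern matching."""
    return (255 - image.astype(np.float64)).sum(axis=1)
```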
Once you have located those objects in the picture and are about to match patterns against them, you actually know their size, so you don't have to store the letter A at every size: just rescale the original image of each object to one size, say 72, and match against that.
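A sketch of that step, assuming OpenCV is available; match_region and template_72 (your single 72-px "A" pattern) are hypothetical names used only for illustration:

```python
import cv2

def match_region(image, region, template_72):
    """Crop one candidate region (x, y, w, h) from the grayscale image,
    rescale it to 72x72 and correlate it with a single 72x72 'A' template,
    instead of keeping the template at every size."""
    x, y, w, h = region
    crop = image[y:y + h, x:x + w]
    crop = cv2.resize(crop, (72, 72), interpolation=cv2.INTER_AREA)
    result = cv2.matchTemplate(crop, template_72, cv2.TM_CCOEFF_NORMED)
    return float(result.max())   # normalized correlation, 1.0 = perfect match
```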
As for fonts - you don't really have much choice there: you will need to match against every possible shape of A to be sure you have found an A. But once you only have to match a single size of A, you have more computing power left to try the different A's.