
How do I identify letters in an image? (before OCR)

Posted on 2024-10-15 17:42:53

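Everything I can find online is about OCR itself, but I am not at that stage yet: I still need to identify where the letters are located in the image.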


北陌 2024-10-22 17:42:53


The interesting thing is that the answer is not as simple as it may seem. Some may think that locating characters in the picture is the first step of OCR, but that is not the case. In fact, you won't be sure where each character is located until you have actually finished recognizing the text.

How it works depends entirely on the type of image you are going to recognize. First, you should segment your image into text areas (blocks) and everything else.

Just a few examples:

If you are recognizing a license plate in a picture of a car, you should first locate the license plate, and only then split it into separate characters.
If you are recognizing an application form, you can locate the areas that contain text just by knowing its layout.
If you are recognizing a scan of a book page, you have to distinguish pictures from text areas and then work only on the text.

From this point on you no longer need the original image; all you need is a binarized image of the text block. Classic OCR algorithms typically work on binary images. You may also need other kinds of image transformations, such as line straightening, perspective correction, skew correction and so on. All of that again depends on the type of images you are recognizing.
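As an illustration of the binarization step, here is a minimal sketch using Otsu's global threshold, written with NumPy only. The answer does not say which binarization method to use, so Otsu is an assumption here, as are the function names and the convention that text is dark on a light background.

```python
import numpy as np


def otsu_threshold(gray: np.ndarray) -> int:
    """Return an Otsu threshold for an 8-bit (uint8) grayscale image."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    levels = np.arange(256)

    # For every candidate threshold t, the "dark" class is all pixels <= t.
    weight_dark = np.cumsum(hist)
    weight_light = hist.sum() - weight_dark
    sum_dark = np.cumsum(hist * levels)
    sum_all = sum_dark[-1]

    # Class means are only defined where both classes are non-empty.
    valid = (weight_dark > 0) & (weight_light > 0)
    mean_dark = np.zeros(256)
    mean_light = np.zeros(256)
    mean_dark[valid] = sum_dark[valid] / weight_dark[valid]
    mean_light[valid] = (sum_all - sum_dark[valid]) / weight_light[valid]

    # Otsu picks the threshold that maximizes the between-class variance.
    between = weight_dark * weight_light * (mean_dark - mean_light) ** 2
    between[~valid] = 0.0
    return int(np.argmax(between))


def binarize(gray: np.ndarray) -> np.ndarray:
    """Return a boolean mask where True marks ink (assumed: dark text)."""
    return gray <= otsu_threshold(gray)
```

A single global threshold is only a reasonable starting point for evenly lit scans; unevenly lit photos usually need a locally adaptive method instead.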

Once the text block is found and normalized, the next step is to find the lines of text within it. In the trivial case of horizontal lines of text this is quite simple: build a histogram of ink pixels per horizontal row and look for the empty gaps between lines.
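The row histogram for the trivial horizontal case could be sketched roughly as follows. It assumes the block has already been binarized into a boolean array where True means ink, and the `min_ink` parameter is a hypothetical knob for ignoring rows that contain only a few noise pixels.

```python
from typing import List, Tuple

import numpy as np


def find_text_lines(ink: np.ndarray, min_ink: int = 1) -> List[Tuple[int, int]]:
    """Return (top, bottom) row ranges of text lines; bottom is exclusive.

    `ink` is a 2-D boolean array where True marks ink pixels.
    """
    row_counts = ink.sum(axis=1)      # histogram: ink pixels per row
    has_ink = row_counts >= min_ink   # rows that belong to some text line

    lines = []
    start = None
    for y, flag in enumerate(has_ink):
        if flag and start is None:
            start = y                 # a line begins
        elif not flag and start is not None:
            lines.append((start, y))  # a line ends at the first empty row
            start = None
    if start is not None:             # line touching the bottom edge
        lines.append((start, len(has_ink)))
    return lines
```

This only works when the lines are truly horizontal and separated by empty rows, which is why skew correction has to come first.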

Now that you have lines, you may think the rest is simple: just split each line into characters, hooray! Again, that is wrong. There are phenomena such as connected characters, broken characters and even ligatures (two letters forming one single shape), or letters whose parts extend to the right, above or below the next character. What you should do is create several hypotheses for splitting the line into words and individual characters, then try to OCR every single variant and weight every hypothesis with a confidence level. The last step would be checking the different paths in this graph using a dictionary and selecting the best one.
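To make the hypothesis graph more concrete, here is a small sketch of the path search. Candidate cut positions along a line are the graph nodes, each span between two cuts is one character hypothesis, and the best path maximizes the sum of log-confidences. The `recognize` callback, the `max_span` limit and the toy confidence values are all hypothetical stand-ins for a real character classifier, and the dictionary check described above is left out to keep the sketch short.

```python
import math
from typing import Callable, List, Optional, Tuple

# A recognizer takes (left, right) pixel columns and returns
# (character, confidence in (0, 1]) or None if the span matches nothing.
Recognizer = Callable[[int, int], Optional[Tuple[str, float]]]


def best_segmentation(cuts: List[int], recognize: Recognizer,
                      max_span: int = 3) -> Optional[List[Tuple[str, int, int]]]:
    """Return [(char, left, right), ...] for the best path, or None."""
    if not cuts:
        return []
    n = len(cuts)
    best_score = [-math.inf] * n
    back: List[Optional[Tuple[int, str]]] = [None] * n
    best_score[0] = 0.0

    for j in range(1, n):
        # A character may span up to `max_span` adjacent cut intervals.
        for i in range(max(0, j - max_span), j):
            if best_score[i] == -math.inf:
                continue
            result = recognize(cuts[i], cuts[j])
            if result is None:
                continue
            char, conf = result
            score = best_score[i] + math.log(conf)
            if score > best_score[j]:
                best_score[j] = score
                back[j] = (i, char)

    if best_score[-1] == -math.inf:
        return None  # no chain of hypotheses covers the whole line

    # Walk the back-pointers from the last cut to the first.
    path = []
    j = n - 1
    while j > 0:
        i, char = back[j]
        path.append((char, cuts[i], cuts[j]))
        j = i
    path.reverse()
    return path


# Toy example: the classic "rn" versus "m" ambiguity.
table = {(0, 5): ("r", 0.6), (5, 10): ("n", 0.6),
         (0, 10): ("m", 0.9), (10, 18): ("a", 0.8)}
print(best_segmentation([0, 5, 10, 18], lambda l, r: table.get((l, r))))
# -> [('m', 0, 10), ('a', 10, 18)]
```

Note that the character positions fall out of the winning path only after recognition has been scored, which is the point the answer is making.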

And only now, once you have actually recognized everything, can you say where the individual characters are located.

So the simple answer is: recognize your image with an OCR program, and get the coordinates of the characters from its output.
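As one possible way to do that, the sketch below uses Tesseract through the `pytesseract` wrapper. The answer does not name any OCR program, so this choice is an assumption, and the code needs Tesseract, `pytesseract` and Pillow installed. `pytesseract.image_to_boxes` returns one line per character in Tesseract's box format (`<char> <left> <bottom> <right> <top> <page>`, with the origin at the bottom-left corner); the helper names `CharBox`, `parse_box_output` and `locate_characters` are made up for this example.

```python
from typing import List, NamedTuple


class CharBox(NamedTuple):
    char: str
    left: int
    top: int
    right: int
    bottom: int


def parse_box_output(box_text: str, image_height: int) -> List[CharBox]:
    """Parse Tesseract box-format text into top-left-origin pixel boxes."""
    boxes = []
    for line in box_text.splitlines():
        parts = line.rsplit(" ", 5)
        if len(parts) != 6:
            continue  # skip blank or malformed lines
        char, left, bottom, right, top, _page = parts
        # Box files measure y from the bottom edge; flip to image coordinates.
        boxes.append(CharBox(char, int(left), image_height - int(top),
                             int(right), image_height - int(bottom)))
    return boxes


def locate_characters(image_path: str) -> List[CharBox]:
    """Run Tesseract on an image file and return per-character boxes."""
    # Imported lazily so the parser above works without Tesseract installed.
    from PIL import Image
    import pytesseract

    image = Image.open(image_path)
    return parse_box_output(pytesseract.image_to_boxes(image), image.height)
```

Calling `locate_characters("page.png")` (the file name is a placeholder) would then give one `CharBox` per recognized character, with coordinates in the usual image convention where y grows downward.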
