How do I locate letters in an image? (before OCR)

Posted on 2024-10-15 17:42:53

All I can find on the web is about OCR, but I'm not there yet: I still have to work out where the letters are in the image.

Comments (2)

北陌 2024-10-22 17:42:53

Interestingly, the answer is not as simple as it may seem. Some may think that locating characters in the picture is the first step of OCR, but that is not the case. In fact, you won't be sure where each character is located until you have actually finished recognizing the text.

How it works depends entirely on the type of image you are going to recognize. First, you should segment your image into text areas (blocks) and everything else.

Just a few examples:

  • If you are recognizing a license plate in a picture of a car, you should first locate the license plate, and only then split it into separate characters.
  • If you are recognizing an application form, you can locate the text areas just by knowing its layout.
  • If you are recognizing a scan of a book page, you have to distinguish pictures from text areas and then work only on the text.

From this point on you no longer need the original image; all you need is a binarized image of the text block. Most classical OCR algorithms work on binary images. You may also need other image transformations, such as line straightening, perspective correction, skew correction and so on; again, all of that depends on the type of images you are recognizing.
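To make the binarization step concrete, here is a minimal sketch in plain Python. It assumes the text block is already a grayscale image stored as a list of rows of integers in 0..255, and it picks the threshold with Otsu's method, which is only one common choice and not something the answer prescribes. The function names are made up for illustration.

```python
def otsu_threshold(gray):
    """Return the threshold t (0..255) that maximises between-class variance."""
    hist = [0] * 256
    for row in gray:
        for value in row:
            hist[value] += 1
    total = sum(hist)
    sum_all = sum(i * count for i, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_bg = 0    # number of pixels with value <= t
    sum_bg = 0  # sum of the values of those pixels
    for t in range(256):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        if w_bg == 0:
            continue
        w_fg = total - w_bg
        if w_fg == 0:
            break
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(gray, dark_text=True):
    """Return a 0/1 image where 1 marks ink (assumed darker than paper by default)."""
    t = otsu_threshold(gray)
    if dark_text:
        return [[1 if value <= t else 0 for value in row] for row in gray]
    return [[1 if value > t else 0 for value in row] for row in gray]


# Tiny made-up example: three dark pixels on a light background.
sample = [[200, 200, 30, 200],
          [200, 40, 35, 210]]
ink = binarize(sample)
```

A single global threshold like this tends to struggle with uneven lighting; in that case a locally adaptive threshold is usually the better fit.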

Once the text block is found and normalized, the next step is to find the lines of text within it. In the trivial case of horizontal lines of text this is quite simple: build a histogram of ink pixels per pixel row, and the empty gaps in that histogram separate the lines.
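A rough sketch of that row histogram (often called a horizontal projection profile) could look like the following. It assumes a deskewed binary image given as a list of rows of 0/1 values with 1 meaning ink; `find_text_lines` and `min_ink` are illustrative names, not part of any library.

```python
def find_text_lines(binary, min_ink=1):
    """Return (top, bottom) row ranges, inclusive, whose rows hold >= min_ink ink pixels."""
    profile = [sum(row) for row in binary]  # ink count per pixel row
    lines = []
    start = None
    for y, ink in enumerate(profile):
        if ink >= min_ink and start is None:
            start = y                     # entering a text line
        elif ink < min_ink and start is not None:
            lines.append((start, y - 1))  # leaving a text line
            start = None
    if start is not None:                 # line touching the bottom edge
        lines.append((start, len(profile) - 1))
    return lines


# Made-up 6-row image with two "lines" of ink separated by an empty row.
page = [[0, 0, 0],
        [1, 1, 0],
        [0, 1, 0],
        [0, 0, 0],
        [1, 0, 1],
        [0, 0, 0]]
text_lines = find_text_lines(page)
```

Raising `min_ink` makes the split more tolerant of stray noise pixels, at the cost of possibly cutting off thin ascenders or descenders.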

Now that you have lines, you may think the rest is simple: just split each line into characters, hooray! Again, that is wrong. There are phenomena such as connected characters, broken characters and even ligatures (two letters forming a single shape), or letters whose parts extend to the right, above or below the next character. What you should do is create several hypotheses for splitting the line into words and individual characters, run OCR on every variant, and weight each hypothesis with a confidence level. The last step is to check the different paths through this graph against a dictionary and select the best one.
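The hypothesis graph can be sketched as follows. Everything here is a toy under stated assumptions: the line is pre-cut into small slices, `recognize(i, j)` is a hypothetical stand-in for a real classifier that returns `(text, confidence)` for the glyph spanning slices `i..j` (or `None` if it rejects it), and the dictionary check is reduced to a flat penalty for unknown words. It enumerates all paths by brute force, which is fine for a single word but not for long lines.

```python
import math


def all_paths(n_slices, recognize, max_span=2, start=0):
    """Yield every way to cover slices start..n_slices as a list of (text, confidence)."""
    if start == n_slices:
        yield []
        return
    for end in range(start + 1, min(start + max_span, n_slices) + 1):
        hypothesis = recognize(start, end)
        if hypothesis is None:
            continue
        for rest in all_paths(n_slices, recognize, max_span, end):
            yield [hypothesis] + rest


def best_reading(n_slices, recognize, dictionary, max_span=2, oov_penalty=0.5):
    """Pick the path with the highest confidence product, penalising unknown words."""
    best_word, best_score = None, 0.0
    for path in all_paths(n_slices, recognize, max_span):
        word = "".join(text for text, _ in path)
        score = math.prod(conf for _, conf in path)
        if word not in dictionary:
            score *= oov_penalty
        if score > best_score:
            best_word, best_score = word, score
    return best_word, best_score


# Made-up confidences for the classic "rn" versus "m" ambiguity.
toy_scores = {
    (0, 1): ("c", 0.9),
    (1, 2): ("o", 0.9),
    (2, 3): ("r", 0.7),
    (3, 4): ("n", 0.7),
    (2, 4): ("m", 0.6),  # slices 2 and 3 read together as one glyph
}


def toy_recognize(i, j):
    return toy_scores.get((i, j))


with_dictionary = best_reading(4, toy_recognize, {"corn"})
without_dictionary = best_reading(4, toy_recognize, set())
```

With these numbers the shape classifier alone slightly prefers "com", and it is the dictionary that tips the result to "corn", which is the reason character positions are only known once recognition has finished.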

Only now, when you have actually recognized everything, can you say where the individual characters are located.

So the simple answer is: recognize your image with an OCR program, and get the coordinates of the characters from its output.
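What that output looks like depends on the OCR program. As one illustration, assuming the engine can emit per-character lines in Tesseract's box-file layout (`symbol left bottom right top page`, with the origin at the bottom-left corner of the image), a small parser could convert them to the more usual top-left-origin rectangles. `CharBox` and `parse_box_lines` are hypothetical helper names, and the sample string is made up.

```python
from typing import NamedTuple


class CharBox(NamedTuple):
    char: str
    left: int
    top: int
    right: int
    bottom: int


def parse_box_lines(text, image_height):
    """Parse 'symbol left bottom right top page' lines into top-left-origin boxes."""
    boxes = []
    for line in text.splitlines():
        parts = line.strip().rsplit(" ", 5)
        if len(parts) != 6:
            continue  # skip blank or malformed lines
        char, left, bottom, right, top, _page = parts
        left, bottom, right, top = int(left), int(bottom), int(right), int(top)
        # Flip the y axis: box files measure y upward from the bottom edge.
        boxes.append(CharBox(char, left, image_height - top,
                             right, image_height - bottom))
    return boxes


sample_output = "H 10 80 20 95 0\ni 22 80 26 94 0\n"
char_boxes = parse_box_lines(sample_output, image_height=100)
```

Other engines report coordinates in their own formats (XML, JSON, hOCR and so on), so the parsing step would need to be adapted accordingly.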

口干舌燥 2024-10-22 17:42:53

Generally speaking, you'll be looking for small contiguous areas of nearly solid color. I would suggest visiting each pixel and building an array of nearby pixels that also fall within a threshold of the original pixel's color (repeating for the neighbours of each matching pixel). Put the entire array aside as a potential character (or check it right away) and move on, skipping previously collected pixels for a speedup.
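Read literally, this is a flood fill from each unvisited pixel. A minimal sketch, assuming the image is a list of rows of `(r, g, b)` tuples, 4-connectivity, and "within a threshold" meaning every channel differs from the seed pixel by at most `threshold` (all of these are choices made for the example, not requirements from the answer):

```python
from collections import deque


def close_enough(a, b, threshold):
    """True if every colour channel of a and b differs by at most threshold."""
    return all(abs(x - y) <= threshold for x, y in zip(a, b))


def find_regions(image, threshold=30):
    """Return (left, top, right, bottom, pixel_count) for each contiguous region."""
    height, width = len(image), len(image[0])
    seen = [[False] * width for _ in range(height)]
    regions = []
    for sy in range(height):
        for sx in range(width):
            if seen[sy][sx]:
                continue  # already collected by an earlier region
            seed = image[sy][sx]
            seen[sy][sx] = True
            queue = deque([(sx, sy)])
            left = right = sx
            top = bottom = sy
            count = 0
            while queue:
                x, y = queue.popleft()
                count += 1
                left, right = min(left, x), max(right, x)
                top, bottom = min(top, y), max(bottom, y)
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if (0 <= nx < width and 0 <= ny < height
                            and not seen[ny][nx]
                            and close_enough(image[ny][nx], seed, threshold)):
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            regions.append((left, top, right, bottom, count))
    return regions


W, B = (255, 255, 255), (0, 0, 0)
picture = [[W, B, W, W, W],
           [W, B, W, B, B],
           [W, W, W, W, W]]
regions = find_regions(picture)
small_regions = [r for r in regions if r[4] < 5]
```

Note that the background comes back as a region too, so the caller still has to decide which regions are small enough to be character candidates.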

Optimisations are possible if you know the font size, quality and/or color of the text in advance. If not, you'll want to be fairly generous with your thresholds for what constitutes a "contiguous area".
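As one example of such an optimisation, if the approximate glyph height is known, candidate regions can be filtered by size before any further checking. The sketch below assumes candidates are `(left, top, right, bottom)` boxes with inclusive pixel coordinates; the function name, parameters and the aspect-ratio limit are illustrative guesses to tune per font.

```python
def plausible_glyphs(boxes, min_height, max_height, max_aspect=2.0):
    """Keep boxes whose height is in range and whose width is not too large."""
    kept = []
    for left, top, right, bottom in boxes:
        width = right - left + 1
        height = bottom - top + 1
        if min_height <= height <= max_height and width <= max_aspect * height:
            kept.append((left, top, right, bottom))
    return kept


candidates = [
    (0, 0, 99, 49),    # whole-page background
    (10, 10, 17, 21),  # 8 x 12, letter-sized
    (30, 12, 30, 12),  # single-pixel speck
    (40, 10, 90, 21),  # 51 x 12, probably a rule or several merged letters
]
glyphs = plausible_glyphs(candidates, min_height=8, max_height=20)
```

A strict height filter will also drop dots, commas and accents, so a real pipeline would likely keep small regions that sit close to an accepted glyph.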
