Digit recognition using a Bayes classifier
I need to write an OCR program for digits only. I will use the MNIST dataset. The problem is that I do not know where to start. There are a lot of papers that don't really explain the algorithm. I don't have much knowledge of pattern recognition, so I have a few questions.
Q1: Where can I find the algorithm (or a tutorial)?
Q2: How do I classify digits? I don't need anything very advanced. The first thing that comes to mind is finding the ratio of upper half to lower half and left side to right side. Are there more useful and easy classification methods?
Q3: What are back propagation and the layers that are shown in most of the papers? Do I need them for my simple OCR?
Note: I know my OCR program won't be accurate. It isn't very important for now.
If the closest engineering library to you has a section on image processing, computer vision, or machine vision, then with luck that library will have a copy of a book I recommend for OCR:
Character Recognition Systems by Cheriet, Kharma, Liu, and Suen
This book provides a fairly comprehensive overview of OCR techniques and recent research. It does not go into great depth on any particular subject, but it does provide references to academic papers.
Make sure you have access to a good introductory textbook on image processing. The book by Gonzalez and Woods is a standard in many universities:
Digital Image Processing by Gonzalez and Woods
Even "simple" OCR gets tricky very quickly. It could be overwhelming if you jump into a class about neural networks, Bayes theorem, etc., before you have a firm grasp of basic image processing principles.
If you can, try writing one or more OCR algorithms for machine-printed characters before you attempt to write an algorithm for handwritten characters.
Q1: Where can I find the algorithm (or a tutorial)?
There are numerous algorithms for OCR. The Cheriet book will give you a good start.
Q2: How do I classify digits? I don't need anything very advanced. The first thing that comes to mind is finding the ratio of upper half to lower half and left side to right side. Are there more useful and easy classification methods?
Try implementing that technique and see how well it works. Even if the implementation doesn't work as well as you'd like, lessons learned while implementing it could help you later.
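As a starting point, here is a minimal sketch of that idea in plain Python. It assumes the digit has already been binarized into a list of rows of 0s and 1s (1 = ink). The function name `half_fractions` is made up for this example, and it reports the share of ink in the upper half and in the left half rather than a raw upper/lower ratio, so that an empty half cannot cause a division by zero.

```python
def half_fractions(image):
    """Return (upper_share, left_share) of the ink in a binary image.

    `image` is a list of equal-length rows; each pixel is 1 (ink) or 0.
    Each share is the fraction of all ink pixels that falls in that half.
    """
    height, width = len(image), len(image[0])
    total = sum(sum(row) for row in image)
    if total == 0:
        return 0.0, 0.0  # blank image: no ink to measure
    upper = sum(sum(row) for row in image[: height // 2])
    left = sum(sum(row[: width // 2]) for row in image)
    return upper / total, left / total


# A crude 4 x 4 "7": most of the ink sits in the top half.
SEVEN = [
    [1, 1, 1, 1],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
    [0, 1, 0, 0],
]
upper_share, left_share = half_fractions(SEVEN)
```

Two numbers per digit will not separate ten classes well, but printing them for a few MNIST samples of each digit is a quick way to see how much (or how little) they tell you.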
You can also subdivide a character into a 2 x 2 grid or 3 x 3 grid and check the relative densities of pixels in each cell. Keep in mind that, unlike machine-printed characters, handwritten characters won't line up nicely in rectilinear grids.
Template matching using normalized correlation can work reasonably well for machine-printed characters in a single, known font. It's relatively simple to implement and worth learning:
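The grid idea can be sketched as follows. This is only an illustration under a few assumptions: the image is a binary list of rows, the helper names `zone_densities` and `nearest_label` are invented for this example, and the "training" is nothing more than comparing against stored feature vectors with squared Euclidean distance.

```python
def zone_densities(image, rows=3, cols=3):
    """Split a binary image into rows x cols cells and return the ink density of each."""
    height, width = len(image), len(image[0])
    features = []
    for r in range(rows):
        top, bottom = r * height // rows, (r + 1) * height // rows
        for c in range(cols):
            left, right = c * width // cols, (c + 1) * width // cols
            cell = [image[y][x] for y in range(top, bottom) for x in range(left, right)]
            # Density = fraction of pixels in the cell that are ink.
            features.append(sum(cell) / len(cell) if cell else 0.0)
    return features


def nearest_label(features, labeled_examples):
    """Return the label of the stored example whose features are closest."""
    def squared_distance(example):
        return sum((a - b) ** 2 for a, b in zip(features, example[0]))
    return min(labeled_examples, key=squared_distance)[1]


SAMPLE = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 1, 1],
]
sample_features = zone_densities(SAMPLE, rows=2, cols=2)
```

For MNIST you would compute the features for a handful of labeled training digits, store them as `(features, label)` pairs, and call `nearest_label` on each test digit.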
http://en.wikipedia.org/wiki/Cross-correlation#Normalized_cross-correlation
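A rough sketch of template matching with normalized correlation is below. It assumes the candidate character has already been cropped and scaled to the same size as the templates, so no sliding-window search is needed. The tiny 3 x 3 templates and the function names are made up for illustration.

```python
import math


def normalized_correlation(patch, template):
    """Zero-mean normalized correlation of two equal-sized images, in [-1, 1]."""
    a = [p for row in patch for p in row]
    b = [t for row in template for t in row]
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    da = [x - mean_a for x in a]
    db = [x - mean_b for x in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(x * x for x in db))
    if denom == 0:
        return 0.0  # a completely flat image carries no pattern to correlate
    return sum(x * y for x, y in zip(da, db)) / denom


def best_match(patch, templates):
    """Return the label of the template that correlates best with the patch."""
    return max(templates, key=lambda label: normalized_correlation(patch, templates[label]))


TEMPLATES = {
    "1": [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
    "0": [[1, 1, 1], [1, 0, 1], [1, 1, 1]],
}
NOISY_ONE = [[0, 1, 0], [0, 1, 0], [0, 1, 1]]  # a "1" with one stray pixel
label = best_match(NOISY_ONE, TEMPLATES)
```

Because the means are subtracted and the result is divided by the magnitudes, the score does not change if the whole patch is made uniformly brighter or higher in contrast, which is the main reason to prefer it over a plain pixel-by-pixel difference.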
For OCR it's common to thin the characters in your sample as an initial step. Thinning is a technique to reduce a character (or any other shape) to a representation that is 1 pixel wide. Once you have a thinned character it can be easier to identify lines and intersections. If you can identify lines (or curves) and intersections, then one technique is to look at the relative position and angle of each line with respect to the others.
Common thinning algorithms include Stentiford and Zhang-Suen. There's a freeware version of WinTopo that demonstrates both of these algorithms:
http://wintopo.com/
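If you would rather see thinning in code, here is a compact sketch of the Zhang-Suen scheme as it is usually described: two alternating passes that peel boundary pixels until nothing changes. It assumes a binary image stored as a list of rows and treats everything outside the image as background. It is written for readability, not speed.

```python
def zhang_suen_thin(image):
    """Thin a binary image (list of rows of 0/1) and return a new image."""
    img = [list(row) for row in image]
    height, width = len(img), len(img[0])
    # Neighbours in clockwise order starting from north: N, NE, E, SE, S, SW, W, NW.
    ring = [(-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1)]

    def neighbours(y, x):
        return [
            img[y + dy][x + dx] if 0 <= y + dy < height and 0 <= x + dx < width else 0
            for dy, dx in ring
        ]

    changed = True
    while changed:
        changed = False
        for step in (0, 1):
            to_clear = []
            for y in range(height):
                for x in range(width):
                    if not img[y][x]:
                        continue
                    p = neighbours(y, x)
                    n, e, s, w = p[0], p[2], p[4], p[6]
                    ink = sum(p)  # number of ink neighbours
                    # Number of 0 -> 1 transitions walking once around the pixel.
                    turns = sum(1 for i in range(8) if p[i] == 0 and p[(i + 1) % 8] == 1)
                    if step == 0:
                        side_ok = n * e * s == 0 and e * s * w == 0
                    else:
                        side_ok = n * e * w == 0 and n * s * w == 0
                    if 2 <= ink <= 6 and turns == 1 and side_ok:
                        to_clear.append((y, x))
            # Pixels are removed together at the end of each pass.
            for y, x in to_clear:
                img[y][x] = 0
            changed = changed or bool(to_clear)
    return img


# A bar 3 pixels thick and 6 pixels long, surrounded by background.
THICK_BAR = [[0] * 8] + [[0, 1, 1, 1, 1, 1, 1, 0] for _ in range(3)] + [[0] * 8]
thinned = zhang_suen_thin(THICK_BAR)
```

On this bar the result is a shorter stroke, one pixel high, along the middle row. Expect the ends of strokes to be eroded a little; that is normal for this family of algorithms.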
You can look into academic papers about "stroke extraction", but those techniques tend to be more difficult to implement.
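Short of full stroke extraction, a much simpler way to get structural features from a thinned character is to locate line ends and intersections. The sketch below assumes you already have a 1-pixel-wide skeleton as a binary list of rows, and it uses the "crossing number" (the count of 0-to-1 transitions around a pixel) to label points. The function names are invented for this example, and real skeletons often need some cleanup before the counts are reliable.

```python
def crossing_number(skeleton, y, x):
    """Count 0 -> 1 transitions while walking once around pixel (y, x)."""
    height, width = len(skeleton), len(skeleton[0])
    # Clockwise from north: N, NE, E, SE, S, SW, W, NW.
    ring = [(-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1)]
    values = [
        skeleton[y + dy][x + dx] if 0 <= y + dy < height and 0 <= x + dx < width else 0
        for dy, dx in ring
    ]
    return sum(1 for i in range(8) if values[i] == 0 and values[(i + 1) % 8] == 1)


def endpoints_and_junctions(skeleton):
    """Return (endpoints, junctions) as lists of (row, column) positions."""
    endpoints, junctions = [], []
    for y, row in enumerate(skeleton):
        for x, pixel in enumerate(row):
            if not pixel:
                continue
            branches = crossing_number(skeleton, y, x)
            if branches == 1:
                endpoints.append((y, x))  # exactly one line leaves this pixel
            elif branches >= 3:
                junctions.append((y, x))  # three or more lines meet here
    return endpoints, junctions


PLUS = [
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
]
ends, joins = endpoints_and_junctions(PLUS)
```

Counts like these already separate some digits: a cleanly thinned "0" typically has no endpoints, a "1" has two, and a "4" usually has at least one junction.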
Q3: What are back propagation and the layers that are shown in most of the papers? Do I need them for my simple OCR?
These terms refer to artificial neural networks. For a simple OCR algorithm you'll hard-code the recognition logic or use simple training methods. Artificial neural networks can be trained to recognize characters that aren't hard-coded in your software.
http://en.wikipedia.org/wiki/Neural_network
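To make the terms concrete, here is a toy sketch, not an OCR system: a network with one hidden layer that learns XOR. The "layers" are the input values, the hidden units, and the output unit. "Back propagation" is the step that pushes the output error backwards through those layers to work out how each weight should change. The network size, learning rate, and seed are arbitrary choices for this example.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_xor(epochs=2000, hidden=3, rate=0.5, seed=0):
    """Train a 2-input, one-hidden-layer network on XOR; return the loss per epoch."""
    rng = random.Random(seed)
    data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
    # w1[j] = [weight for x0, weight for x1, bias] of hidden unit j.
    w1 = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(hidden)]
    # w2 = one weight per hidden unit, followed by the output bias.
    w2 = [rng.uniform(-1, 1) for _ in range(hidden + 1)]
    losses = []
    for _ in range(epochs):
        g1 = [[0.0] * 3 for _ in range(hidden)]
        g2 = [0.0] * (hidden + 1)
        loss = 0.0
        for (x0, x1), target in data:
            # Forward pass: input layer -> hidden layer -> output layer.
            h = [sigmoid(w[0] * x0 + w[1] * x1 + w[2]) for w in w1]
            out = sigmoid(sum(w2[j] * h[j] for j in range(hidden)) + w2[hidden])
            loss += 0.5 * (out - target) ** 2
            # Backward pass: propagate the output error back to every weight.
            d_out = (out - target) * out * (1 - out)
            for j in range(hidden):
                d_hidden = d_out * w2[j] * h[j] * (1 - h[j])
                g2[j] += d_out * h[j]
                g1[j][0] += d_hidden * x0
                g1[j][1] += d_hidden * x1
                g1[j][2] += d_hidden
            g2[hidden] += d_out
        losses.append(loss)
        # Gradient descent step using the gradients summed over all four samples.
        for j in range(hidden):
            w2[j] -= rate * g2[j]
            for k in range(3):
                w1[j][k] -= rate * g1[j][k]
        w2[hidden] -= rate * g2[hidden]
    return losses


losses = train_xor()
```

For digits the structure is the same, only larger: the inputs would be pixel values or features, and there would be ten output units, one per digit.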
Although you don't need to learn about artificial neural networks to write a simple OCR algorithm, a simple algorithm will have only limited success with handwritten characters.
Above all, keep in mind that OCR for handwritten characters is an extremely difficult problem. If you could achieve a handwritten character read rate of 20% with a simple technique, then consider that a success.