我自己的 Python OCR 程序
我还是一个初学者,但我想写一个字符识别程序。这个程序还没有准备好。而且我编辑了很多,所以评论可能不完全一致。我将使用 8 个连通性来标记连通分量。
from PIL import Image
import numpy as np
im = Image.open("D:\\Python26\\PYTHON-PROGRAMME\\bild_schrift.jpg")
w,h = im.size
w = int(w)
h = int(h)
#2D-Array for area
area = []
for x in range(w):
area.append([])
for y in range(h):
area[x].append(2) #number 0 is white, number 1 is black
#2D-Array for letter
letter = []
for x in range(50):
letter.append([])
for y in range(50):
letter[x].append(0)
#2D-Array for label
label = []
for x in range(50):
label.append([])
for y in range(50):
label[x].append(0)
#image to number conversion
pix = im.load()
threshold = 200
for x in range(w):
for y in range(h):
aaa = pix[x, y]
bbb = aaa[0] + aaa[1] + aaa[2] #total value
if bbb<=threshold:
area[x][y] = 1
if bbb>threshold:
area[x][y] = 0
np.set_printoptions(threshold='nan', linewidth=10)
#matrix transponation
ccc = np.array(area)
area = ccc.T #better solution?
#find all black pixel and set temporary label numbers
i=1
for x in range(40): # width (later)
for y in range(40): # heigth (later)
if area[x][y]==1:
letter[x][y]=1
label[x][y]=i
i += 1
#connected components labeling
for x in range(40): # width (later)
for y in range(40): # heigth (later)
if area[x][y]==1:
label[x][y]=i
#if pixel has neighbour:
if area[x][y+1]==1:
#pixel and neighbour get the lowest label
pass # tomorrows work
if area[x+1][y]==1:
#pixel and neighbour get the lowest label
pass # tomorrows work
#should i also compare pixel and left neighbour?
#find width of the letter
#find height of the letter
#find the middle of the letter
#middle = [width/2][height/2] #?
#divide letter into 30 parts --> 5 x 6 array
#model letter
#letter A-Z, a-z, 0-9 (maybe more)
#compare each of the 30 parts of the letter with all model letters
#make a weighting
#print(letter)
im.save("D:\\Python26\\PYTHON-PROGRAMME\\bild2.jpg")
print('done')
I am still a beginner but I want to write a character-recognition-program. This program isn't ready yet. And I edited a lot, therefor the comments may not match exactly. I will use the 8-connectivity for the connected component labeling.
from PIL import Image
import numpy as np
im = Image.open("D:\\Python26\\PYTHON-PROGRAMME\\bild_schrift.jpg")
w,h = im.size
w = int(w)
h = int(h)
#2D-Array for area
area = []
for x in range(w):
area.append([])
for y in range(h):
area[x].append(2) #number 0 is white, number 1 is black
#2D-Array for letter
letter = []
for x in range(50):
letter.append([])
for y in range(50):
letter[x].append(0)
#2D-Array for label
label = []
for x in range(50):
label.append([])
for y in range(50):
label[x].append(0)
#image to number conversion
pix = im.load()
threshold = 200
for x in range(w):
for y in range(h):
aaa = pix[x, y]
bbb = aaa[0] + aaa[1] + aaa[2] #total value
if bbb<=threshold:
area[x][y] = 1
if bbb>threshold:
area[x][y] = 0
np.set_printoptions(threshold='nan', linewidth=10)
#matrix transponation
ccc = np.array(area)
area = ccc.T #better solution?
#find all black pixel and set temporary label numbers
i=1
for x in range(40): # width (later)
for y in range(40): # heigth (later)
if area[x][y]==1:
letter[x][y]=1
label[x][y]=i
i += 1
#connected components labeling
for x in range(40): # width (later)
for y in range(40): # heigth (later)
if area[x][y]==1:
label[x][y]=i
#if pixel has neighbour:
if area[x][y+1]==1:
#pixel and neighbour get the lowest label
pass # tomorrows work
if area[x+1][y]==1:
#pixel and neighbour get the lowest label
pass # tomorrows work
#should i also compare pixel and left neighbour?
#find width of the letter
#find height of the letter
#find the middle of the letter
#middle = [width/2][height/2] #?
#divide letter into 30 parts --> 5 x 6 array
#model letter
#letter A-Z, a-z, 0-9 (maybe more)
#compare each of the 30 parts of the letter with all model letters
#make a weighting
#print(letter)
im.save("D:\\Python26\\PYTHON-PROGRAMME\\bild2.jpg")
print('done')
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(4)
OCR 确实不是一件容易的事。这就是文本验证码仍然有效的原因:)
仅讨论字母提取而不是模式识别,用于分隔字母的技术称为 连接组件标签。由于您需要一种更有效的方法来执行此操作,因此请尝试实现本文中描述的两遍算法。另一种描述可以在文章Blob 提取中找到。
编辑:这是我建议的算法的实现:
它不是 100% 完美,但由于您这样做只是为了学习目的,所以它可能是一个很好的起点。通过每个角色的边界框,您现在可以使用神经网络,正如其他人在这里建议的那样。
OCR is not an easy task indeed. That's why text CAPTCHAs still work :)
To talk only about the letter extraction and not the pattern recognition, the technique you are using to separate the letters is called Connected Component Labeling. Since you are asking for a more efficient way to do this, try to implement the two-pass algorithm that's described in this article. Another description can be found in the article Blob extraction.
EDIT: Here's the implementation for the algorithm that I have suggested:
It's not 100% perfect, but since you are doing this only for learning purposes, it may be a good starting point. With the bounding box of each character you can now use a neural network as others have suggested here.
OCR 非常非常难。即使是计算机生成的字符,如果您事先不知道字体和字体大小,这也是相当具有挑战性的。即使您完全匹配字符,我也不会称其为“开始”编程项目;这是相当微妙的。
如果您想识别扫描或手写字符,那就更难了 - 您需要使用高级数学、算法和机器学习。关于这个主题有很多书和数千篇文章,所以你不需要重新发明轮子。
我钦佩你的努力,但我认为你还没有走得足够远,无法解决任何实际的困难。到目前为止,您只是随机探索像素并将它们从一个数组复制到另一个数组。您实际上还没有进行任何比较,我不确定您的“随机游走”的目的。
当你进行比较时,你必须面对这样一个事实:图像与“原型”并不完全相同,并且不清楚你将如何处理这个问题。
不过,根据您到目前为止编写的代码,我有一个想法给您:尝试编写一个程序来穿过图像中的“迷宫”。输入将是图像,加上起始像素和目标像素。输出是从起点到目标穿过迷宫的路径。这是一个比 OCR 更容易的问题 - 解决迷宫是计算机擅长的事情 - 但它仍然很有趣且具有挑战性。
OCR is very, very hard. Even with computer-generated characters, it's quite challenging if you don't know the font and font size in advance. Even if you're matching characters exactly, I would not call it a "beginning" programming project; it's quite subtle.
If you want to recognize scanned, or handwritten characters, that's even harder - you'll need to use advanced math, algorithms, and machine learning. There are quite a few books and thousands of articles written about this topic, so you don't need to reinvent the wheel.
I admire your effort, but I don't think you've gotten far enough to hit any of the actual difficulties yet. So far you're just randomly exploring pixels and copying them from one array to another. You haven't actually done any comparison yet, and I'm not sure the purpose of your "random walk".
When you get the comparison, you'll have to deal with the fact that the image is not exactly the same as the "prototype", and it's not clear how you'll deal with that.
Based on the code you've written so far, though, I have an idea for you: try writing a program that finds its way through a "maze" in an image. The input would be the image, plus the start pixel and the goal pixel. The output is a path through the maze from the start to the goal. This is a much easier problem than OCR - solving mazes is something that computers are great for - but it's still fun and challenging.
如今大多数 OCR 算法都基于神经网络算法。 Hopfield 网络是一个很好的起点。基于可用的 Hopfield 模型 C 语言,我构建了一个非常python中的基本图像识别算法与您所描述的类似。我已在此处发布了完整的源代码。这是一个玩具项目,不适合真正的 OCR,但可以让您朝着正确的方向开始。
可以找到一个带有示例的 Java 小程序此处;该网络使用数字 0-9 的示例输入进行训练。在右边的框中画出来,点击测试,从网络上看结果。
不要让数学符号吓倒您,一旦您获得源代码,算法就会很简单。
Most OCR algorithms these days are based on neural network algorithms. Hopfield networks are a good place to start. Based on the Hopfield Model available here in C, I built a very basic image recognition algorithm in python similar to what you describe. I've posted the full source here. It's a toy project and not suitable for real OCR, but can get you started in the right direction.
A Java applet to toy with an example can be found here; the network is trained with example inputs for the digits 0-9. Draw in the box on the right, click test and see the results from the network.
Don't let the mathematical notation intimidate you, the algorithms are straightforward once you get to source code.
OCR 非常非常困难!使用什么方法来尝试 OCR 将取决于您想要完成的任务(手写识别、计算机生成的文本阅读等)。
但是,为了让您开始,请阅读神经网络和 OCR。以下是一些关于该主题的直接跳转文章:
http://www. codeproject.com/KB/cs/neural_network_ocr.aspx
http://www .codeproject.com/KB/dotnet/simple_ocr.aspx
使用您最喜欢的搜索引擎查找信息。
玩得开心!
OCR is very, very difficult! What approach to use to attempt OCR will be based on what you are trying to accomplish (hand writing recongnition, computer generated text reading, etc.)
However, to get you started, read up on Neural Networks and OCR. Here are a few jump-right-in articles on the subject:
http://www.codeproject.com/KB/cs/neural_network_ocr.aspx
http://www.codeproject.com/KB/dotnet/simple_ocr.aspx
Use your favorite search engine to find information.
Have fun!