我自己的 Python OCR 程序

发布于 2024-08-17 02:09:50 字数 2253 浏览 7 评论 0原文

我还是一个初学者,但我想写一个字符识别程序。这个程序还没有准备好。而且我编辑了很多,所以评论可能不完全一致。我将使用 8 个连通性来标记连通分量。

from PIL import Image
import numpy as np

im = Image.open("D:\\Python26\\PYTHON-PROGRAMME\\bild_schrift.jpg")

w,h = im.size
w = int(w)
h = int(h)

#2D-Array for area
area = []
for x in range(w):
    area.append([])
    for y in range(h):
        area[x].append(2) #number 0 is white, number 1 is black

#2D-Array for letter
letter = []
for x in range(50):
    letter.append([])
    for y in range(50):
        letter[x].append(0)

#2D-Array for label
label = []
for x in range(50):
    label.append([])
    for y in range(50):
        label[x].append(0)

#image to number conversion
pix = im.load()
threshold = 200
for x in range(w):
    for y in range(h):
        aaa = pix[x, y]
        bbb = aaa[0] + aaa[1] + aaa[2] #total value
        if bbb<=threshold:
            area[x][y] = 1
        if bbb>threshold:
            area[x][y] = 0
np.set_printoptions(threshold='nan', linewidth=10)

#matrix transponation
ccc = np.array(area) 
area = ccc.T #better solution?

#find all black pixel and set temporary label numbers
i=1
for x in range(40): # width (later)
    for y in range(40): # heigth (later)
        if area[x][y]==1:
            letter[x][y]=1
            label[x][y]=i
            i += 1

#connected components labeling
for x in range(40): # width (later)
    for y in range(40): # heigth (later)
        if area[x][y]==1:
            label[x][y]=i
            #if pixel has neighbour:
            if area[x][y+1]==1:
                #pixel and neighbour get the lowest label             
                pass # tomorrows work
            if area[x+1][y]==1:
                #pixel and neighbour get the lowest label             
                pass # tomorrows work            
            #should i also compare pixel and left neighbour?

#find width of the letter
#find height of the letter
#find the middle of the letter
#middle = [width/2][height/2] #?
#divide letter into 30 parts --> 5 x 6 array

#model letter
#letter A-Z, a-z, 0-9 (maybe more)

#compare each of the 30 parts of the letter with all model letters
#make a weighting

#print(letter)

im.save("D:\\Python26\\PYTHON-PROGRAMME\\bild2.jpg")
print('done')

I am still a beginner but I want to write a character-recognition-program. This program isn't ready yet. And I edited a lot, therefor the comments may not match exactly. I will use the 8-connectivity for the connected component labeling.

from PIL import Image
import numpy as np

im = Image.open("D:\\Python26\\PYTHON-PROGRAMME\\bild_schrift.jpg")

w,h = im.size
w = int(w)
h = int(h)

#2D-Array for area
area = []
for x in range(w):
    area.append([])
    for y in range(h):
        area[x].append(2) #number 0 is white, number 1 is black

#2D-Array for letter
letter = []
for x in range(50):
    letter.append([])
    for y in range(50):
        letter[x].append(0)

#2D-Array for label
label = []
for x in range(50):
    label.append([])
    for y in range(50):
        label[x].append(0)

#image to number conversion
pix = im.load()
threshold = 200
for x in range(w):
    for y in range(h):
        aaa = pix[x, y]
        bbb = aaa[0] + aaa[1] + aaa[2] #total value
        if bbb<=threshold:
            area[x][y] = 1
        if bbb>threshold:
            area[x][y] = 0
np.set_printoptions(threshold='nan', linewidth=10)

#matrix transponation
ccc = np.array(area) 
area = ccc.T #better solution?

#find all black pixel and set temporary label numbers
i=1
for x in range(40): # width (later)
    for y in range(40): # heigth (later)
        if area[x][y]==1:
            letter[x][y]=1
            label[x][y]=i
            i += 1

#connected components labeling
for x in range(40): # width (later)
    for y in range(40): # heigth (later)
        if area[x][y]==1:
            label[x][y]=i
            #if pixel has neighbour:
            if area[x][y+1]==1:
                #pixel and neighbour get the lowest label             
                pass # tomorrows work
            if area[x+1][y]==1:
                #pixel and neighbour get the lowest label             
                pass # tomorrows work            
            #should i also compare pixel and left neighbour?

#find width of the letter
#find height of the letter
#find the middle of the letter
#middle = [width/2][height/2] #?
#divide letter into 30 parts --> 5 x 6 array

#model letter
#letter A-Z, a-z, 0-9 (maybe more)

#compare each of the 30 parts of the letter with all model letters
#make a weighting

#print(letter)

im.save("D:\\Python26\\PYTHON-PROGRAMME\\bild2.jpg")
print('done')

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(4

吃颗糖壮壮胆 2024-08-24 02:09:50

OCR 确实不是一件容易的事。这就是文本验证码仍然有效的原因:)

仅讨论字母提取而不是模式识别,用于分隔字母的技术称为 连接组件标签。由于您需要一种更有效的方法来执行此操作,因此请尝试实现本文中描述的两遍算法。另一种描述可以在文章Blob 提取中找到。

编辑:这是我建议的算法的实现:

import sys
from PIL import Image, ImageDraw

class Region():
    def __init__(self, x, y):
        self._pixels = [(x, y)]
        self._min_x = x
        self._max_x = x
        self._min_y = y
        self._max_y = y

    def add(self, x, y):
        self._pixels.append((x, y))
        self._min_x = min(self._min_x, x)
        self._max_x = max(self._max_x, x)
        self._min_y = min(self._min_y, y)
        self._max_y = max(self._max_y, y)

    def box(self):
        return [(self._min_x, self._min_y), (self._max_x, self._max_y)]

def find_regions(im):
    width, height  = im.size
    regions = {}
    pixel_region = [[0 for y in range(height)] for x in range(width)]
    equivalences = {}
    n_regions = 0
    #first pass. find regions.
    for x in xrange(width):
        for y in xrange(height):
            #look for a black pixel
            if im.getpixel((x, y)) == (0, 0, 0, 255): #BLACK
                # get the region number from north or west
                # or create new region
                region_n = pixel_region[x-1][y] if x > 0 else 0
                region_w = pixel_region[x][y-1] if y > 0 else 0

                max_region = max(region_n, region_w)

                if max_region > 0:
                    #a neighbour already has a region
                    #new region is the smallest > 0
                    new_region = min(filter(lambda i: i > 0, (region_n, region_w)))
                    #update equivalences
                    if max_region > new_region:
                        if max_region in equivalences:
                            equivalences[max_region].add(new_region)
                        else:
                            equivalences[max_region] = set((new_region, ))
                else:
                    n_regions += 1
                    new_region = n_regions

                pixel_region[x][y] = new_region

    #Scan image again, assigning all equivalent regions the same region value.
    for x in xrange(width):
        for y in xrange(height):
                r = pixel_region[x][y]
                if r > 0:
                    while r in equivalences:
                        r = min(equivalences[r])

                    if not r in regions:
                        regions[r] = Region(x, y)
                    else:
                        regions[r].add(x, y)

    return list(regions.itervalues())

def main():
    im = Image.open(r"c:\users\personal\py\ocr\test.png")
    regions = find_regions(im)
    draw = ImageDraw.Draw(im)
    for r in regions:
        draw.rectangle(r.box(), outline=(255, 0, 0))
    del draw 
    #im.show()
    output = file("output.png", "wb")
    im.save(output)
    output.close()

if __name__ == "__main__":
    main()

它不是 100% 完美,但由于您这样做只是为了学习目的,所以它可能是一个很好的起点。通过每个角色的边界框,您现在可以使用神经网络,正如其他人在这里建议的那样。

OCR is not an easy task indeed. That's why text CAPTCHAs still work :)

To talk only about the letter extraction and not the pattern recognition, the technique you are using to separate the letters is called Connected Component Labeling. Since you are asking for a more efficient way to do this, try to implement the two-pass algorithm that's described in this article. Another description can be found in the article Blob extraction.

EDIT: Here's the implementation for the algorithm that I have suggested:

import sys
from PIL import Image, ImageDraw

class Region():
    def __init__(self, x, y):
        self._pixels = [(x, y)]
        self._min_x = x
        self._max_x = x
        self._min_y = y
        self._max_y = y

    def add(self, x, y):
        self._pixels.append((x, y))
        self._min_x = min(self._min_x, x)
        self._max_x = max(self._max_x, x)
        self._min_y = min(self._min_y, y)
        self._max_y = max(self._max_y, y)

    def box(self):
        return [(self._min_x, self._min_y), (self._max_x, self._max_y)]

def find_regions(im):
    width, height  = im.size
    regions = {}
    pixel_region = [[0 for y in range(height)] for x in range(width)]
    equivalences = {}
    n_regions = 0
    #first pass. find regions.
    for x in xrange(width):
        for y in xrange(height):
            #look for a black pixel
            if im.getpixel((x, y)) == (0, 0, 0, 255): #BLACK
                # get the region number from north or west
                # or create new region
                region_n = pixel_region[x-1][y] if x > 0 else 0
                region_w = pixel_region[x][y-1] if y > 0 else 0

                max_region = max(region_n, region_w)

                if max_region > 0:
                    #a neighbour already has a region
                    #new region is the smallest > 0
                    new_region = min(filter(lambda i: i > 0, (region_n, region_w)))
                    #update equivalences
                    if max_region > new_region:
                        if max_region in equivalences:
                            equivalences[max_region].add(new_region)
                        else:
                            equivalences[max_region] = set((new_region, ))
                else:
                    n_regions += 1
                    new_region = n_regions

                pixel_region[x][y] = new_region

    #Scan image again, assigning all equivalent regions the same region value.
    for x in xrange(width):
        for y in xrange(height):
                r = pixel_region[x][y]
                if r > 0:
                    while r in equivalences:
                        r = min(equivalences[r])

                    if not r in regions:
                        regions[r] = Region(x, y)
                    else:
                        regions[r].add(x, y)

    return list(regions.itervalues())

def main():
    im = Image.open(r"c:\users\personal\py\ocr\test.png")
    regions = find_regions(im)
    draw = ImageDraw.Draw(im)
    for r in regions:
        draw.rectangle(r.box(), outline=(255, 0, 0))
    del draw 
    #im.show()
    output = file("output.png", "wb")
    im.save(output)
    output.close()

if __name__ == "__main__":
    main()

It's not 100% perfect, but since you are doing this only for learning purposes, it may be a good starting point. With the bounding box of each character you can now use a neural network as others have suggested here.

ˉ厌 2024-08-24 02:09:50

OCR 非常非常难。即使是计算机生成的字符,如果您事先不知道字体和字体大小,这也是相当具有挑战性的。即使您完全匹配字符,我也不会称其为“开始”编程项目;这是相当微妙的。

如果您想识别扫描或手写字符,那就更难了 - 您需要使用高级数学、算法和机器学习。关于这个主题有很多书和数千篇文章,所以你不需要重新发明轮子。

我钦佩你的努力,但我认为你还没有走得足够远,无法解决任何实际的困难。到目前为止,您只是随机探索像素并将它们从一个数组复制到另一个数组。您实际上还没有进行任何比较,我不确定您的“随机游走”的目的。

  • 为什么是随机的?编写正确的随机算法相当困难。我建议首先从确定性算法开始。
  • 为什么要从一个数组复制到另一个数组?为什么不直接比较呢?

当你进行比较时,你必须面对这样一个事实:图像与“原型”并不完全相同,并且不清楚你将如何处理这个问题。

不过,根据您到目前为止编写的代码,我有一个想法给您:尝试编写一个程序来穿过图像中的“迷宫”。输入将是图像,加上起始像素和目标像素。输出是从起点到目标穿过迷宫的路径。这是一个比 OCR 更容易的问题 - 解决迷宫是计算机擅长的事情 - 但它仍然很有趣且具有挑战性。

OCR is very, very hard. Even with computer-generated characters, it's quite challenging if you don't know the font and font size in advance. Even if you're matching characters exactly, I would not call it a "beginning" programming project; it's quite subtle.

If you want to recognize scanned, or handwritten characters, that's even harder - you'll need to use advanced math, algorithms, and machine learning. There are quite a few books and thousands of articles written about this topic, so you don't need to reinvent the wheel.

I admire your effort, but I don't think you've gotten far enough to hit any of the actual difficulties yet. So far you're just randomly exploring pixels and copying them from one array to another. You haven't actually done any comparison yet, and I'm not sure the purpose of your "random walk".

  • Why random? Writing correct randomized algorithms is quite difficult. I would recommend starting with a deterministic algorithm first.
  • Why are you copying from one array to another? Why not just compare directly?

When you get the comparison, you'll have to deal with the fact that the image is not exactly the same as the "prototype", and it's not clear how you'll deal with that.

Based on the code you've written so far, though, I have an idea for you: try writing a program that finds its way through a "maze" in an image. The input would be the image, plus the start pixel and the goal pixel. The output is a path through the maze from the start to the goal. This is a much easier problem than OCR - solving mazes is something that computers are great for - but it's still fun and challenging.

诗化ㄋ丶相逢 2024-08-24 02:09:50

如今大多数 OCR 算法都基于神经网络算法。 Hopfield 网络是一个很好的起点。基于可用的 Hopfield 模型 C 语言,我构建了一个非常python中的基本图像识别算法与您所描述的类似。我已在此处发布了完整的源代码。这是一个玩具项目,不适合真正的 OCR,但可以让您朝着正确的方向开始。

Hopfield 模型用作自动关联存储器来存储和调用一组位图图像。通过计算相应的权重矩阵来存储图像。此后,从任意配置开始,存储器将精确地确定所存储的图像,该图像在汉明距离方面最接近起始配置。 因此,给定存储图像的不完整或损坏版本,网络能够调用相应的原始图像。

可以找到一个带有示例的 Java 小程序此处;该网络使用数字 0-9 的示例输入进行训练。在右边的框中画出来,点击测试,从网络上看结果。

不要让数学符号吓倒您,一旦您获得源代码,算法就会很简单。

Most OCR algorithms these days are based on neural network algorithms. Hopfield networks are a good place to start. Based on the Hopfield Model available here in C, I built a very basic image recognition algorithm in python similar to what you describe. I've posted the full source here. It's a toy project and not suitable for real OCR, but can get you started in the right direction.

The Hopfield model is used as an autoassociative memory to store and recall a set of bitmap images. Images are stored by calculating a corresponding weight matrix. Thereafter, starting from an arbitrary configuration, the memory will settle on exactly that stored image, which is nearest to the starting configuration in terms of Hamming distance. Thus given an incomplete or corrupted version of a stored image, the network is able to recall the corresponding original image.

A Java applet to toy with an example can be found here; the network is trained with example inputs for the digits 0-9. Draw in the box on the right, click test and see the results from the network.

Don't let the mathematical notation intimidate you, the algorithms are straightforward once you get to source code.

请爱~陌生人 2024-08-24 02:09:50

OCR 非常非常困难!使用什么方法来尝试 OCR 将取决于您想要完成的任务(手写识别、计算机生成的文本阅读等)。

但是,为了让您开始,请阅读神经网络和 OCR。以下是一些关于该主题的直接跳转文章:

http://www. codeproject.com/KB/cs/neural_network_ocr.aspx

http://www .codeproject.com/KB/dotnet/simple_ocr.aspx

使用您最喜欢的搜索引擎查找信息。

玩得开心!

OCR is very, very difficult! What approach to use to attempt OCR will be based on what you are trying to accomplish (hand writing recongnition, computer generated text reading, etc.)

However, to get you started, read up on Neural Networks and OCR. Here are a few jump-right-in articles on the subject:

http://www.codeproject.com/KB/cs/neural_network_ocr.aspx

http://www.codeproject.com/KB/dotnet/simple_ocr.aspx

Use your favorite search engine to find information.

Have fun!

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文