了解 OCR 的 Freeman 链码
请注意,我确实在寻找问题的答案。我不是寻找某些源代码或某些学术论文的链接:我已经使用了源代码并且我已经阅读了论文,但仍然没有弄清楚最后一部分问题...
我正在研究一些快速屏幕字体 OCRing,并且取得了非常好的进展。
我已经找到了基线,分离了角色,将每个角色转换为黑色和白色。白色,然后绘制每个字符的轮廓,以便对其应用弗里曼链码。
基本上它是一个 8 连接的链码,如下所示:
3 2 1
\ | /
4-- --0
/ | \
5 6 7
因此,如果我有一个“a”,经过所有转换(包括转换为黑色和白色),我最终会得到这样的结果:
11110
00001
01111
10001
10001
01110
那么它的外部计数可能如下所示(我可能在这里犯了一个错误,这是ASCII艺术轮廓,我的“算法”可能会得到错误的轮廓,但这不是我问题的重点):
XXXX
X1111X
XXXX1X
X01111X
X10001X
X10001X
X111X
XXX
按照X,我得到链码,即:
0011222334445656677
请注意,这是标准化的链码,但您始终可以像这样标准化链码:您只需保留最小的整数。
(顺便说一句,有一个超级高效的实现来查找链码,您只需获取“X”的 8 个相邻像素,然后在 256 查找表中查找(如果有 0、1、2、3) ,4,5,6 或 7)
然而,我现在的问题是:从 00112223344445656677 链码中,我如何发现我有一个‘一’?
因为,例如,如果我的“a”看起来像这样:
11110
00001
01111
10001
10001
01111 <-- This pixel is now full
那么我的链码现在是: 0002222334445656677
但这也是一个“a”。
我知道这些链码的全部意义在于能够适应如此微小的变化,但我不知道如何找到对应于一个链码的字符。
我已经走了这么远,现在我陷入了困境......
(顺便说一句,我不需要 100% 的效率,并且区分 '0' 与 'O' 或 'o' 之类的事情并不需要确实是一个问题)
Note that I'm really looking for an answer to my question. I am not looking for a link to some source code or to some academic paper: I've already used the source and I've already read papers and still haven't figured out the last part of this issue...
I'm working on some fast screen font OCRing and I'm making very good progress.
I'm already finding the baselines, separating the characters, transforming each character in black & white and then contouring each character in order to apply a Freeman chain code to it.
Basically it's an 8-connected chain code looking like this:
3 2 1
\ | /
4-- --0
/ | \
5 6 7
So if I have an 'a', after all my transformations (including transforming to black and white), I end up with something like this:
11110
00001
01111
10001
10001
01110
Then it's external countour may look like this (I may be making a mistake here, that's ASCII-art contouring and my 'algorithm' may get the contour wrong but that's not the point of my question):
XXXX
X1111X
XXXX1X
X01111X
X10001X
X10001X
X111X
XXX
Following the Xs, I get the chain code, which would be:
0011222334445656677
Note that that's the normalized chain code but you can always normalized a chain code like this: you just keep the smallest integer.
(By the way, there's a super-efficient implementation to find the chain code where you simply take the 8 adjacent pixels of an 'X' and then look in a 256 lookup table if you have 0,1,2,3,4,5,6 or 7)
My question now, however, is: from that 0011222334445656677 chain code, how do I find that I have an 'a'?
Because, for example, if my 'a' looks like this:
11110
00001
01111
10001
10001
01111 <-- This pixel is now full
Then my chain code is now: 0002222334445656677
And yet this is also an 'a'.
I know that the whole point of these chain code is to be resilient to such tiny changes but I can't figure out how I'm supposed to find which character corresponds to one chain code.
I've been that far and now I'm stuck...
(By the way, I don't need 100% efficiency and things like differentiating '0' from 'O' or from 'o' isn't really an issue)
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(4)
您需要的是一个测量链码之间距离的函数
d
。然后找到给定链码的字母就很简单了:输入:
S
(通常是AZ、az、0-9等的cain码)x
(该链码不会与集合S
中的任何链码匹配)该算法将迭代可能的链码集合并计算距离每个元素的
d(x,si)
。距离最小的字母将是算法的输出(识别出的字母)。我建议遵循距离函数:
对于两个链码,将每个方向的长度差相加: d(x,si) = |x0-si0| + |x1-si1| + .. + |x7-si7|。
x0
为链码x
中0的数量,si0
为链码si
中0的数量> 等。举一个例子可以更好地解释我的想法。下图中有字母8、B和D,第四个字母是一个稍微变形的8,需要识别。这些字母使用 Arial 书写,字体大小为 8。图像中的第二行放大了 10 倍,以便更好地看到像素。
我手动计算了(希望正确)标准化链码:
长度差(绝对值)为:
8'
与链码8
的距离最小,因此算法将识别字母8
。到字母B
的距离并没有大多少,但这是因为变形后的8看起来几乎像B
。该方法不是缩放不变的。我认为有两种选择可以克服这个问题:
我不太确定距离函数对于所有字母数字字母的集合是否足够好,但我希望如此。为了最大限度地减少识别字母时的错误,您可以在分类步骤中包含其他特征(不仅仅是链码)。同样,您需要距离测量——这次是针对特征向量。
What you need is a function
d
that measures the distance between chain codes. After then finding the letter to a given chain code is straightforward:Input:
S
for the set of possible letters (generally the cain codes for A-Z, a-z, 0-9, ...)x
of a letter which needs to be detected and which could be slightly deformed (the chain code wouldn't match any chain code in the setS
)The algorithm would iterate through the set of possible chain codes and calculate the distance
d(x,si)
for each element. The letter with the smallest distance would be the output of the algorithm (the identified letter).I would suggest following distance function:
For two chain codes, add up the length differences of each direction:
d(x,si) = |x0-si0| + |x1-si1| + .. + |x7-si7|
.x0
is the number of 0s in the chain codex
,si0
is the number of 0s in the chain codesi
, etc.An example will better explain what I'm thinking about. In the following image there are the letters 8, B and D, the fourth letter is a slightly deformed 8, which needs to be identified. The letters are written with Arial with font-size 8. The second line in the image is 10 times enlarged to better see the pixels.
I manually calculated (hopefully correct) the normalized chain codes which are:
The length differences (absolut) are:
8'
has the smallest distance to the chain code of8
, thus the algorithm would identify the letter8
. The distance to the letterB
is not much bigger, but this is because the deformed 8 looks almost like theB
.This method is not scaling invariant. I think there are two options to overcome this:
I'm not quite sure if the distance function is good enough for the set of all alphanumeric letters but I hope so. To minimize the error in identifying a letter you could include other features (not only chain codes) into the classification step. And again, you would need a distance measure -- this time for feature vectors.
由于您的问题不够具体(无论您想要基于链码的完整算法还是只是一些概率分类),我会告诉您我对这个问题的了解。
使用链码,您可以计算符号的一些属性,例如 344445、244445、2555556、344446(任意数量的 4)形式的旋转数,即信。假设链码中有 3 个部分,如下所示。所以,这几乎肯定是“W”!但这是一个很好的案例。您可以计算不同类型旋转的数量,并将其与之前保存的每个字母的值进行比较(您可以手动完成)。
这是一个非常好的分类器,但当然还不够。它无法区分“D”和“O”、“V”和“U”。这在很大程度上取决于你的想象力。
您应该首先创建一些带有参考的字母图像的测试用例,并在更改和发明新标准之间检查您的算法。
希望这至少能部分回答您的问题。
更新:
我突然想到一个好主意:)
您可以计算链中单调序列的数量,例如,对于链 000111222233334443333222444455544443333 (一个简单的愚蠢示例,并不真正对应于任何字母),我们有
000111222233334443333222444455544443333,
00011122223333444 3333222 444455544443333,
000111222233334443333222 4444555 44443333,
0001112222333344433332224444555 44443333,
即四个单调子序列。
这应该是一个很好的概括,只需计算真实字母的变化数量,然后与从检测到的链中获取的变化进行比较,这是一个很好的尝试。
一些问题和想法:
As your question is not specific enough (whether you want the full algorithm based on the chain code or just some probabilistic classifying), I'll tell you what I know about the problem.
Using the chain code, you can count some properties of the symbol, e.g. the number of rotations of the form 344445, 244445, 2555556, 344446 (arbitrary number of 4s), i.e. the "spikes" on the letter. Say there are 3 sections in the chain code that looks like this. So, this is almost certainly "W"! But this is a good case. You can count numbers of different kinds of rotations and compare that to previously saved values for every letter (which you do by hand).
This is quite a good classifier, but alone is not sufficient, of course. It will be impossible for it to differentiate "D" and "O", "V" and "U". And much depends on your imagination.
You should start by creating a test case of images of some letters with a reference and check your algorithm between the changes and inventing new criteria.
Hope this answers your question at least partially.
Update:
One bright idea just came into my mind :)
You can count the number of monotonic sequences in the chain, for example, for chain 000111222233334443333222444455544443333 (a quick dumb example, doesn't really correspond to any letter) we have
00011122223333444 3333222444455544443333,
00011122223333444 3333222 444455544443333,
000111222233334443333222 4444555 44443333,
0001112222333344433332224444555 44443333,
i.e. four monotonic subsequences.
This should be a good generalization, just count the number of this changes for real letters and compare to that acquired from the detected chain, this is a good try.
Some problems and ideas:
您可以将链码转换为一种更简单的模型来传达拓扑,然后运行机器学习代码(可能会用 Prolog 编写)。
但我不会赞同它。人们已经这样做/尝试了很多年,但我们仍然没有好的结果。
与其在这种非线性/基于阈值的方法上浪费时间,为什么不使用基于相关性的鲁棒技术呢?最简单的事情是与模板进行卷积。
但我会在字母上开发Gabor 小波,并将系数排序到向量空间中。使用一些示例训练支持向量机,然后将其用作分类器。
这几乎就是我们的大脑的工作方式,我确信它在计算机中是可能的。
一些随机的聊天(忽略):
我不会使用神经网络,因为我不理解它们,因此不喜欢它们。
不过,Geoff Hintons 小组 http://www.youtube 的工作始终给我留下了深刻的印象。 com/watch?v=VdIURAu1-aU。
不知何故,他致力于研究可以向后传播信息的网络(深度学习)。
有一次关于他的谈话,他让受过训练的数字识别网络梦想。这意味着他将输出神经元之一设置为“2”,网络将生成它认为输入神经元上有两个的事物的图片。
我发现这很酷。
You could convert the chain code into an even simpler model that conveys the topology and then run machine learning code (which one would probably write in Prolog).
But I wouldn't endorse it. People have done/tried this for years and we still have no good results.
Instead of wasting your time with this non-linear/threshold based approach, why don't you just use a robust technique based on correlation? The easiest thing would be to convolve with templates.
But I would develop Gabor wavelets on the letters and sort the coefficients into a vector space. Train a support vector machine with some examples and then use it as a classifier.
This is pretty much how our brain does it and I'm sure its possible in the computer.
Some random chit chat (ignore):
I wouldn't use neuronal networks because I don't understand them and therefore don't like them.
However, I'm always impressed by work of Geoff Hintons group http://www.youtube.com/watch?v=VdIURAu1-aU.
Somehow he works on networks that can propagate information backward (deep learning).
There is a talk of him where he lets a trained digit recognition network dream. That means he sets one of the output neurons to "2" and the network will generate pictures of things that it thinks are two on the input neurons.
I found this very cool.
上个月,我正在处理同样的问题。现在,我通过vetex链码解决了这个问题。
顶点链码是二进制链码。然后,我把它切成5部分。显然,数字0-9在不同的部分有其自己的特点。
Last month, I was dealing with the same problem. Now, I have solved this problem by vetex chain code.
The vetex chain code is the binary chain code. Then, I cut it to 5 parts. Obviously, The number 0-9 has its own charcter in different part.