了解 OCR 的 Freeman 链码

发布于 2024-11-24 13:59:01 字数 1198 浏览 0 评论 0原文

请注意，我确实在寻找问题的答案。我不是寻找某些源代码或某些学术论文的链接：我已经使用了源代码并且我已经阅读了论文，但仍然没有弄清楚最后一部分问题...

我正在研究一些快速屏幕字体 OCRing，并且取得了非常好的进展。

我已经找到了基线，分离了角色，将每个角色转换为黑色和白色。白色，然后绘制每个字符的轮廓，以便对其应用弗里曼链码。

基本上它是一个 8 连接的链码，如下所示：

  3  2  1
   \ | /
  4-- --0
   / | \
  5  6  7

因此，如果我有一个“a”，经过所有转换（包括转换为黑色和白色），我最终会得到这样的结果：

那么它的外部计数可能如下所示（我可能在这里犯了一个错误，这是ASCII艺术轮廓，我的“算法”可能会得到错误的轮廓，但这不是我问题的重点）：

 XXXX
X1111X
 XXXX1X
X01111X
X10001X
X10001X
 X111X
  XXX

按照X，我得到链码，即：

0011222334445656677

请注意，这是标准化的链码，但您始终可以像这样标准化链码：您只需保留最小的整数。

（顺便说一句，有一个超级高效的实现来查找链码，您只需获取“X”的 8 个相邻像素，然后在 256 查找表中查找（如果有 0、1、2、3） ,4,5,6 或 7)

然而，我现在的问题是：从 00112223344445656677 链码中，我如何发现我有一个‘一’？

因为，例如，如果我的“a”看起来像这样：

11110
00001
01111
10001
10001
01111  <-- This pixel is now full

那么我的链码现在是： 0002222334445656677

但这也是一个“a”。

我知道这些链码的全部意义在于能够适应如此微小的变化，但我不知道如何找到对应于一个链码的字符。

我已经走了这么远，现在我陷入了困境......

（顺便说一句，我不需要 100% 的效率，并且区分 '0' 与 'O' 或 'o' 之类的事情并不需要确实是一个问题）

原文

Note that I'm really looking for an answer to my question. I am not looking for a link to some source code or to some academic paper: I've already used the source and I've already read papers and still haven't figured out the last part of this issue...

I'm working on some fast screen font OCRing and I'm making very good progress.

I'm already finding the baselines, separating the characters, transforming each character in black & white and then contouring each character in order to apply a Freeman chain code to it.

Basically it's an 8-connected chain code looking like this:

  3  2  1
   \ | /
  4-- --0
   / | \
  5  6  7

So if I have an 'a', after all my transformations (including transforming to black and white), I end up with something like this:

Then it's external countour may look like this (I may be making a mistake here, that's ASCII-art contouring and my 'algorithm' may get the contour wrong but that's not the point of my question):

 XXXX
X1111X
 XXXX1X
X01111X
X10001X
X10001X
 X111X
  XXX

Following the Xs, I get the chain code, which would be:

0011222334445656677

Note that that's the normalized chain code but you can always normalized a chain code like this: you just keep the smallest integer.

(By the way, there's a super-efficient implementation to find the chain code where you simply take the 8 adjacent pixels of an 'X' and then look in a 256 lookup table if you have 0,1,2,3,4,5,6 or 7)

My question now, however, is: from that 0011222334445656677 chain code, how do I find that I have an 'a'?

Because, for example, if my 'a' looks like this:

11110
00001
01111
10001
10001
01111  <-- This pixel is now full

Then my chain code is now: 0002222334445656677

And yet this is also an 'a'.

I know that the whole point of these chain code is to be resilient to such tiny changes but I can't figure out how I'm supposed to find which character corresponds to one chain code.

I've been that far and now I'm stuck...

(By the way, I don't need 100% efficiency and things like differentiating '0' from 'O' or from 'o' isn't really an issue)

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

罪歌 2024-12-01 13:59:01

您需要的是一个测量链码之间距离的函数d。然后找到给定链码的字母就很简单了：

输入：

可能字母集的标准化链码S（通常是AZ、az、0-9等的cain码）
需要检测且可能稍微变形的字母的链码x（该链码不会与集合S中的任何链码匹配）

该算法将迭代可能的链码集合并计算距离每个元素的d(x,si)。距离最小的字母将是算法的输出（识别出的字母）。

我建议遵循距离函数：
对于两个链码，将每个方向的长度差相加： d(x,si) = |x0-si0| + |x1-si1| + .. + |x7-si7|。 x0为链码x中0的数量，si0为链码si中0的数量> 等。

举一个例子可以更好地解释我的想法。下图中有字母8、B和D，第四个字母是一个稍微变形的8，需要识别。这些字母使用 Arial 书写，字体大小为 8。图像中的第二行放大了 10 倍，以便更好地看到像素。

在此处输入图像描述

我手动计算了（希望正确）标准化链码：

8:  0011223123344556756677
B:  0000011222223344444666666666
D:  00001112223334444666666666
8': 000011222223344556756666 (deformed 8)

长度差（绝对值）为：


direction | length         | difference to 8'
          | 8 | B | D |  8'|   8 |  B |  D |
----------+---+---+---+----+-----+----+-----
        0 | 2 | 5 | 4 |  4 |   2 |  1 |  0 |
        1 | 3 | 2 | 3 |  2 |   1 |  0 |  1 |
        2 | 3 | 5 | 3 |  5 |   2 |  0 |  2 |
        3 | 3 | 2 | 3 |  2 |   1 |  0 |  1 |
        4 | 2 | 5 | 4 |  2 |   0 |  3 |  2 |
        5 | 3 | 0 | 0 |  3 |   0 |  3 |  3 |
        6 | 3 | 9 | 9 |  5 |   2 |  4 |  4 |
        7 | 3 | 0 | 0 |  1 |   2 |  1 |  1 |
----------+---+---+---+----+-----+----+-----
                        sum   10 | 12 | 14 |

8' 与链码 8 的距离最小，因此算法将识别字母 8。到字母B的距离并没有大多少，但这是因为变形后的8看起来几乎像B。

该方法不是缩放不变的。我认为有两种选择可以克服这个问题：

对于不同的字体大小，具有不同的标准化链码集
一组大尺寸（例如35x46像素）的标准化链码并将输入字母（需要识别）缩放为这个更大的尺寸。

我不太确定距离函数对于所有字母数字字母的集合是否足够好，但我希望如此。为了最大限度地减少识别字母时的错误，您可以在分类步骤中包含其他特征（不仅仅是链码）。同样，您需要距离测量——这次是针对特征向量。

What you need is a function d that measures the distance between chain codes. After then finding the letter to a given chain code is straightforward:

Input:

normalized chain codes S for the set of possible letters (generally the cain codes for A-Z, a-z, 0-9, ...)
chain code x of a letter which needs to be detected and which could be slightly deformed (the chain code wouldn't match any chain code in the set S)

The algorithm would iterate through the set of possible chain codes and calculate the distance d(x,si) for each element. The letter with the smallest distance would be the output of the algorithm (the identified letter).

I would suggest following distance function:
For two chain codes, add up the length differences of each direction: d(x,si) = |x0-si0| + |x1-si1| + .. + |x7-si7|. x0 is the number of 0s in the chain code x, si0 is the number of 0s in the chain code si, etc.

An example will better explain what I'm thinking about. In the following image there are the letters 8, B and D, the fourth letter is a slightly deformed 8, which needs to be identified. The letters are written with Arial with font-size 8. The second line in the image is 10 times enlarged to better see the pixels.

enter image description here

I manually calculated (hopefully correct) the normalized chain codes which are:

8:  0011223123344556756677
B:  0000011222223344444666666666
D:  00001112223334444666666666
8': 000011222223344556756666 (deformed 8)

The length differences (absolut) are:


direction | length         | difference to 8'
          | 8 | B | D |  8'|   8 |  B |  D |
----------+---+---+---+----+-----+----+-----
        0 | 2 | 5 | 4 |  4 |   2 |  1 |  0 |
        1 | 3 | 2 | 3 |  2 |   1 |  0 |  1 |
        2 | 3 | 5 | 3 |  5 |   2 |  0 |  2 |
        3 | 3 | 2 | 3 |  2 |   1 |  0 |  1 |
        4 | 2 | 5 | 4 |  2 |   0 |  3 |  2 |
        5 | 3 | 0 | 0 |  3 |   0 |  3 |  3 |
        6 | 3 | 9 | 9 |  5 |   2 |  4 |  4 |
        7 | 3 | 0 | 0 |  1 |   2 |  1 |  1 |
----------+---+---+---+----+-----+----+-----
                        sum   10 | 12 | 14 |

8' has the smallest distance to the chain code of 8, thus the algorithm would identify the letter 8. The distance to the letter B is not much bigger, but this is because the deformed 8 looks almost like the B.

This method is not scaling invariant. I think there are two options to overcome this:

For different font sizes, having different sets of normalized chain codes
One set of normalized chain codes at a big size (e.g. 35x46 pixel) and scaling the input letter (which needs to be identified) to this bigger size.

I'm not quite sure if the distance function is good enough for the set of all alphanumeric letters but I hope so. To minimize the error in identifying a letter you could include other features (not only chain codes) into the classification step. And again, you would need a distance measure -- this time for feature vectors.

回复收藏 0 原文

少女情怀诗 2024-12-01 13:59:01

由于您的问题不够具体（无论您想要基于链码的完整算法还是只是一些概率分类），我会告诉您我对这个问题的了解。

使用链码，您可以计算符号的一些属性，例如 344445、244445、2555556、344446（任意数量的 4）形式的旋转数，即信。假设链码中有 3 个部分，如下所示。所以，这几乎肯定是“W”！但这是一个很好的案例。您可以计算不同类型旋转的数量，并将其与之前保存的每个字母的值进行比较（您可以手动完成）。
这是一个非常好的分类器，但当然还不够。它无法区分“D”和“O”、“V”和“U”。这在很大程度上取决于你的想象力。

您应该首先创建一些带有参考的字母图像的测试用例，并在更改和发明新标准之间检查您的算法。

希望这至少能部分回答您的问题。

更新：
我突然想到一个好主意:)
您可以计算链中单调序列的数量，例如，对于链 000111222233334443333222444455544443333 （一个简单的愚蠢示例，并不真正对应于任何字母），我们有
000111222233334443333222444455544443333，
00011122223333444 3333222 444455544443333，
000111222233334443333222 4444555 44443333，
0001112222333344433332224444555 44443333，

即四个单调子序列。

这应该是一个很好的概括，只需计算真实字母的变化数量，然后与从检测到的链中获取的变化进行比较，这是一个很好的尝试。

一些问题和想法：

链在某种程度上是循环的，因此您应该处理检测链末端的单调性（以避免逐一错误），
应该考虑一些工件，例如，如果您知道字母足够大（例如，高度为 20 像素），您会希望忽略短于 3 个项目的单调中断，例如:)

As your question is not specific enough (whether you want the full algorithm based on the chain code or just some probabilistic classifying), I'll tell you what I know about the problem.

Using the chain code, you can count some properties of the symbol, e.g. the number of rotations of the form 344445, 244445, 2555556, 344446 (arbitrary number of 4s), i.e. the "spikes" on the letter. Say there are 3 sections in the chain code that looks like this. So, this is almost certainly "W"! But this is a good case. You can count numbers of different kinds of rotations and compare that to previously saved values for every letter (which you do by hand).
This is quite a good classifier, but alone is not sufficient, of course. It will be impossible for it to differentiate "D" and "O", "V" and "U". And much depends on your imagination.

You should start by creating a test case of images of some letters with a reference and check your algorithm between the changes and inventing new criteria.

Hope this answers your question at least partially.

Update:
One bright idea just came into my mind :)
You can count the number of monotonic sequences in the chain, for example, for chain 000111222233334443333222444455544443333 (a quick dumb example, doesn't really correspond to any letter) we have
00011122223333444 3333222444455544443333,
00011122223333444 3333222 444455544443333,
000111222233334443333222 4444555 44443333,
0001112222333344433332224444555 44443333,

i.e. four monotonic subsequences.

This should be a good generalization, just count the number of this changes for real letters and compare to that acquired from the detected chain, this is a good try.

Some problems and ideas:

Chain is cyclic in a way, so you should deal with detecting monotony on the ends of the chain (to avoid off-by-one errors),
Some artifacts should be accounted for, for example, if you know that letter is big enough (for example, 20 pixels in height), you would want to ignore monotony interruption shorter than 3 items, for example :)

回复收藏 0 原文