Is there an efficient algorithm for handwritten text segmentation?

Posted 2024-12-13 18:39:35


I want to automatically divide an image of ancient handwritten text by lines (and by words in future).

The first obvious part is preprocessing the image...

I'm just using a simple digitization (based on brightness of pixel). After that I store data into two-dimensional array.

The next obvious part is analyzing the binary array.

  1. My first algorithm was pretty simple - if there are more black pixels in a row of the array than the root-mean-square of the maximum and minimum row counts, then this row is part of a line.

    After forming the list of lines I cut off lines whose height is less than the average.
    Finally it turned into a kind of linear regression, trying to minimize the difference between blank rows and text rows. (I assumed that fact)
    First results

  2. My second attempt - I tried to use a GA with several fitness functions.
    The chromosome contained 3 values - xo, x1, x2, with xo ∈ [-1;0], x1 ∈ [0;0.5], x2 ∈ [0;0.5].

The function that determines whether a row belongs to a line is (xo + α1 x1 + α2 x2) > 0, where α1 is the scaled sum of black pixels in the row and α2 is the median of the gaps between the extreme black pixels in the row. (α1, α2 ∈ [0,1])
Another function that I tried is (x1 < α1 OR x2 > α2) and (1/xo + [α1 x1] / [α2 x2]) > 0
The last function is the most efficient.
Results with GA
The fitness function is
1 / (HeightRange + SpacesRange)

Where range is difference between maximum and minimum. It represents the homogeneity of text. The global optimum of this function - the most smooth way to divide the image into lines.

I am using C# with my self-coded GA (classical, with 2-point crossover, gray-code chromosomes, maximum population is 40, mutation rate is 0.05)
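The row classifier described above can be illustrated with a small Python sketch (not the author's code; the chromosome values and row features below are made up for illustration):

```python
# Hedged sketch of the evolved row classifier: a row belongs to a text line
# when xo + a1*x1 + a2*x2 > 0, where a1 is the scaled black-pixel count of
# the row and a2 the median gap, both normalized to [0, 1].
def row_in_line(a1, a2, xo, x1, x2):
    return xo + a1 * x1 + a2 * x2 > 0

# Example chromosome (made up): xo in [-1, 0], x1 and x2 in [0, 0.5]
xo, x1, x2 = -0.2, 0.5, 0.1
print(row_in_line(0.8, 0.3, xo, x1, x2))  # True: a dense row clears the bias
print(row_in_line(0.1, 0.2, xo, x1, x2))  # False: a sparse row does not
```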

Now I've run out of ideas for how to divide this image into lines with ~100% accuracy.

What is the efficient algorithm to do this?


UPDATE:
Original BMP (1.3 MB)


UPDATE2:
Improved results on this text to 100%
New results

How I did it:

  • fixed a minor bug in the range count
  • changed the fitness function to 1 / ((distancesRange + 1) * (heightsRange + 1))
  • minimized the classifying function to (1/xo + x2/range) > 0 (the number of points in a row no longer affects classification)
    (i.e. optimized the input data and made the fitness function optimization more explicit)
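A hedged Python sketch of the updated fitness function above (the height and distance values are made up; range means max - min, as defined earlier):

```python
# Sketch of the updated fitness function: given the heights of the detected
# lines and the distances between consecutive lines, reward the most
# homogeneous segmentation. range = max - min throughout.
def fitness(heights, distances):
    heights_range = max(heights) - min(heights)
    distances_range = max(distances) - min(distances)
    return 1.0 / ((distances_range + 1) * (heights_range + 1))

# A perfectly regular segmentation scores 1.0 ...
print(fitness([20, 20, 20], [5, 5]))   # 1.0
# ... while an irregular one scores lower: 1 / ((7+1) * (17+1)) = 1/144
print(fitness([20, 35, 18], [5, 12]))
```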

Problem:

Problem

GA surprisingly failed to recognize this line. I looked at the debug data of the 'find ranges' function and found that there is too much noise in the 'unrecognized' place.
The function code is below:

public double[] Ranges()
{
    var ranges = new double[_original.Height];

    for (int y = 0; y < _original.Height; y++)
    {
        ranges[y] = 0;
        var dx = new List<int>();   // horizontal gaps between consecutive black pixels
        int last = 0;
        int x = 0;

        // find the first black pixel in row y
        while (last == 0 && x < _original.Width)
        {
            if (_bit[x, y])
                last = x;
            x++;
        }

        // no black pixels at all: leave the range at 0
        if (last == 0)
        {
            ranges[y] = 0;
            continue;
        }

        // collect the gap sizes between consecutive black pixels
        for (x = last; x < _original.Width; x++)
        {
            if (!_bit[x, y]) continue;

            if (last != x - 1)
            {
                dx.Add((x - last) + 1);
            }
            last = x;
        }

        // use the median gap; rows with too few gaps are treated as empty
        if (dx.Count > 2)
        {
            dx.Sort();
            ranges[y] = dx[dx.Count / 2];
            //ranges[y] = dx.Average();
        }
        else
            ranges[y] = 0;
    }

    // replace 'empty' rows (range 0) with the maximum range,
    // so the optimizer treats them as worst-case rows
    var maximum = ranges.Max();
    for (int i = 0; i < ranges.Length; i++)
    {
        if (Math.Abs(ranges[i] - 0) < 0.9)
            ranges[i] = maximum;
    }
    return ranges;
}

I'm using some hacks in this code. The main reason: I want to minimize the range between the nearest black pixels, but if a row has no pixels the value becomes 0, and that makes the problem impossible to solve by finding optima. The second reason: this code changes too frequently.
I'd like to rewrite this code completely, but I have no idea how to do it.
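One way to avoid the "0 means empty" hack is to use a sentinel such as None for rows where the median gap is undefined and substitute the maximum only at the end. This is not the author's code, just a Python sketch of the same median-of-gaps statistic (note: it uses the raw gap size, while the C# above adds 1):

```python
# Per-row gap statistic without the zero sentinel: compute the median
# horizontal gap between consecutive black pixels in each row, use None for
# rows where it is undefined, and substitute the maximum afterwards.
def gap_medians(rows):
    medians = []
    for row in rows:
        xs = [x for x, black in enumerate(row) if black]
        gaps = sorted(b - a for a, b in zip(xs, xs[1:]) if b - a > 1)
        medians.append(gaps[len(gaps) // 2] if len(gaps) > 2 else None)
    worst = max((m for m in medians if m is not None), default=0)
    return [worst if m is None else m for m in medians]

rows = [
    [x % 3 == 0 for x in range(13)],  # black pixels at x = 0, 3, 6, 9, 12
    [False] * 13,                     # empty row: gets the worst-case value
]
print(gap_medians(rows))  # [3, 3]
```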

Q:

  1. Is there a more efficient fitness function?
  2. How can I find a more versatile classification function?


过度放纵 2024-12-20 18:39:35


Although I'm not sure how to translate the following algorithm into GA form (and I'm not sure why you need to use GA for this problem), and I could be off base in proposing it, here goes.

The simple technique I would propose is to count the number of black pixels per row. (Actually it's the dark pixel density per row.) This requires very few operations, and with a few additional calculations it's not difficult to find peaks in the pixel-sum histogram.

A raw histogram will look something like this, where the profile along the left side shows the number of dark pixels in a row. For visibility, the actual count is normalized to stretch out to x = 200.

raw horizontal count

After some additional, simple processing is added (described below), we can generate a histogram like this that can be clipped at some threshold value. What remains are peaks indicating the center of lines of text.

processed horizontal count

From there it's a simple matter to find the lines: just clip (threshold) the histogram at some value such as 1/2 or 2/3 the maximum, and optionally check that the width of the peak at your clipping threshold is some minimum value w.

One implementation of the full (yet still simple!) algorithm to find the nicer histogram is as follows:

  1. Binarize the image using a "moving average" threshold or similar local thresholding technique in case a standard Otsu threshold operating on pixels near edges isn't satisfactory. Or, if you have a nice black-on-white image, just use 128 as your binarization threshold.
  2. Create an array to store your histogram. This array's length will be the height of the image.
  3. For each pixel (x,y) in the binarized image, find the number of dark pixels above and below (x,y) within some radius R. That is, count the number of dark pixels from (x, y - R) to (x, y + R), inclusive.
  4. If the number of dark pixels within a vertical radius R is equal to or greater than R (that is, at least half the pixels are dark), then pixel (x,y) has sufficient vertical dark neighbors. Increment your bin count for row y.
  5. As you march along each row, track the leftmost and rightmost x-values for pixels with sufficient neighbors. As long as the width (right - left + 1) exceeds some minimum value, divide the total count of dark pixels by this width. This normalizes the count to ensure the short lines like the very last line of text are included.
  6. (Optional) Smooth the resulting histogram. I just used the mean over 3 rows.

The "vertical count" (step 3) eliminates horizontal strokes that happen to be located above or below the center line of text. A more sophisticated algorithm would just check directly above and below (x,y), but also to the upper left, upper right, lower left, and lower right.

With my rather crude implementation in C# I was able to process the image in less than 75 milliseconds. In C++, and with some basic optimization, I've little doubt the time could be cut down considerably.

This histogram method assumes the text is horizontal. Since the algorithm is reasonably fast, you may have enough time to calculate pixel count histograms at increments of every 5 degrees from the horizontal. The scan orientation with the greatest peak/valley differences would indicate the rotation.

I'm not familiar with GA terminology, but if what I've suggested is of some value I'm sure you can translate it into GA terms. In any case, I was interested in this problem anyway, so I might as well share.

EDIT: maybe for GA use, it's better to think in terms of "distance since the previous dark pixel in X" (or along angle theta) and "distance since the previous dark pixel in Y" (or along angle [theta - pi/2]). You might also check the distance from a white pixel to a dark pixel in all radial directions (to find loops).

byte[,] arr = get2DArrayFromBitmap();   //source array from originalBitmap
int w = arr.GetLength(0);               //width of 2D array
int h = arr.GetLength(1);               //height of 2D array

//we can use a second 2D array of dark pixels that belong to vertical strokes
byte[,] bytes = new byte[w, h];         //dark pixels in vertical strokes


//initial morph
int r = 4;        //radius to check for dark pixels
int count = 0;    //number of dark pixels within radius

//fill the bytes[,] array only with pixels belonging to vertical strokes
for (int x = 0; x < w; x++)
{
    //for the first r rows, just set pixels to white
    for (int y = 0; y < r; y++)
    {
        bytes[x, y] = 255;
    }

    //assume pixels of value < 128 are dark pixels in text
    for (int y = r; y < h - r - 1; y++)
    {
        count = 0;

        //count the dark pixels above and below (x,y)
        //total range of check is 2r, from -r to +r
        for (int j = -r; j <= r; j++)
        {
            if (arr[x, y + j] < 128) count++;
        }

        //if half the pixels are dark, [x,y] is part of vertical stroke
        bytes[x, y] = count >= r ? (byte)0 : (byte)255;
    }

    //for the last r rows, just set pixels to white
    for (int y = h - r - 1; y < h; y++)
    {
        bytes[x, y] = 255;
    }
}

//count the number of valid dark pixels in each row
float max = 0;

float[] bins = new float[h];    //normalized "dark pixel strength" for all h rows
int left, right, width;         //leftmost and rightmost dark pixels in row
bool dark = false;              //tracking variable

for (int y = 0; y < h; y++)
{
    //initialize values at the beginning of each loop iteration
    left = 0;
    right = 0;
    width = 0;    //recomputed below from leftmost/rightmost dark pixels

    for (int x = 0; x < w; x++)
    {
        //use value of 128 as threshold between light and dark
        dark = bytes[x, y] < 128;  

        //increment bin if pixel is dark
        bins[y] += dark ? 1 : 0;    

        //update leftmost and rightmost dark pixels
        if (dark)
        {
            if (left == 0) left = x;    
            if (x > right) right = x;   
        }
    }

    width = right - left + 1;

    //for bins with few pixels, treat them as empty
    if (bins[y] < 10) bins[y] = 0;      

    //normalize value according to width
    //divide bin count by width (leftmost to rightmost)
    bins[y] /= width;

    //calculate the maximum bin value so that bins can be scaled when drawn
    if (bins[y] > max) max = bins[y];   
}

//calculate the smoothed value of each bin i by averaging bins i-1, i, and i+1
float[] smooth = new float[bins.Length];

smooth[0] = bins[0];
smooth[smooth.Length - 1] = bins[bins.Length - 1];

for (int i = 1; i < bins.Length - 1; i++)
{
    smooth[i] = (bins[i - 1] + bins[i] + bins[i + 1])/3;
}

//create a new bitmap based on the original bitmap, then draw bins on top
Bitmap bmp = new Bitmap(originalBitmap);

using (Graphics gr = Graphics.FromImage(bmp))
{
    for (int y = 0; y < bins.Length; y++)
    {
        //scale each bin so that it is drawn 200 pixels wide from the left edge
        float value = 200 * (float)smooth[y] / max;
        gr.DrawLine(Pens.Red, new PointF(0, y), new PointF(value, y)); 
    }
}

pictureBox1.Image = bmp;
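The C# above builds and draws the smoothed histogram but stops before the clipping step described earlier. A hedged Python sketch of that step (the 0.5 fraction of the maximum and the minimum peak width of 3 rows are assumptions):

```python
# Clipping step: threshold the smoothed histogram at a fraction of its
# maximum and report (start, end) row ranges for peaks that span at least
# min_width rows. Each returned band marks one line of text.
def find_line_bands(histogram, fraction=0.5, min_width=3):
    cutoff = fraction * max(histogram)
    bands, start = [], None
    for y, value in enumerate(histogram):
        if value >= cutoff and start is None:
            start = y              # entering a peak
        elif value < cutoff and start is not None:
            if y - start >= min_width:
                bands.append((start, y - 1))
            start = None           # leaving a peak
    if start is not None and len(histogram) - start >= min_width:
        bands.append((start, len(histogram) - 1))
    return bands

hist = [0, 0, 5, 9, 8, 7, 1, 0, 6, 9, 9, 2, 0]   # made-up row profile
print(find_line_bands(hist))  # [(2, 5), (8, 10)]
```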
与君绝 2024-12-20 18:39:35


After fiddling around with this for a while I found that I simply need to count the number of crossings for each line, that is, a switch from white to black counts as one, and a switch from black to white increments the count by one again. By highlighting each row with a count > 66 I got close to 100% accuracy, except for the bottommost line.

Of course, this would not be robust to slightly rotated scanned documents. And there is the disadvantage of needing to determine the correct threshold.
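A minimal Python sketch of this crossing count (not the answerer's code; the threshold of 66 from the answer is data-dependent and omitted here):

```python
# Count black/white transitions along one row of a binary image: each change
# between adjacent pixels (white->black or black->white) adds one crossing.
def crossings(row):
    return sum(1 for a, b in zip(row, row[1:]) if a != b)

row = [False, True, True, False, True, False]   # W B B W B W
print(crossings(row))  # 4
```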

生死何惧 2024-12-20 18:39:35


IMHO, with the image shown, that would be very hard to do 100% perfectly.
My answer is to give you alternative ideas.

Idea 1:
Make your own version of ReCaptcha (to put on your very own pron site) - and make it a fun game, like "cut out a word (edges should all be white space - with some tolerance for overlapping chars on the lines above and below)."

Idea 2:
This was a game we played as kids: the wire of a coat hanger was bent into waves and connected to a buzzer, and you had to guide a wand with a ring on the end along the wire, from one side to the other, without making the buzzer go off. Perhaps you could adapt this idea and make a mobile game where people trace out the lines without touching the black text (with tolerance for overlapping chars)... when they complete a line they get points and reach new levels where you give them harder images.

Idea 3:
Research how Google/reCAPTCHA got around this.

Idea 4:
Get the Photoshop SDK and master the functionality of its Extract Edges tool.

Idea 5:
Stretch the image a lot on the Y axis, which should help; apply the algorithm, then scale the location measurements back down and apply them to the normal-sized image.
