Preparing complex images for OCR

Published 2025-01-08 10:53:57


I want to recognize digits from a credit card. To make things worse, the source image is not guaranteed to be of high quality. The OCR is to be realized through a neural network, but that shouldn't be the topic here.

The current issue is the image preprocessing. As credit cards can have backgrounds and other complex graphics, the text is not as clear as with scanning a document. I made experiments with edge detection (Canny Edge, Sobel), but it wasn't that successful.
Also calculating the difference between the greyscale image and a blurred one (as stated at Remove background color in image processing for OCR) did not lead to an OCRable result.

I think most approaches fail because the contrast between a specific digit and its background is not strong enough. There is probably a need to do a segmentation of the image into blocks and find the best preprocessing solution for each block?

Do you have any suggestions on how to convert the source into a readable binary image?
Is edge detection the way to go, or should I stick with basic color thresholding?

Here is a sample of a greyscale-thresholding approach (where I am obviously not happy with the results):

Original image: (image)

Greyscale image: (image)

Thresholded image: (image)

Thanks for any advice,
Valentin


3 Answers

初雪 2025-01-15 10:53:57


If it's at all possible, request that better lighting be used to capture the images. A low-angle light would illuminate the edges of the raised (or sunken) characters, thus greatly improving the image quality. If the image is meant to be analyzed by a machine, then the lighting should be optimized for machine readability.

That said, one algorithm you should look into is the Stroke Width Transform, which is used to extract characters from natural images.

Stroke Width Transform (SWT) implementation (Java, C#...)

A global threshold (for binarization or clipping edge strengths) probably won't cut it for this application, and instead you should look at localized thresholds. In your example images the "02" following the "31" is particularly weak, so searching for the strongest local edges in that region would be better than filtering all edges in the character string using a single threshold.
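As a minimal sketch of that localized-threshold idea (NumPy only, not from the original answer; the window size and offset constant are assumptions you would tune), each pixel is compared against the mean of its own neighbourhood instead of one global cutoff, so a weak "02" can still separate from its local background:

```python
import numpy as np

def local_threshold(gray, win=15, c=10):
    """Binarize against a local mean: each pixel is compared with the
    average of its win x win neighbourhood minus a constant c."""
    pad = win // 2
    padded = np.pad(gray.astype(float), pad, mode="edge")
    # Integral image: every window sum in O(1) per pixel.
    ii = np.pad(padded, ((1, 0), (1, 0))).cumsum(0).cumsum(1)
    h, w = gray.shape
    s = (ii[win:win + h, win:win + w] - ii[:h, win:win + w]
         - ii[win:win + h, :w] + ii[:h, :w])
    mean = s / (win * win)
    return np.where(gray > mean - c, 255, 0).astype(np.uint8)

# A flat 200-grey card region with one darker 50-grey stroke:
gray = np.full((20, 20), 200, dtype=np.uint8)
gray[8:12, 5:15] = 50
out = local_threshold(gray)
```

The stroke pixels fall below their neighbourhood mean and go black, while the flat background stays white; a single global threshold would have to guess one cutoff for both regions.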

If you can identify partial segments of characters, then you might use some directional morphology operations to help join segments. For example, if you have two nearly horizontal segments like the following, where 0 is the background and 1 is the foreground...

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 1 1 1 1 0 0 1 1 1 1 1 1 0 0 0
0 0 0 1 0 0 0 1 0 1 0 0 0 0 1 0 0 0

then you could perform a morphological "close" operation along the horizontal direction only to join those segments. The kernel could be something like

x x x x x
1 1 1 1 1
x x x x x
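The directional close above can be sketched with a hand-rolled dilate-then-erode along rows only (NumPy, 1x5 horizontal kernel; the kernel width is an assumption), applied to the two fragments from the example grid:

```python
import numpy as np

def hclose(img, width=5):
    """Morphological close restricted to the horizontal direction:
    dilate (running max) then erode (running min) along each row."""
    r = width // 2
    h, w = img.shape
    p = np.pad(img, ((0, 0), (r, r)), constant_values=0)
    dil = np.max([p[:, i:i + w] for i in range(width)], axis=0)
    p2 = np.pad(dil, ((0, 0), (r, r)), constant_values=1)
    ero = np.min([p2[:, i:i + w] for i in range(width)], axis=0)
    return ero

img = np.array([
    [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
    [0,0,0,1,1,1,1,0,0,1,1,1,1,1,1,0,0,0],
    [0,0,0,1,0,0,0,1,0,1,0,0,0,0,1,0,0,0],
])
closed = hclose(img)
```

Because the kernel spans only one row, the two-pixel gap in the nearly horizontal run is bridged while the empty background row stays empty and no vertical bleeding occurs.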

There are more sophisticated methods to perform curve completion using Bezier fits or even Euler spirals (a.k.a. clothoids), but preprocessing to identify segments to be joined and postprocessing to eliminate poor joins can get very tricky.

初雪 2025-01-15 10:53:57


The way I would go about the problem is to separate the card into different sections. There are not many unique credit card layouts to begin with (MasterCard, Visa, the list is up to you), so you could provide a drop-down to specify which credit card it is. That way, you can eliminate irrelevant regions and specify the pixel area:

Example:

Only work with the area from 20 pixels off the bottom and 30 pixels off the
left, to 10 pixels off the right and 30 pixels off the bottom (creating a
rectangle) - this would cover all MasterCards

When I worked with image processing programs (a fun project), I turned up the contrast of the picture, converted it to greyscale, took the average of the RGB values of each individual pixel, and compared it to the surrounding pixels:

Example:

PixAvg[i,j] = (Pix.R + Pix.G + Pix.B) / 3;
if ((PixAvg[i,j] - PixAvg[i,j+1]) > 30)
    boolEdge = true;   // note: '==' would compare rather than assign

The value 30 controls how distinct an edge must be. The lower the difference threshold, the lower the tolerance.

In my project, to view the edge detection, I made a separate array of booleans holding the boolEdge values, plus a pixel array filled with only black and white dots: boolEdge = true becomes a white dot, boolEdge = false a black dot. So in the end you get a pixel array (the full picture) that contains only white and black dots.

From there, it is much easier to detect where a number starts and where a number finishes.
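The averaging-and-compare idea above can be sketched in NumPy as follows (not the answerer's original program; the tolerance of 30 mirrors the pseudocode, and I take the absolute difference so jumps in either direction count, which is a small liberty):

```python
import numpy as np

def edge_map(rgb, tolerance=30):
    """Average the R, G, B values of each pixel, then mark a pixel as an
    edge (white, 255) when the brightness jump to its right-hand
    neighbour exceeds the tolerance; everything else stays black (0)."""
    avg = rgb.astype(float).mean(axis=2)      # PixAvg[i, j]
    diff = np.abs(avg[:, :-1] - avg[:, 1:])   # compare with right neighbour
    edges = np.zeros(avg.shape, dtype=np.uint8)
    edges[:, :-1][diff > tolerance] = 255     # white dot where boolEdge is true
    return edges

# Dark left half, bright right half -> one vertical edge column:
rgb = np.zeros((2, 4, 3), dtype=np.uint8)
rgb[:, 2:, :] = 200
edges = edge_map(rgb)
```

The result is exactly the black-and-white dot picture described above: white only where the brightness changes sharply, which makes the start and end of each digit easy to scan for.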

家住魔仙堡 2025-01-15 10:53:57


In my implementation I tried to use the code from here: http://rnd.azoft.com/algorithm-identifying-barely-legible-embossed-text-image/
The results are better, but still not good enough...
I find it hard to find the right parameters for textured cards.

- (void)processingByStrokesMethod:(cv::Mat)src dst:(cv::Mat *)dst {
    cv::Mat tmp;
    cv::GaussianBlur(src, tmp, cv::Size(3, 3), 2.0);   // Gaussian blur
    tmp = cv::abs(src - tmp);                          // difference between source image and blurred image

    // Binarization:
    cv::threshold(tmp, tmp, 0, 255, CV_THRESH_BINARY | CV_THRESH_OTSU);

    // Using the method of strokes:
    int Wout = 12;
    int Win = Wout / 2;
    int startXY = Win;
    int endY = src.rows - Win;
    int endX = src.cols - Win;

    for (int j = startXY; j < endY; j++) {
        for (int i = startXY; i < endX; i++) {
            // Only edge pixels:
            if (tmp.at<unsigned char>(j, i) == 255) {
                // Calculating maxP and minP within the Win-region:
                unsigned char minP = src.at<unsigned char>(j, i);
                unsigned char maxP = src.at<unsigned char>(j, i);
                int offsetInWin = Win / 2;

                for (int m = -offsetInWin; m < offsetInWin; m++) {
                    for (int n = -offsetInWin; n < offsetInWin; n++) {
                        if (src.at<unsigned char>(j + m, i + n) < minP) {
                            minP = src.at<unsigned char>(j + m, i + n);
                        } else if (src.at<unsigned char>(j + m, i + n) > maxP) {
                            maxP = src.at<unsigned char>(j + m, i + n);
                        }
                    }
                }

                // Voting:
                unsigned char meanP = lroundf((minP + maxP) / 2.0);

                for (int l = -Win; l < Win; l++) {
                    for (int k = -Win; k < Win; k++) {
                        if (src.at<unsigned char>(j + l, i + k) >= meanP) {
                            dst->at<unsigned char>(j + l, i + k)++;
                        }
                    }
                }
            }
        }
    }

    ///// Normalization of imageOut:
    unsigned char maxValue = dst->at<unsigned char>(0, 0);

    for (int j = 0; j < dst->rows; j++) {              // finding the max value of imageOut
        for (int i = 0; i < dst->cols; i++) {
            if (dst->at<unsigned char>(j, i) > maxValue)
                maxValue = dst->at<unsigned char>(j, i);
        }
    }
    float knorm = 255.0 / maxValue;

    for (int j = 0; j < dst->rows; j++) {              // normalization of imageOut
        for (int i = 0; i < dst->cols; i++) {
            dst->at<unsigned char>(j, i) = lroundf(dst->at<unsigned char>(j, i) * knorm);
        }
    }
}