OCR reading on a noticeably inconsistent gray background

Posted 2025-01-29 03:31:47

About two years ago, I asked a question here and got a satisfying answer. Thing is, the script has recently been returning a lot of errors, over 30%, so I decided to change my approach and ask a more generic question, this time working from the original images instead of the processed ones I used in my original question.

Here are the originals:

[Twelve sample images: slices of the original scanned documents, each showing a printed digit string on a gray background]

As you can see, these examples are slices of the original scanned documents.

The problem lies in their inconsistent quality, both in the original printing and the subsequent scanning. Sometimes the digits stand out, sometimes not. Sometimes I have a darker gray, sometimes a lighter one. Sometimes I get a faulty print, with white lines showing where the printer failed to put down ink.

Furthermore, the font is way too "tight": the digits sit too close to each other, sometimes even touching, which precludes me from simply separating each digit in order to clean and OCR each one individually.

I've tried various approaches with OpenCV, such as various blurs:

import cv2
import numpy as np

img = cv2.imread("slice.png")  # one of the scanned slices (path illustrative)
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
blurred = cv2.GaussianBlur(gray, (5, 5), 0)  # initial cleaning

# Fixed, Otsu, and adaptive threshold variants:
s_thresh = cv2.threshold(blurred, 120, 255, cv2.THRESH_BINARY_INV)[1]
o_thresh = cv2.threshold(blurred, 0, 255, cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)[1]
ac_thres = cv2.adaptiveThreshold(blurred, 255, cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY_INV, 5, 10)
ag_thres = cv2.adaptiveThreshold(blurred, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY_INV, 5, 4)
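
To compare the four variants side by side, a quick viewing loop like this can help (illustrative, not part of the original post):

# Show each thresholded result in its own window; press a key to close:
for name, t in [("simple", s_thresh), ("otsu", o_thresh),
                ("adaptive-mean", ac_thres), ("adaptive-gauss", ag_thres)]:
    cv2.imshow(name, t)
cv2.waitKey(0)
cv2.destroyAllWindows()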

And also connected components:

# Binarize, then clean small specks via connected components:
ret, thresh = cv2.threshold(img, 100, 255, cv2.THRESH_BINARY)
opening = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, cv2.getStructuringElement(cv2.MORPH_RECT, (2, 2)))
gray_img = cv2.cvtColor(opening, cv2.COLOR_BGR2GRAY)
_, blackAndWhite = cv2.threshold(gray_img, 127, 255, cv2.THRESH_BINARY_INV)

nlabels, labels, stats, centroids = cv2.connectedComponentsWithStats(blackAndWhite, None, None, None, 8, cv2.CV_32S)
sizes = stats[1:, -1]  # get CC_STAT_AREA component
img2 = np.zeros(labels.shape, np.uint8)

# Keep only components of at least 4 pixels:
for i in range(0, nlabels - 1):
    if sizes[i] >= 4:
        img2[labels == i + 1] = 255

res = cv2.bitwise_not(img2)
gaussian = cv2.GaussianBlur(res, (3, 3), 0)

# Blend with the blurred copy (note: these weights favor the blur, softening rather than sharpening):
unsharp_image = cv2.addWeighted(res, 0.3, gaussian, 0.7, 0, res)

But I still get results that are inconsistent at best.

Should I change my approach? What would you guys recommend?


1 Answer

划一舟意中人 2025-02-05 03:31:47


Here's a revisited approach to my original answer (now implemented fully in Python!). I'm using the K channel of the CMYK color space (K = 1 - max(R, G, B) for pixel values normalized to [0, 1]) to get a binary image. The binary image is obtained via Otsu thresholding plus a little bit of bias; I then apply a minimum area filter, invert the image, and pass it to Tesseract.

I'm using a couple of libraries here. imutils for reading images in a directory, os for joining paths and pytesseract for the OCR. Let's see the code:

# Imports:
import pytesseract  # tesseract (previous installation)
import numpy as np  # numpy
import cv2  # opencv
import os  # os for paths
from imutils import paths

# Image path:
rootDir = "D:"
baseDir = "opencvImages"
subBaseDir = "numbers"

# Otsu bias:
threshBias = 1.2

# Create os-independent path:
path = os.path.join(rootDir, baseDir, subBaseDir)

# Get the test images paths:
imagePaths = sorted(list(paths.list_images(path)))

# Loop over the test images and OCR them:
for imagePath in imagePaths:

    # Load the image via OpenCV:
    currentImage = cv2.imread(imagePath, cv2.IMREAD_COLOR)

    # Show image:
    showImage("Current Image", currentImage)

    # Convert to float and divide by 255:
    imgFloat = currentImage.astype(np.float64) / 255.0

    # Calculate channel K:
    kChannel = 1 - np.max(imgFloat, axis=2)

    # Convert back to uint 8:
    kChannel = (255 * kChannel).astype(np.uint8)

    # Threshold via Otsu:
    autoThresh, binaryImage = cv2.threshold(kChannel, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    # Add a little bias and threshold again:
    autoThresh = threshBias * autoThresh
    _, binaryImage = cv2.threshold(kChannel, autoThresh, 255, cv2.THRESH_BINARY)
    showImage("Current Image (Binary)", binaryImage)

    # Apply a filter area of minimum 50 pixels:
    minArea = 50
    binaryImage = areaFilter(binaryImage, minArea)
    showImage("Current Image (Filtered)", binaryImage)

    # Invert Image:
    binaryImage = 255 - binaryImage

    # Setting up tesseract:
    pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'  # for Windows
    custom_config = r'--oem 3 --psm 6'  # OEM 3: default engine; PSM 6: assume a single uniform block of text
    text = pytesseract.image_to_string(binaryImage, config=custom_config)

    # Show recognized text:
    print("Text is: " + text)

There are a couple of helper functions. showImage is just my custom function to show an image in a window via OpenCV's high-level GUI. After a window pops up, press any key to continue evaluating the script. The areaFilter function is the same function from before; it applies a minimum area filter to the binary image. (In the actual script, both must be defined above the loop that calls them.)

# Defines a re-sizable image window:
def showImage(imageName, inputImage):
    cv2.namedWindow(imageName, cv2.WINDOW_NORMAL)
    cv2.imshow(imageName, inputImage)
    cv2.waitKey(0)

# Applies a minimum blob area filter to an input binary image:
def areaFilter(binaryImage, minArea):
    totalComponents, labeledPixels, componentsStats, componentsCentroids = \
        cv2.connectedComponentsWithStats(binaryImage, connectivity=4)
    # Keep the labels whose area (CC_STAT_AREA) meets the minimum; label 0 is the background:
    remaining_comp_labels = [i for i in range(1, totalComponents) if componentsStats[i][4] >= minArea]
    outImage = np.where(np.isin(labeledPixels, remaining_comp_labels), 255, 0).astype(np.uint8)
    return outImage
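
As a quick sanity check of areaFilter, a tiny synthetic image works (the blob sizes here are illustrative, not from the original post):

# Two blobs: 4 px and 16 px; with minArea=10 only the larger one survives:
test = np.zeros((10, 10), np.uint8)
test[1:3, 1:3] = 255   # 4-pixel blob
test[5:9, 5:9] = 255   # 16-pixel blob
filtered = areaFilter(test, minArea=10)
assert filtered[6, 6] == 255 and filtered[2, 2] == 0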

Let's check out some results. For the first image, this is the K (black) channel only:

[K channel image]

This is the pre-filtered binary image (Otsu + bias):

[binary image]

This is the filtered image:

[filtered image]

Tesseract returns this:

Text is: 820065084250

The strings returned for every image, according to Tesseract, are:

Tesseract OCR
820065084250
930023482930
820065085833
930023485203
820065072022
930023485564
820065084802
820065084691
820065084730
930023445422
820065084551
82006507 1840

Note that the 2 in 820065084551 is successfully recognized, even though the digit is partly cut off. There's white space in the last string, probably because the digits in that image are a little more separated. You can post-process the string to remove these white spaces.
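
A minimal post-processing sketch, assuming the text string returned by pytesseract above; keeping only digits drops the spaces and any trailing whitespace Tesseract appends:

# Keep only digit characters:
cleaned = "".join(ch for ch in text if ch.isdigit())
print(cleaned)  # e.g. "82006507 1840" -> "820065071840"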
