OCR of digits over noticeably inconsistent gray backgrounds
Some 2 years ago, I asked a question here and got a satisfying answer. Thing is, recently the script has been returning a lot of errors, over 30%, so I decided to change my approach and ask a more generic question, working with the original images instead of the processed ones I used in my original question.
Here are the originals:
As you can see, these examples are slices of the original scanned documents.
The problem lies in their inconsistent quality, both in the original printing and the subsequent scanning. Sometimes the digits stand out, sometimes not. Sometimes I have a darker gray, sometimes lighter. Sometimes I get a faulty print, with white lines showing where the printer failed to put ink.
Furthermore, their font is way too "tight", as in, the digits are too close to each other, sometimes even touching, precluding me from simply separating each digit in order to clean and OCR it individually.
I've tried various approaches with OpenCV, such as various blurs:
import cv2

gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
blurred = cv2.GaussianBlur(gray, (5, 5), 0)  # Initial cleaning
# Fixed threshold
s_thresh = cv2.threshold(blurred, 120, 255, cv2.THRESH_BINARY_INV)[1]
# Otsu's automatic threshold
o_thresh = cv2.threshold(blurred, 0, 255, cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)[1]
# Adaptive thresholds (mean and Gaussian neighborhoods)
ac_thres = cv2.adaptiveThreshold(blurred, 255, cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY_INV, 5, 10)
ag_thres = cv2.adaptiveThreshold(blurred, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY_INV, 5, 4)
And also connected components:
import cv2
import numpy as np

ret, thresh = cv2.threshold(img, 100, 255, cv2.THRESH_BINARY)
opening = cv2.morphologyEx(thresh, cv2.MORPH_OPEN,
                           cv2.getStructuringElement(cv2.MORPH_RECT, (2, 2)))
gray_img = cv2.cvtColor(opening, cv2.COLOR_BGR2GRAY)
_, blackAndWhite = cv2.threshold(gray_img, 127, 255, cv2.THRESH_BINARY_INV)
nlabels, labels, stats, centroids = cv2.connectedComponentsWithStats(
    blackAndWhite, None, None, None, 8, cv2.CV_32S)
sizes = stats[1:, -1]  # CC_STAT_AREA of each component (background excluded)
img2 = np.zeros(labels.shape, np.uint8)
for i in range(0, nlabels - 1):
    if sizes[i] >= 4:  # drop specks smaller than 4 px
        img2[labels == i + 1] = 255
res = cv2.bitwise_not(img2)
gaussian = cv2.GaussianBlur(res, (3, 3), 0)
unsharp_image = cv2.addWeighted(res, 0.3, gaussian, 0.7, 0, res)
But I still get results that are inconsistent at best.
Should I change my approach? What would you guys recommend?
Here's a revisited approach to my original answer (now implemented fully in Python!). I'm using the K channel of the CMYK color space to get a binary image. The binary image is obtained via Otsu thresholding plus a little bit of bias; I then apply a minimum area filter, invert the result, and pass it to Tesseract.

I'm using a couple of libraries here: imutils for reading images in a directory, os for joining paths, and pytesseract for the OCR. Let's see the code:

There are a couple of defined functions.
showImage is just my custom function to show an image in a window via OpenCV's High-level GUI. After a window pops up, press any key to continue evaluating the script. The areaFilter function is the same function from before. It applies a minimum area filter to the binary image.

Let's check out some results. For the first image, this is the K (black) channel only:

This is the pre-filtered binary image (Otsu + bias):

This is the filtered image:

Tesseract returns this:

The strings returned for every image, according to Tesseract, are:

Note that the 2 in 820065084551 is successfully recognized, even though the number is partly cut. There's white space in the last string, probably because the digits on the image are a little bit separated. You can post-process the string to remove these white spaces.
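For that last step, keeping only the digit characters is enough (the sample string below is hypothetical, just to show the idea):

```python
ocr_text = "8200 65084551\n"  # e.g. a Tesseract result with a stray space
digits = "".join(ch for ch in ocr_text if ch.isdigit())
print(digits)  # -> 820065084551
```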