OCR not performing well on a clean image | Python Pytesseract

Posted 2025-01-27 16:20:07


I have been working on a project which involves extracting text from an image. I found that Tesseract is one of the best libraries available, and I decided to use it along with OpenCV, which is needed for the image manipulation.

I have been experimenting with the Tesseract engine a lot, but it does not seem to give me the expected results. I have attached the images for reference. The output I got is:

1] =501 [

Instead, the expected output is

TM10-50%L

What I have done so far:

  • Removed noise
  • Applied adaptive thresholding
  • Sent the result to the Tesseract OCR engine

Are there any other suggestions to improve the algorithm?

Thanks in advance.

Snippet of the code:

import cv2
import sys
import pytesseract
import numpy as np
from PIL import Image

if __name__ == '__main__':
  if len(sys.argv) < 2:
    print('Usage: python ocr_simple.py image.jpg')
    sys.exit(1)

  # Read image path from command line
  imPath = sys.argv[1]
  gray  = cv2.imread(imPath, 0)
  # Blur
  blur  = cv2.GaussianBlur(gray,(9,9), 0)
  # Binarizing
  thresh = cv2.adaptiveThreshold(blur, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 5, 3)
  text = pytesseract.image_to_string(thresh)
  print(text)

Images attached.
The first image is the original image. (Original image)

The second image is what was fed to Tesseract. (Input to Tesseract)


Comments (1)

女皇必胜 2025-02-03 16:20:07


Before performing OCR on an image, it's important to preprocess the image. The idea is to obtain a processed image where the text to extract is in black with the background in white. For this specific image, we need to obtain the ROI before we can OCR.

To do this, we can convert to grayscale, apply a slight Gaussian blur, then adaptive threshold to obtain a binary image. From here, we can apply morphological closing to merge individual letters together. Next we find contours, filter using contour area filtering, and then extract the ROI. We perform text extraction using the --psm 6 configuration option to assume a single uniform block of text. Take a look here for more options.
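The closing step is what merges the individual letters into a single blob that can later be picked out by area. For intuition, here is a dependency-free toy sketch of what dilation followed by erosion does on a small binary grid (illustrative only, not OpenCV's implementation, which also handles kernel shapes and border modes):

```python
def dilate(grid, k=1):
    """Set a pixel if any neighbor within Chebyshev distance k is set."""
    h, w = len(grid), len(grid[0])
    return [[int(any(grid[ny][nx]
                     for ny in range(max(0, y - k), min(h, y + k + 1))
                     for nx in range(max(0, x - k), min(w, x + k + 1))))
             for x in range(w)] for y in range(h)]

def erode(grid, k=1):
    """Keep a pixel only if every neighbor within distance k is set."""
    h, w = len(grid), len(grid[0])
    return [[int(all(grid[ny][nx]
                     for ny in range(max(0, y - k), min(h, y + k + 1))
                     for nx in range(max(0, x - k), min(w, x + k + 1))))
             for x in range(w)] for y in range(h)]

def close_morph(grid, k=1):
    """Morphological closing: dilation followed by erosion."""
    return erode(dilate(grid, k), k)

# Two set pixels separated by a 1-pixel gap merge into one blob,
# just as nearby letters merge under cv2.MORPH_CLOSE.
print(close_morph([[1, 0, 1]]))  # → [[1, 1, 1]]
```

Gaps wider than the kernel survive closing, which is why the answer runs the close with `iterations=3` to bridge the spacing between characters.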


Detected ROI


Extracted ROI


Result from Pytesseract OCR

TM10=50%L

Code

import cv2
import pytesseract

pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"

# Grayscale, Gaussian blur, Adaptive threshold
image = cv2.imread('1.jpg')
original = image.copy()
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
blur = cv2.GaussianBlur(gray, (3,3), 0)
thresh = cv2.adaptiveThreshold(blur, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY_INV, 5, 5)

# Perform morph close to merge letters together
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (5,5))
close = cv2.morphologyEx(thresh, cv2.MORPH_CLOSE, kernel, iterations=3)

# Find contours, contour area filtering, extract ROI
cnts, _ = cv2.findContours(close, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)[-2:]
for c in cnts:
    area = cv2.contourArea(c)
    if area > 1800 and area < 2500:
        x,y,w,h = cv2.boundingRect(c)
        ROI = original[y:y+h, x:x+w]
        cv2.rectangle(image, (x, y), (x + w, y + h), (36,255,12), 3)

# Perform text extraction
ROI = cv2.GaussianBlur(ROI, (3,3), 0)
data = pytesseract.image_to_string(ROI, lang='eng', config='--psm 6')
print(data)

cv2.imshow('ROI', ROI)
cv2.imshow('close', close)
cv2.imshow('image', image)
cv2.waitKey()
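The `1800 < area < 2500` band in the loop above is hand-tuned to the size of this particular label, so it will need adjusting for other images. The same filtering idea can be sketched without OpenCV by measuring connected-component areas in a toy binary grid (values here are illustrative, not taken from the real image):

```python
from collections import deque

def blob_areas(grid):
    """Pixel-count areas of 4-connected foreground blobs, in scan order."""
    h, w = len(grid), len(grid[0])
    seen = [[False] * w for _ in range(h)]
    areas = []
    for y in range(h):
        for x in range(w):
            if grid[y][x] and not seen[y][x]:
                # Flood-fill one blob and count its pixels.
                area, q = 0, deque([(y, x)])
                seen[y][x] = True
                while q:
                    cy, cx = q.popleft()
                    area += 1
                    for ny, nx in ((cy-1, cx), (cy+1, cx), (cy, cx-1), (cy, cx+1)):
                        if 0 <= ny < h and 0 <= nx < w and grid[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            q.append((ny, nx))
                areas.append(area)
    return areas

grid = [
    [1, 1, 0, 1],
    [1, 1, 0, 0],
]
areas = blob_areas(grid)
# Keep only blobs inside a hand-tuned band, like the area check above.
kept = [a for a in areas if 2 < a < 6]
print(areas, kept)  # → [4, 1] [4]
```

Note that in the original snippet `ROI` stays undefined if no contour passes the band, so the subsequent `cv2.GaussianBlur(ROI, ...)` would raise a `NameError`; checking that at least one blob matched before running OCR makes the failure mode clearer.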