I have been working on a project that involves extracting text from an image. From what I have read, tesseract is one of the best libraries available, so I decided to use it along with opencv. OpenCV is needed for image manipulation.
I have been experimenting a lot with the tesseract engine, but it does not seem to be giving me the expected results. I have attached the image as a reference. The output I got is:
1] =501 [
Instead, the expected output is:
TM10-50%L
What I have done so far:
- Remove noise
- Adaptive threshold
- Send it to the tesseract OCR engine
Are there any other suggestions to improve the algorithm?
Thanks in advance.
Snippet of the code:
import cv2
import sys
import pytesseract
import numpy as np
from PIL import Image

if __name__ == '__main__':
    if len(sys.argv) < 2:
        print('Usage: python ocr_simple.py image.jpg')
        sys.exit(1)

    # Read image path from command line
    imPath = sys.argv[1]
    # Load the image as grayscale (flag 0)
    gray = cv2.imread(imPath, 0)

    # Blur
    blur = cv2.GaussianBlur(gray, (9, 9), 0)
    # Binarizing
    thresh = cv2.adaptiveThreshold(blur, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 5, 3)

    # Run OCR on the thresholded image
    text = pytesseract.image_to_string(thresh)
    print(text)
Images attached.
The first image is the original image: Original image
The second image is what was fed to tesseract: Input to tesseract
Comments (1)
Before performing OCR on an image, it's important to preprocess it. The idea is to obtain a processed image where the text to extract is black and the background is white. For this specific image, we need to obtain the ROI before we can run OCR.
To do this, we can convert the image to grayscale, apply a slight Gaussian blur, then apply an adaptive threshold to obtain a binary image. From here, we can apply morphological closing to merge the individual letters together. Next, we find contours, filter them by contour area, and extract the ROI. We perform text extraction using the --psm 6 configuration option to assume a single uniform block of text. Take a look here for more options.
Detected ROI
Extracted ROI
Result from Pytesseract OCR
Code