Passing images to Tesseract OCR with known bounding box coordinates

Posted on 2025-01-26 06:50:49

I have a few images in one folder, and for every image a txt file with its bounding box coordinates, in this form:

0 0.503 0.503 0.334 0.994
(class,x,y,w,h)
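
For reference, a label line in this form could be read into numbers roughly as follows. This is only a sketch: it assumes (the post does not say so explicitly) that x and y are the box center and that all four values are fractions of the image size, as in the YOLO annotation convention. `Box` and `parse_label_line` are hypothetical names used for illustration.

```python
from typing import NamedTuple


class Box(NamedTuple):
    cls: int
    x: float  # assumed: box center, as a fraction of image width
    y: float  # assumed: box center, as a fraction of image height
    w: float  # assumed: box width, as a fraction of image width
    h: float  # assumed: box height, as a fraction of image height


def parse_label_line(line: str) -> Box:
    """Parse 'class x y w h' into an int class id and four floats."""
    parts = line.split()
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}: {line!r}")
    return Box(int(parts[0]), *(float(p) for p in parts[1:]))


box = parse_label_line("0 0.503 0.503 0.334 0.994")
print(box)
```

Converting to `float` at parse time keeps the values numeric for any later arithmetic.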

My issue is that I want to use Tesseract OCR to extract the text inside this bounding box on each image.
I am having trouble with the coding part.
Any help would be appreciated.

(There are basically two folders: one has all the images, and the other has a txt file with the bounding box coordinates for each image.)
This is the code:

import cv2
import pytesseract
config = ('-l eng --oem 3 --psm 3')
image_path='D:\\Object detection\\test images\\'
labels_path='D:\\Object detection\\labels\\'
for images in os.listdir(image_path):
    spl=images.split('.')[0]
    img_name =os.path.join(image_path,images)
    image=cv2.imread(img_name)
    print(image.shape)
    with open(os.path.join(labels_path,spl+'.txt')) as f:
        t = f.read()
        arr=t.split()
        x,y,w,h=arr[1],arr[2],arr[3],arr[4]
        x=int(x*image.shape[0])
        y=int(y*image.shape[1])
        w=int(w*image.shape[0])
        h=int(h*image.shape[1])
        x1 = round(x-w/2)
        y1 = round(y-h/2)
        x2 = round(x+w/2)
        y2 = round(y+h/2) 
        rect=cv2.rectangle(image,(x1,y1),(x2,y2),(0,0,200),3)
        cropped_img = image[y1:y2, x1:x2]
        data = pytesseract.image_to_string(cropped_img, lang='eng',config=config)
    print(data)

There are two folders, one with the images and one with their bounding box coordinates in the form given above.
What I want is to loop over both, match each image to the label file with the same name, and extract the text inside the ROI. I am getting this error:

     13         arr=t.split()
     14         x,y,w,h=arr[1],arr[2],arr[3],arr[4]
---> 15         x=int(x*image.shape[0])
     16         y=int(y*image.shape[1])
     17         w=int(w*image.shape[0])

ValueError: invalid literal for int() with base 10: '0.5030.5030.5030.5030.5030.5030.5030.5030.5030.5030.5030.5030.5030.5030.5030.5030.5030.5030.5030.5030.5030.5030.5030.5030.5030.5030.5030.5030.5030.5030.5030.5030.5030.5030.5030.5030.5030.5030.5030.50
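
The repeated '0.503' in the message is what Python produces when a string is multiplied by an integer. `arr[1]` is still the string `'0.503'`, so `x*image.shape[0]` repeats it `image.shape[0]` times and `int()` cannot parse the result. Converting the fields with `float()` first avoids this. Separately, OpenCV's `image.shape` is `(height, width, channels)`, so x and w should be scaled by `shape[1]` and y and h by `shape[0]`, and the posted code uses `os` without importing it. The sketch below puts these points together. It assumes one box per label file with normalized center-format coordinates, and the names `to_pixel_box` and `ocr_labeled_images` are hypothetical.

```python
import os


def to_pixel_box(x, y, w, h, img_h, img_w):
    """Convert a normalized center-format box to pixel corners, clamped to the image."""
    x1 = round((x - w / 2) * img_w)
    y1 = round((y - h / 2) * img_h)
    x2 = round((x + w / 2) * img_w)
    y2 = round((y + h / 2) * img_h)
    return max(x1, 0), max(y1, 0), min(x2, img_w), min(y2, img_h)


def ocr_labeled_images(image_dir, label_dir, config="--oem 3 --psm 3"):
    """Return {image file name: OCR text of its labeled region}."""
    # Imported here so to_pixel_box can be tried without OpenCV or Tesseract installed.
    import cv2
    import pytesseract

    results = {}
    for name in sorted(os.listdir(image_dir)):
        stem = os.path.splitext(name)[0]
        label_file = os.path.join(label_dir, stem + ".txt")
        image = cv2.imread(os.path.join(image_dir, name))
        # Skip files that are not readable images or have no matching label file.
        if image is None or not os.path.isfile(label_file):
            continue
        with open(label_file) as f:
            fields = f.read().split()
        x, y, w, h = (float(v) for v in fields[1:5])  # floats, not strings
        img_h, img_w = image.shape[:2]  # shape is (height, width, channels)
        x1, y1, x2, y2 = to_pixel_box(x, y, w, h, img_h, img_w)
        crop = image[y1:y2, x1:x2]
        results[name] = pytesseract.image_to_string(crop, lang="eng", config=config)
    return results


# The cause of the error, reproduced without any image:
print("0.503" * 3)  # prints 0.5030.5030.503
print(to_pixel_box(0.503, 0.503, 0.334, 0.994, img_h=100, img_w=200))
```

The sketch leaves out the `cv2.rectangle` call: it draws on the image in place, so calling it before cropping would put part of the border inside the cropped region passed to Tesseract.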
