Passing images to Tesseract OCR with known bounding box coordinates

Posted 2025-01-26 06:50:49

I have a few images in one folder, and each image has a txt file holding its bounding box coordinates in this form:

0 0.503 0.503 0.334 0.994
(class,x,y,w,h)
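
For reference, here is a minimal sketch of how a line in this format could be turned into pixel corners. It assumes that x and y are the box centre and w and h the box size, all given as fractions of the image width and height (the post does not state this explicitly, but the code further down treats them that way). The function name `box_to_pixel_corners` is made up for illustration.

```python
def box_to_pixel_corners(x, y, w, h, img_w, img_h):
    """Convert a centre/size box given as fractions into pixel corners.

    Assumption: x and w are fractions of the image width, y and h are
    fractions of the image height. Returns (x1, y1, x2, y2) clamped to
    the image bounds.
    """
    x1 = round((x - w / 2) * img_w)
    y1 = round((y - h / 2) * img_h)
    x2 = round((x + w / 2) * img_w)
    y2 = round((y + h / 2) * img_h)
    # Clamp so the slice image[y1:y2, x1:x2] never runs off the image.
    return max(0, x1), max(0, y1), min(img_w, x2), min(img_h, y2)


# The sample label from the post on a hypothetical 1000x500 image.
corners = box_to_pixel_corners(0.503, 0.503, 0.334, 0.994, 1000, 500)
print(corners)
```

Note that an image loaded with `cv2.imread` has `shape == (height, width, channels)`, so x and w scale with `shape[1]` and y and h with `shape[0]`.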

I want to use these bounding boxes to extract the text from each image with Tesseract OCR, but I am having trouble with the code.
Any help would be appreciated.

(I basically have two folders: one holds all the images, and the other holds one txt file per image with its bounding box coordinates.)
This is the code:

import cv2
import pytesseract
config = ('-l eng --oem 3 --psm 3')
image_path='D:\\Object detection\\test images\\'
labels_path='D:\\Object detection\\labels\\'
for images in os.listdir(image_path):
    spl=images.split('.')[0]
    img_name =os.path.join(image_path,images)
    image=cv2.imread(img_name)
    print(image.shape)
    with open(os.path.join(labels_path,spl+'.txt')) as f:
        t = f.read()
        arr=t.split()
        x,y,w,h=arr[1],arr[2],arr[3],arr[4]
        x=int(x*image.shape[0])
        y=int(y*image.shape[1])
        w=int(w*image.shape[0])
        h=int(h*image.shape[1])
        x1 = round(x-w/2)
        y1 = round(y-h/2)
        x2 = round(x+w/2)
        y2 = round(y+h/2) 
        rect=cv2.rectangle(image,(x1,y1),(x2,y2),(0,0,200),3)
        cropped_img = image[y1:y2, x1:x2]
        data = pytesseract.image_to_string(cropped_img, lang='eng',config=config)
    print(data)

There are two folders, one with the images and one with their bounding box coordinates in the form given above.
What I want is to loop over both, pair each image with the label file of the same name, and extract the text inside the ROI. I am getting this error:

     13         arr=t.split()
     14         x,y,w,h=arr[1],arr[2],arr[3],arr[4]
---> 15         x=int(x*image.shape[0])
     16         y=int(y*image.shape[1])
     17         w=int(w*image.shape[0])

ValueError: invalid literal for int() with base 10: '0.5030.5030.5030.5030.5030.5030.5030.5030.5030.5030.5030.5030.5030.5030.5030.5030.5030.5030.5030.5030.5030.5030.5030.5030.5030.5030.5030.5030.5030.5030.5030.5030.5030.5030.5030.5030.5030.5030.5030.50
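
The repeated `0.503` in the message points at the likely cause: `t.split()` returns strings, and multiplying a string by an integer in Python repeats the string rather than doing arithmetic, so `int()` ends up with a long run of `0.5030.503...`. Below is a hedged sketch of one way the loop could be restructured. It converts the values with `float()` first, scales x by the image width and y by the image height, and pairs files by base name. The helper names (`parse_label_line`, `label_path_for`, `ocr_labelled_images`) are invented for this example, and the BGR to RGB conversion before OCR is an assumption about what pytesseract expects from an OpenCV array.

```python
import os


def parse_label_line(line):
    """Parse one 'class x y w h' line into an int and four floats."""
    parts = line.split()
    # float() is the key fix: the split() results are strings, and
    # "0.503" * 640 repeats the text 640 times instead of multiplying.
    return (int(parts[0]), *(float(v) for v in parts[1:5]))


def label_path_for(image_name, labels_dir):
    """Return the label file path that shares the image's base name."""
    stem = os.path.splitext(image_name)[0]
    return os.path.join(labels_dir, stem + ".txt")


def ocr_labelled_images(image_dir, labels_dir, config="--oem 3 --psm 3"):
    """Yield (image name, class id, OCR text) for every labelled box."""
    # Imported lazily so the helpers above work without OpenCV installed.
    import cv2
    import pytesseract

    for name in sorted(os.listdir(image_dir)):
        image = cv2.imread(os.path.join(image_dir, name))
        label_path = label_path_for(name, labels_dir)
        if image is None or not os.path.isfile(label_path):
            continue  # not a readable image, or no matching label file
        img_h, img_w = image.shape[:2]  # shape is (height, width, channels)
        with open(label_path) as f:
            for line in f:
                if not line.strip():
                    continue
                class_id, x, y, w, h = parse_label_line(line)
                # Centre/size fractions -> pixel corners, clamped to the image.
                x1 = max(0, round((x - w / 2) * img_w))
                y1 = max(0, round((y - h / 2) * img_h))
                x2 = min(img_w, round((x + w / 2) * img_w))
                y2 = min(img_h, round((y + h / 2) * img_h))
                if x2 <= x1 or y2 <= y1:
                    continue  # degenerate box, nothing to crop
                crop = cv2.cvtColor(image[y1:y2, x1:x2], cv2.COLOR_BGR2RGB)
                text = pytesseract.image_to_string(crop, lang="eng", config=config)
                yield name, class_id, text
```

With the folders from the post, it could be used as `for name, cls, text in ocr_labelled_images('D:\\Object detection\\test images\\', 'D:\\Object detection\\labels\\'): print(name, text)`. Cropping from the untouched image, rather than after `cv2.rectangle` has drawn on it, keeps the red border out of the region passed to Tesseract.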
