OpenCV in Node.js: preparing an image for Tesseract.js OCR and removing dots

Posted on 2025-01-30 10:39:21, word count 955, views 3, comments 0

I'm trying to read data with Tesseract from an image captured by a webcam. Here is an example of the image used:

I'm working on a Node.js server, and I tried a lot of techniques in Jimp, including inverting, converting to grayscale, sharpening the image, and filtering specific colors (yellow, blue, ...). In the end I built a separate Docker container using opencv4nodejs and applied a few techniques to extract text from that image.

I mostly need the large text (the small text is not necessary, and it is not sharp in this image anyway). So I applied this:

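const cv = require('opencv4nodejs');
const src = cv.imread('./970f5b45-9f24-41d5-91f0-ef3f8b9d8914.jpeg');

// Convert to grayscale, then binarize locally (OpenCV requires an odd blockSize)
const src2 = src.cvtColor(cv.COLOR_BGR2GRAY);
const dst = src2.adaptiveThreshold(255, cv.ADAPTIVE_THRESH_GAUSSIAN_C, cv.THRESH_BINARY, 11, 2);
// Morphological opening takes a structuring element; a 3x3 rectangle is assumed here
const kernel = cv.getStructuringElement(cv.MORPH_RECT, new cv.Size(3, 3));
const dst2 = dst.morphologyEx(kernel, cv.MORPH_OPEN);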

After that I get this result, which is almost ready for OCR. The problem is that the image contains a lot of dots. Is there a way to remove those dots while keeping the quality of the result (readable text), in OpenCV or with another technique?

The current result:

Is it possible to extract just the text from that result? If I pass this result to Tesseract, the OCR takes a really long time, and the output contains a huge number of weird characters (probably because of the dots and shapes).
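
One direction that might help with the dots is to drop connected components whose area is below a threshold, since isolated specks are much smaller than the large characters. The following is only a minimal sketch in plain JavaScript, not OpenCV code: the function name `removeSmallBlobs`, the `minArea` parameter, and the image representation (an array of rows where 1 is ink and 0 is background) are assumptions made for illustration. With `cv.THRESH_BINARY` the ink is black, so the image would first need to be inverted or mapped into this form.

```javascript
// Hypothetical helper (not part of OpenCV or opencv4nodejs):
// removes 8-connected foreground blobs smaller than minArea pixels.
// img is an array of rows; 1 = foreground (ink), 0 = background.
function removeSmallBlobs(img, minArea) {
  const h = img.length;
  const w = h > 0 ? img[0].length : 0;
  const out = img.map(row => row.slice());           // copy, input stays untouched
  const seen = img.map(row => row.map(() => false)); // visited flags

  for (let y = 0; y < h; y++) {
    for (let x = 0; x < w; x++) {
      if (img[y][x] !== 1 || seen[y][x]) continue;

      // Flood-fill one component and collect its pixels.
      const stack = [[y, x]];
      const pixels = [];
      seen[y][x] = true;
      while (stack.length > 0) {
        const [cy, cx] = stack.pop();
        pixels.push([cy, cx]);
        for (let dy = -1; dy <= 1; dy++) {
          for (let dx = -1; dx <= 1; dx++) {
            const ny = cy + dy;
            const nx = cx + dx;
            if (ny < 0 || ny >= h || nx < 0 || nx >= w) continue;
            if (img[ny][nx] !== 1 || seen[ny][nx]) continue;
            seen[ny][nx] = true;
            stack.push([ny, nx]);
          }
        }
      }

      // Erase the component if it is too small to be part of a character.
      if (pixels.length < minArea) {
        for (const [py, px] of pixels) out[py][px] = 0;
      }
    }
  }
  return out;
}
```

A suitable `minArea` depends on the image resolution and would have to be tuned so that specks disappear while punctuation and thin strokes of the large text survive. OpenCV itself provides connected-component labelling with per-component statistics for the same purpose; whether and how the opencv4nodejs binding exposes it should be checked in its documentation.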
