OpenCV (Node.js): prepare an image for Tesseract.js OCR, remove dots
I'm trying to read data with Tesseract from an image captured by a webcam. Here is an example of the image I'm using:
I'm working on a Node.js server, and I tried a lot of techniques in Jimp, including invert/grayscale, sharpening the image, and filtering specific colors (yellow, blue, ...). In the end, I built a separate Docker container with opencv4nodejs and applied a few techniques to extract the text from that image.
I mostly need the large text (the small text isn't necessary, and it isn't sharp in this image anyway), so I applied this:
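const cv = require('opencv4nodejs');
const src = cv.imread('./970f5b45-9f24-41d5-91f0-ef3f8b9d8914.jpeg');
const gray = src.cvtColor(cv.COLOR_BGR2GRAY);
// blockSize must be odd and > 1 (an even value such as 12 fails OpenCV's check), so 11 is used
const bin = gray.adaptiveThreshold(255, cv.ADAPTIVE_THRESH_GAUSSIAN_C, cv.THRESH_BINARY, 11, 2);
// morphologyEx takes the structuring element (kernel) as its first argument
const kernel = cv.getStructuringElement(cv.MORPH_RECT, new cv.Size(3, 3));
const opened = bin.morphologyEx(kernel, cv.MORPH_OPEN);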
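With `cv.THRESH_BINARY`, pixels darker than their local threshold become 0, so dark text and (assuming the dots are dark too) the dots both end up black on a white background. Morphological opening (erode, then dilate) removes bright features smaller than the kernel, so on this polarity it leaves small black dots in place; closing (dilate, then erode), i.e. `cv.MORPH_CLOSE`, or inverting the image before opening, is the variant that removes them. The plain-JavaScript sketch below (no OpenCV; `morph`, `opening` and `closing` are hypothetical helpers written only for illustration) shows the difference on a tiny 0/255 array, using a square k x k kernel and ignoring out-of-image pixels.

```javascript
// Min/max morphology on a 2D array of 0 (black) / 255 (white) values,
// using a square k x k structuring element. Out-of-image pixels are skipped.
function morph(img, k, op) {
  const h = img.length;
  const w = img[0].length;
  const r = Math.floor(k / 2);
  const out = [];
  for (let y = 0; y < h; y++) {
    const row = [];
    for (let x = 0; x < w; x++) {
      // Erosion takes the neighbourhood minimum, dilation the maximum.
      let v = op === 'erode' ? 255 : 0;
      for (let dy = -r; dy <= r; dy++) {
        for (let dx = -r; dx <= r; dx++) {
          const yy = y + dy;
          const xx = x + dx;
          if (yy < 0 || yy >= h || xx < 0 || xx >= w) continue;
          v = op === 'erode' ? Math.min(v, img[yy][xx]) : Math.max(v, img[yy][xx]);
        }
      }
      row.push(v);
    }
    out.push(row);
  }
  return out;
}

// Opening: erode, then dilate. Removes WHITE specks smaller than the kernel.
function opening(img, k) {
  return morph(morph(img, k, 'erode'), k, 'dilate');
}

// Closing: dilate, then erode. Removes BLACK dots smaller than the kernel.
function closing(img, k) {
  return morph(morph(img, k, 'dilate'), k, 'erode');
}

// A single black dot on a white 5x5 image.
const dotImg = Array.from({ length: 5 }, () => new Array(5).fill(255));
dotImg[2][2] = 0;
console.log(opening(dotImg, 3)[2][2]); // 0: the dot survives opening
console.log(closing(dotImg, 3)[2][2]); // 255: the dot is removed by closing
```

The kernel has to be larger than the dots but smaller than the gaps between and inside letters; a kernel that is too large will fill in letter holes and merge neighbouring characters.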
After that I get this result, which is almost ready for OCR; the problem is that there are a lot of dots in the image. Is there any way to remove those dots while keeping the quality of the result (readable text), either in OpenCV or with another technique?
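A common way to drop speckles while keeping large text is to label the connected components of the black pixels and erase every component whose area is below a threshold. In OpenCV this is usually done with `connectedComponentsWithStats` and its per-component area statistic (check the opencv4nodejs docs for the exact binding in your version). The sketch below implements the same idea in plain JavaScript on a 0/255 array, assuming black ink on a white background and 8-connectivity; `removeSmallComponents` and `minArea` are hypothetical names, and a sensible `minArea` has to be chosen from your own images.

```javascript
// Erase connected components of black (0) pixels smaller than minArea pixels.
// Uses 8-connectivity and an explicit stack (iterative flood fill).
// Returns a new array; the input is not modified.
function removeSmallComponents(img, minArea) {
  const h = img.length;
  const w = img[0].length;
  const out = img.map(row => row.slice());
  const seen = Array.from({ length: h }, () => new Array(w).fill(false));
  for (let y = 0; y < h; y++) {
    for (let x = 0; x < w; x++) {
      if (img[y][x] !== 0 || seen[y][x]) continue;
      // Collect every pixel of the component that starts at (y, x).
      const stack = [[y, x]];
      const pixels = [];
      seen[y][x] = true;
      while (stack.length > 0) {
        const [cy, cx] = stack.pop();
        pixels.push([cy, cx]);
        for (let dy = -1; dy <= 1; dy++) {
          for (let dx = -1; dx <= 1; dx++) {
            const ny = cy + dy;
            const nx = cx + dx;
            if (ny < 0 || ny >= h || nx < 0 || nx >= w) continue;
            if (img[ny][nx] === 0 && !seen[ny][nx]) {
              seen[ny][nx] = true;
              stack.push([ny, nx]);
            }
          }
        }
      }
      // Too small to belong to the large text: paint it white.
      if (pixels.length < minArea) {
        for (const [py, px] of pixels) out[py][px] = 255;
      }
    }
  }
  return out;
}

// Example: one isolated dot plus a 3x3 "letter" block.
const noisy = Array.from({ length: 6 }, () => new Array(6).fill(255));
noisy[0][5] = 0;
for (let y = 2; y <= 4; y++) for (let x = 1; x <= 3; x++) noisy[y][x] = 0;
const cleanedExample = removeSmallComponents(noisy, 4);
console.log(cleanedExample[0][5], cleanedExample[3][2]); // 255 0
```

Since only the large text matters here, losing small components such as the dot of an "i" or punctuation is usually acceptable. Dots that touch a letter become part of that letter's component and are not removed by this filter.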
The current result is:
Is it possible to extract just the text from that result? If I feed this result to Tesseract OCR, text extraction takes a really long time and the output contains a huge number of garbage characters (probably because of the dots/shapes).
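Once the dots are gone, handing Tesseract only the region that actually contains text may reduce both processing time and stray characters, since there is less area to segment. The sketch below (plain JavaScript; `cropToInk` and `pad` are hypothetical names) finds the bounding box of the remaining black pixels and crops to it with a small margin. With opencv4nodejs, the equivalent crop of a `Mat` can be taken with `mat.getRegion(new cv.Rect(x, y, width, height))`.

```javascript
// Crop a 0/255 image to the bounding box of its black pixels plus `pad`
// pixels of margin (clamped to the image). Returns null if there is no ink.
function cropToInk(img, pad) {
  const h = img.length;
  const w = img[0].length;
  let top = h, bottom = -1, left = w, right = -1;
  for (let y = 0; y < h; y++) {
    for (let x = 0; x < w; x++) {
      if (img[y][x] === 0) {
        top = Math.min(top, y);
        bottom = Math.max(bottom, y);
        left = Math.min(left, x);
        right = Math.max(right, x);
      }
    }
  }
  if (bottom < 0) return null; // no black pixels: nothing to OCR
  const y0 = Math.max(0, top - pad);
  const y1 = Math.min(h - 1, bottom + pad);
  const x0 = Math.max(0, left - pad);
  const x1 = Math.min(w - 1, right + pad);
  return img.slice(y0, y1 + 1).map(row => row.slice(x0, x1 + 1));
}

// Example: two ink pixels in a 10x10 white image, 1 px margin.
const page = Array.from({ length: 10 }, () => new Array(10).fill(255));
page[4][3] = 0;
page[5][6] = 0;
const region = cropToInk(page, 1);
console.log(region.length, region[0].length); // 4 6
```

This only helps if the cleanup step left no large stray blobs far from the text; a single surviving blob in a corner stretches the box back to nearly the full image.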