[Images: OriginalImage1 / BinarizedImage1, OriginalImage2 / BinarizedImage2, OriginalImage3 / BinarizedImage3, OriginalImage4 / BinarizedImage4]
I'm preparing images for OCR with Tesseract (pre-trained on this custom font) in Java, using the OpenCV library.
There is an image with blue text. After resizing it and binarizing it with the OpenCV inRange() method, I get a black-and-white image, but some letters are connected and Tesseract sometimes misreads them.
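One detail that can matter at the resizing step: nearest-neighbor scaling only replicates pixels, so thin gaps between letters stay exactly as blocky as in the small original, while a smooth interpolation gives the later threshold intermediate values to cut between. A minimal sketch, assuming the image is available as a `java.awt.image.BufferedImage` and using a hypothetical helper class `Upscaler`; the scale factor is an assumption to tune. In OpenCV the analogous call would be `Imgproc.resize` with `Imgproc.INTER_CUBIC`.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

// Hypothetical helper: enlarges a small text image before binarization.
final class Upscaler {
    private Upscaler() {}

    /**
     * Returns a copy of src scaled by an integer factor using bicubic interpolation.
     * The factor (for example 3 or 4) is an assumption; tune it to your font size.
     */
    static BufferedImage upscale(BufferedImage src, int factor) {
        if (factor < 1) {
            throw new IllegalArgumentException("factor must be >= 1");
        }
        int w = src.getWidth() * factor;
        int h = src.getHeight() * factor;
        BufferedImage dst = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = dst.createGraphics();
        try {
            // Bicubic produces intermediate colors at letter edges instead of blocky copies.
            g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                    RenderingHints.VALUE_INTERPOLATION_BICUBIC);
            g.drawImage(src, 0, 0, w, h, null);
        } finally {
            g.dispose();
        }
        return dst;
    }
}
```

The upscaled image is then thresholded as before; Tesseract generally does better when character height is well above a handful of pixels.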
There are also a few more problems: the original text is quite small, its border pixels always have slightly different RGB values, and the background is always different too.
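Because the background and the anti-aliased edge colors change from image to image, a fixed `inRange()` box is hard to tune once for all images. One alternative, sketched here under stated assumptions, is to compute a per-pixel "blueness" score (the formula `B - max(R, G)` is a hypothetical choice) and let Otsu's method pick the threshold separately for every image. Pixels are packed `0xRRGGBB` ints, as returned by `BufferedImage.getRGB`; the class name `BlueOtsu` is hypothetical.

```java
// Hypothetical helper: per-image automatic threshold on a "blueness" score.
final class BlueOtsu {
    private BlueOtsu() {}

    /** Assumed blueness score for one packed 0xRRGGBB pixel, in 0..255. */
    static int blueness(int rgb) {
        int r = (rgb >> 16) & 0xFF;
        int g = (rgb >> 8) & 0xFF;
        int b = rgb & 0xFF;
        return Math.max(0, b - Math.max(r, g));
    }

    /** Otsu's method over values in 0..255: returns t such that values > t are foreground. */
    static int otsuThreshold(int[] values) {
        int[] hist = new int[256];
        for (int v : values) hist[v]++;
        long total = values.length;
        double sumAll = 0;
        for (int i = 0; i < 256; i++) sumAll += (double) i * hist[i];

        double sumBg = 0;
        long wBg = 0;
        double bestVar = -1;
        int best = 0;
        for (int t = 0; t < 256; t++) {
            wBg += hist[t];                 // number of pixels with value <= t
            if (wBg == 0) continue;
            long wFg = total - wBg;
            if (wFg == 0) break;
            sumBg += (double) t * hist[t];
            double meanBg = sumBg / wBg;
            double meanFg = (sumAll - sumBg) / wFg;
            // Between-class variance (up to a constant factor); maximize it.
            double between = (double) wBg * wFg * (meanBg - meanFg) * (meanBg - meanFg);
            if (between > bestVar) {
                bestVar = between;
                best = t;
            }
        }
        return best;
    }

    /** Binarizes packed RGB pixels: true = text (blue enough), false = background. */
    static boolean[] binarize(int[] rgbPixels) {
        int[] score = new int[rgbPixels.length];
        for (int i = 0; i < rgbPixels.length; i++) score[i] = blueness(rgbPixels[i]);
        int t = otsuThreshold(score);
        boolean[] text = new boolean[score.length];
        for (int i = 0; i < score.length; i++) text[i] = score[i] > t;
        return text;
    }
}
```

With OpenCV, the Otsu step itself is available as `Imgproc.threshold` with the `Imgproc.THRESH_OTSU` flag on an 8-bit single-channel `Mat`, so only the blueness channel would need to be computed by hand.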
I tried widening the range that inRange() captures, but that produced many more connected characters.
After narrowing it, some letters became barely visible and Tesseract can't read them.
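The trade-off described here (a wide range joins letters, a narrow one loses them) is often handled by thresholding generously and then eroding the mask. A minimal sketch with a hypothetical `Erosion` helper on a `boolean[][]` mask (true = ink): a 3x3 square erosion keeps a pixel only if its whole neighborhood is ink, so bridges up to 2 pixels thick disappear, but strokes thinner than 3 pixels vanish as well, which is why it works best on an upscaled image. In OpenCV the corresponding calls would be `Imgproc.getStructuringElement(Imgproc.MORPH_RECT, new Size(3, 3))` and `Imgproc.erode`; note that OpenCV erodes white (non-zero) pixels, so it applies to the `inRange()` output, where text is white, before any inversion.

```java
// Hypothetical helper: binary erosion with a 3x3 square structuring element.
final class Erosion {
    private Erosion() {}

    /** Returns a new mask where a pixel is ink only if it and all 8 neighbors are ink. */
    static boolean[][] erode(boolean[][] ink) {
        int h = ink.length;
        int w = h == 0 ? 0 : ink[0].length;
        boolean[][] out = new boolean[h][w];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                boolean keep = true;
                for (int dy = -1; dy <= 1 && keep; dy++) {
                    for (int dx = -1; dx <= 1 && keep; dx++) {
                        int ny = y + dy;
                        int nx = x + dx;
                        // Pixels outside the image count as background.
                        if (ny < 0 || ny >= h || nx < 0 || nx >= w || !ink[ny][nx]) {
                            keep = false;
                        }
                    }
                }
                out[y][x] = keep;
            }
        }
        return out;
    }
}
```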
Please advise how to split those characters with white (background) pixels in the binarized images.
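For separating touching characters with white, one simple technique is a vertical projection profile: count the ink pixels in each column and paint columns that contain very little ink as background. A minimal sketch with a hypothetical `ColumnSplitter` helper and a hypothetical tuning parameter `maxInk` (for example 1 or 2 on an upscaled image). It assumes single-line, roughly upright text; the thin outer columns of round letters can also be clipped, so it needs checking against the actual font.

```java
// Hypothetical helper: splits touching glyphs by clearing low-ink columns.
final class ColumnSplitter {
    private ColumnSplitter() {}

    /** Number of ink pixels (true) in each column of the mask. */
    static int[] columnProfile(boolean[][] ink) {
        int w = ink.length == 0 ? 0 : ink[0].length;
        int[] counts = new int[w];
        for (boolean[] row : ink) {
            for (int x = 0; x < w; x++) {
                if (row[x]) counts[x]++;
            }
        }
        return counts;
    }

    /** Returns a copy of ink where every column with 1..maxInk ink pixels is painted white. */
    static boolean[][] split(boolean[][] ink, int maxInk) {
        int[] counts = columnProfile(ink);
        boolean[][] out = new boolean[ink.length][];
        for (int y = 0; y < ink.length; y++) {
            out[y] = ink[y].clone(); // leave the input mask untouched
            for (int x = 0; x < counts.length; x++) {
                if (counts[x] > 0 && counts[x] <= maxInk) {
                    out[y][x] = false; // background (white)
                }
            }
        }
        return out;
    }
}
```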
Or is there a more efficient way to extract text from colored images?
Any advice on text extraction or recognition is welcome, not only for Tesseract and OpenCV.
All the text in your images is blue. As a first step, try the approach (color filtering) described in this Tesseract user forum. It is in Python, but there could be something similar for Java.
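A minimal Java sketch of color filtering by hue instead of by an RGB box; it is not necessarily what the linked thread does. When the background is close to neutral (white or gray), blending blue text into it mostly lowers saturation while the hue stays near blue, so a hue window tends to be more tolerant of edge pixels. It uses `java.awt.Color.RGBtoHSB`, which returns hue as a fraction of a full turn; the class name `HueFilter`, the hue window (200 to 260 degrees) and the saturation/brightness minimums are assumptions to tune. In OpenCV the same idea would be `Imgproc.cvtColor(src, hsv, Imgproc.COLOR_BGR2HSV)` followed by `Core.inRange`, keeping in mind that 8-bit OpenCV hue runs from 0 to 179.

```java
import java.awt.Color;

// Hypothetical helper: keeps pixels whose hue falls inside a "blue" window.
final class HueFilter {
    private HueFilter() {}

    /** True if the packed 0xRRGGBB pixel is inside the hue window and saturated/bright enough. */
    static boolean isTextColor(int rgb, float minHueDeg, float maxHueDeg,
                               float minSat, float minBright) {
        float[] hsb = Color.RGBtoHSB((rgb >> 16) & 0xFF, (rgb >> 8) & 0xFF, rgb & 0xFF, null);
        float hueDeg = hsb[0] * 360f; // RGBtoHSB returns hue as a fraction of a full turn
        // The window is assumed not to wrap around 0/360 degrees, which holds for blue.
        return hueDeg >= minHueDeg && hueDeg <= maxHueDeg
                && hsb[1] >= minSat && hsb[2] >= minBright;
    }

    /** Builds a text mask (true = text) for an array of packed RGB pixels. */
    static boolean[] mask(int[] rgbPixels, float minHueDeg, float maxHueDeg,
                          float minSat, float minBright) {
        boolean[] out = new boolean[rgbPixels.length];
        for (int i = 0; i < rgbPixels.length; i++) {
            out[i] = isTextColor(rgbPixels[i], minHueDeg, maxHueDeg, minSat, minBright);
        }
        return out;
    }
}
```

White and gray pixels have zero saturation, so the saturation minimum rejects neutral backgrounds regardless of their brightness.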