Training Tesseract 3 to recognize digits in real images of gas meters
I'm trying to train Tesseract to recognize digits in real images of gas meters.
The training images were taken with a camera, so they have many problems: poor resolution, blur, poor lighting, and low contrast caused by overexposure, reflections, shadows, etc.
For training, I created one large image containing a series of digits cropped from the gas meter images, and I manually edited the box file to create the .tr files. The result is that only the digits from the clearer, sharper images are recognized, while the digits from blurred images are not picked up by Tesseract.
Comments (4)
As far as I can tell, you need OpenCV to locate the box that contains the digits, but OpenCV is not good at OCR. Once you have located the box, crop that part, do some image processing, and then hand it over to Tesseract for OCR.
I need help with OpenCV because I don't know how to program in OpenCV.
Here are a few real-world examples.
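The locate-then-crop step described in this answer can be sketched in plain Python. This is only an illustration under stated assumptions: the image is a nested list of grayscale values, the digit window is assumed to be darker than the meter housing, and the names bounding_box, crop and dark_below are hypothetical. It is not OpenCV code; a real pipeline would use a proper detection step.

```python
def bounding_box(gray, dark_below=100):
    """Return (top, left, bottom, right) enclosing all pixels darker than dark_below.

    Assumption: the digit window is the only dark region in the image.
    bottom and right are exclusive, so they can be used directly in slices.
    """
    rows = [r for r, row in enumerate(gray) if any(p < dark_below for p in row)]
    cols = [c for c in range(len(gray[0])) if any(row[c] < dark_below for row in gray)]
    if not rows:
        return None  # nothing dark enough was found
    return rows[0], cols[0], rows[-1] + 1, cols[-1] + 1


def crop(gray, box):
    """Cut the boxed region out of the image."""
    top, left, bottom, right = box
    return [row[left:right] for row in gray[top:bottom]]


# Tiny stand-in for a photo: 200 = bright housing, 30-40 = dark digit window.
image = [
    [200, 200, 200, 200, 200],
    [200,  30,  40, 200, 200],
    [200,  35,  30, 200, 200],
    [200, 200, 200, 200, 200],
]
box = bounding_box(image)
window = crop(image, box)  # this region would then be preprocessed and passed to the OCR step
```

Only the cropped window is handed on, so the OCR engine never sees the housing, reflections on the glass outside the window, or other clutter.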
I would try a simple ImageMagick thresholding command first.
(Play a bit with the 50% parameter: try smaller and higher values.)
Thresholding leaves only two values, zero or maximum, for each color channel. Values below the threshold are set to 0, and values above it are set to 255 (or 65535 when working at 16-bit depth).
Depending on your original.jpg, the result may be a workable, very high-contrast image that is suitable for OCR.
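To make the effect of the threshold percentage concrete, here is a minimal pure-Python sketch of the same idea applied to a single 8-bit grayscale channel. The function name threshold and its parameters are hypothetical, and this is not ImageMagick's implementation; it only mirrors the rule described above.

```python
def threshold(gray, percent=50, max_value=255):
    """Binarize one channel: values above the cutoff become max_value, the rest 0.

    Assumption: gray is a nested list of integers in 0..max_value.
    A value exactly equal to the cutoff is mapped to 0 in this sketch.
    """
    cutoff = max_value * percent / 100
    return [[max_value if p > cutoff else 0 for p in row] for row in gray]


row = [[10, 120, 130, 250]]
at_50 = threshold(row, percent=50)  # cutoff 127.5
at_40 = threshold(row, percent=40)  # cutoff 102.0, so 120 now counts as bright
```

Lowering the percentage keeps more mid-gray pixels as white, which is why it is worth trying several values on dim or overexposed photos.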
I recommend using Tesseract's own API to enhance the image (denoise, normalize, sharpen, ...), for example:
Boxa* tesseract::TessBaseAPI::GetConnectedComponents(Pixa** pixa)
(this gives you the bounding boxes of each character)
Pix* pimg = tess_api->GetThresholdedImage();
Here you can find a few examples.
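To illustrate what per-character bounding boxes from connected components look like, here is a minimal pure-Python sketch. It is a stand-in for the idea, not Tesseract's or Leptonica's implementation: the function name component_boxes is hypothetical, the input is assumed to be an already thresholded image given as a nested list of 0/1 values, and 8-connectivity is a choice made for this sketch.

```python
from collections import deque


def component_boxes(binary):
    """Return (left, top, width, height) for each 8-connected blob of 1s,
    sorted by left edge so the boxes come out in reading order."""
    height, width = len(binary), len(binary[0])
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if not binary[y][x] or seen[y][x]:
                continue
            # Breadth-first flood fill from this unvisited foreground pixel.
            queue = deque([(y, x)])
            seen[y][x] = True
            top = bottom = y
            left = right = x
            while queue:
                cy, cx = queue.popleft()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and binary[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((left, top, right - left + 1, bottom - top + 1))
    return sorted(boxes)


# Two blobs: a "C"-like shape on the left and a vertical bar on the right.
binary = [
    [1, 1, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [1, 1, 0, 0, 1],
]
boxes = component_boxes(binary)
```

Each box can then be cropped and inspected separately, which helps when deciding whether a blurred digit was lost during thresholding or during recognition.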
Tesseract is a pretty decent OCR package, but it doesn't pre-process images properly. In my experience, you can get good OCR results if you do some pre-processing before passing the image to Tesseract.
A few key pointers improve recognition significantly:
As for point 4, if you know which font will be used, there are better solutions than Tesseract, such as matching the font's glyphs directly against the image. The basic algorithm is to find the digits and match each one against all possible characters (there are only 10). Still, the implementation is tricky.
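The matching idea in the last paragraph can be sketched as follows. This is a minimal illustration under strong assumptions, not a production approach: each digit has already been located, binarized and scaled to a 3x5 grid, the TEMPLATES font is an invented seven-segment-style example rather than a real meter font, and the name classify is hypothetical.

```python
# Hypothetical 3x5 templates, written row by row as 15 characters ('1' = ink).
TEMPLATES = {
    "0": "111101101101111",
    "1": "010110010010111",
    "2": "111001111100111",
    "3": "111001111001111",
    "4": "101101111001001",
    "5": "111100111001111",
    "6": "111100111101111",
    "7": "111001001001001",
    "8": "111101111101111",
    "9": "111101111001111",
}


def classify(glyph):
    """Return the digit whose template differs from glyph in the fewest pixels."""
    def distance(template):
        return sum(a != b for a, b in zip(glyph, template))
    return min(TEMPLATES, key=lambda digit: distance(TEMPLATES[digit]))


# A "7" with its top-left pixel lost to noise is still closest to the "7" template.
noisy_seven = "011001001001001"
result = classify(noisy_seven)
```

The tricky part mentioned above sits outside this sketch: finding, aligning and scaling each digit so that it can be compared pixel by pixel. Some templates here differ by a single pixel (for example 8 versus 0, 6 and 9), so real images would need larger templates or a more tolerant similarity measure.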