Tesseract seems to learn characters as it performs more OCR; how can the learned data be saved between uses?
I have a particular set of 10 images to run OCR on. They contain only digits and are fairly short, about 20 digits per image. If I run one particular image first, the result has some mismatches; however, if I run the other images first and then come back to that one, all characters match.
I am inclined to conclude that Tesseract is learning the characters as more OCR operations are performed, which makes me very happy. My question is: is it possible to save this learned data so that Tesseract picks it up the next time I use it?
Comments (1)
You can set classify_save_adapted_templates to 1 in your Tesseract config file to save the adapted templates, and set classify_use_pre_adapted_templates to 1 to load those templates the next time you run Tesseract.
The code that specifies the behavior of these options is here:
http://code.google.com/p/tesseract-ocr/source/browse/trunk/classify/classify.cpp?r=570
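As a rough illustration of the config-file approach, here is a minimal Python sketch that writes the two variables to a config file and builds a matching command line. The two variable names come from the answer above; the file name `adapt`, the helper names, and the image/output names are hypothetical. It assumes the usual `tesseract <image> <outputbase> [configfile...]` invocation, that your build accepts a config file given by path, and that the adaptive classifier is active in the engine you run (it belongs to the older, non-LSTM recognizer), so treat this as a starting point rather than a verified recipe.

```python
from pathlib import Path

# Variable names are taken from the answer above; the values enable
# saving adapted templates and loading them on the next run.
ADAPTIVE_SETTINGS = {
    "classify_save_adapted_templates": 1,
    "classify_use_pre_adapted_templates": 1,
}


def write_adaptive_config(path):
    """Write a Tesseract config file: one 'name value' pair per line."""
    path = Path(path)
    lines = [f"{name} {value}" for name, value in ADAPTIVE_SETTINGS.items()]
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return path


def build_command(image, output_base, config_path):
    """Build (but do not run) a tesseract command that names the config file."""
    return ["tesseract", str(image), str(output_base), str(config_path)]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    config = write_adaptive_config("adapt")
    print(" ".join(build_command("digits.png", "digits_out", config)))
```

To actually run the command, pass the returned list to `subprocess.run`. Check where your Tesseract version writes and reads the adapted-template file, since that location is not stated in the answer.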