Tesseract OCR: creating a .traineddata file
The Problem:
I followed the step-by-step tutorial provided here to train my Tesseract OCR for a new font. But in steps 5 and 6, not all of the needed files are created.
What I did:
My image file is: en.va.exp0.tif
Step 1: Creating the .box file and correcting misidentified characters
tesseract en.va.exp0.jpg en.va.exp0 batch.nochop makebox
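Hand-correcting the box file is an easy place to introduce a stray typo, so a quick sanity check before step 2 may help. The sketch below assumes the Tesseract 3.x+ box format (one glyph per line: symbol, left, bottom, right, top, page); check_box_file is a hypothetical helper, not one of the Tesseract training tools.

```shell
# Hypothetical helper: sanity-check a Tesseract .box file after hand correction.
# Assumed format, one glyph per line:
#   <symbol> <left> <bottom> <right> <top> <page>
# Prints one message per malformed line and returns non-zero if any were found.
check_box_file() {
    awk '
        # Each line should have exactly six whitespace-separated fields.
        NF != 6 {
            printf "line %d: expected 6 fields, got %d\n", NR, NF
            bad = 1
            next
        }
        {
            # Coordinates and page number should be non-negative integers.
            for (i = 2; i <= 6; i++) {
                if ($i !~ /^[0-9]+$/) {
                    printf "line %d: field %d is not an integer: %s\n", NR, i, $i
                    bad = 1
                    next
                }
            }
            # Expect left <= right and bottom <= top; otherwise the box is inverted.
            if ($2 + 0 > $4 + 0 || $3 + 0 > $5 + 0) {
                printf "line %d: inverted box\n", NR
                bad = 1
            }
        }
        END { exit (bad ? 1 : 0) }
    ' "$1"
}
```

Usage: `check_box_file en.va.exp0.box` prints nothing and succeeds when every line looks well-formed.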
Step 2: Creating the .tr file
tesseract en.va.exp0.tif en.va.exp0 box.train
Step 3: Extracting the charset from the box file
unicharset_extractor en.va.exp0.box
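unicharset_extractor writes a plain-text unicharset file whose first line is the number of entries, followed by one entry per line with the symbol in the first field (the first entry is normally NULL, standing for the space). Inspecting it shows whether every character in the training image made it into the charset. The helper names below (check_unicharset, unicharset_symbols) are illustrative assumptions, not Tesseract tools.

```shell
# Hypothetical helpers for inspecting the unicharset written by unicharset_extractor.
# Assumed layout: line 1 = number of entries, then one entry per line,
# with the symbol in the first space-separated field.

# Compare the declared entry count with the number of entry lines.
check_unicharset() {
    # Strip CR and spaces in case the file has Windows line endings.
    declared=$(head -n 1 "$1" | tr -d '\r ')
    actual=$(tail -n +2 "$1" | wc -l | tr -d ' ')
    if [ "$declared" -eq "$actual" ]; then
        echo "ok: $actual entries"
    else
        echo "mismatch: header says $declared, found $actual"
        return 1
    fi
}

# List the symbols, one per line, for comparison with the training image.
unicharset_symbols() {
    tail -n +2 "$1" | cut -d ' ' -f 1
}
```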
Step 4: Creating the font_properties file
echo "va 0 0 1 0 0" > font_properties
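Each font_properties line has the form `<fontname> <italic> <bold> <fixed> <serif> <fraktur>`, so "va 0 0 1 0 0" declares a font named va that is fixed-pitch and not italic, bold, serif or fraktur. According to the legacy training documentation, the font name used in the .tr file names (the va in en.va.exp0.tr) must have a matching entry, or mftraining aborts. If step 4 was run in Windows cmd.exe, note that its echo writes the surrounding quotes literally, which would make the first field "va instead of va. The two hypothetical helpers below print the flags and check for that match.

```shell
# Hypothetical helpers for font_properties.
# Assumed line format: <fontname> <italic> <bold> <fixed> <serif> <fraktur>

# Print each six-field entry's flags in readable form.
describe_font_properties() {
    awk 'NF == 6 {
        printf "%s: italic=%s bold=%s fixed=%s serif=%s fraktur=%s\n", $1, $2, $3, $4, $5, $6
    }' "$1"
}

# Succeed if the font name inside a <lang>.<fontname>.exp<N>.tr file name
# appears as the first field of some line in the given font_properties file.
font_listed() {
    font=$(basename "$2" .tr | cut -d . -f 2)   # en.va.exp0.tr -> va
    awk -v f="$font" '$1 == f { found = 1 } END { exit (found ? 0 : 1) }' "$1"
}
```

Usage: `font_listed font_properties en.va.exp0.tr && echo listed`.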
Step 5: Training the data (mftraining)
mftraining -F font_properties -U unicharset -O en.unicharset en.va.exp0.tr
Step 6: Training the data (cntraining)
cntraining en.va.exp0.tr
As far as I know, step 5 should create four files: shapetable, inttemp, pffmtable and normproto. But only the shapetable file is created. Because of that, step 6 also doesn't work (I think it simply does nothing).
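To see exactly which output files are present after steps 5 and 6, a small check like the following can be run in the training directory. It assumes, as I understand the legacy (3.x) training documentation, that mftraining is expected to write inttemp and pffmtable (plus shapetable when shapeclustering has not been run), while cntraining writes normproto; check_training_outputs is a hypothetical helper.

```shell
# Hypothetical helper: report which legacy training outputs exist in a
# directory (default: the current one). Assumes mftraining should have
# written shapetable, inttemp and pffmtable, and cntraining normproto.
check_training_outputs() {
    dir=${1:-.}
    missing=0
    for f in shapetable inttemp pffmtable normproto; do
        # -s: the file exists and is not empty.
        if [ -s "$dir/$f" ]; then
            echo "found: $f"
        else
            echo "missing or empty: $f"
            missing=$((missing + 1))
        fi
    done
    # Non-zero exit status when anything is missing.
    [ "$missing" -eq 0 ]
}
```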
Materials:
explorer-screenshot-before.jpg
If more explanation or material is needed, I'll add it. Thanks in advance!
Answer:
Try running Tesseract 4 instead of Tesseract 5.
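To check which major version is currently installed before switching, the first line of `tesseract --version` can be parsed. The sketch below assumes that line looks like "tesseract 4.1.1" or, as in some Windows builds, "tesseract v5.3.0.20221214"; tesseract_major is a hypothetical helper.

```shell
# Hypothetical helper: extract the major version number from the output of
# `tesseract --version`. Only the first line is examined; an optional "v"
# before the number is accepted. Prints nothing if the line does not match.
tesseract_major() {
    printf '%s\n' "$1" | head -n 1 |
        sed -n 's/^tesseract v\{0,1\}\([0-9][0-9]*\).*/\1/p'
}
```

Usage: `tesseract_major "$(tesseract --version 2>&1)"`. The `2>&1` is there because some releases print the version to stderr rather than stdout.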