Tesseract OCR can't create .traineddata

Posted 2025-01-23 01:55:08

The Problem:

I followed the step-by-step tutorial provided here to train Tesseract OCR for a new font, but in steps 5 and 6 not all of the required files are created.

What I did:

My image file is: en.va.exp0.tif

Step 1: Creating the .box file + correcting wrongly identified characters

tesseract en.va.exp0.jpg en.va.exp0 batch.nochop makebox

Step 2: Creating .tr file

tesseract en.va.exp0.tif en.va.exp0 box.train

Step 3: Extracting the charset from the box files

unicharset_extractor en.va.exp0.box

Step 4: Create font_properties file

echo "va 0 0 1 0 0" > font_properties
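
For reference, the Tesseract 3.x training documentation describes each font_properties line as `<fontname> <italic> <bold> <fixed> <serif> <fraktur>`, with the font name expected to match the middle part of the training file name (`va` in `en.va.exp0`). The sketch below checks that match. It assumes the `<lang>.<fontname>.exp<N>` naming convention, and both function names are hypothetical helpers, not part of Tesseract.

```shell
# Hypothetical helper: extract the font name from a training file name
# of the assumed form <lang>.<fontname>.exp<N>.<ext>.
font_name_from_file() {
    basename "$1" | cut -d. -f2
}

# Hypothetical helper: succeed if the font_properties file (second argument)
# has a line whose first field equals the font name of the training file.
font_is_declared() {
    font=$(font_name_from_file "$1")
    awk -v f="$font" '$1 == f { found = 1 } END { exit !found }' "$2"
}
```

For the files in this question, `font_is_declared en.va.exp0.tr font_properties` should succeed if the font_properties file was written as in step 4.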

Step 5: Training the data

mftraining -F font_properties -U unicharset -O en.unicharset en.va.exp0.tr

Step 6: Training the data

cntraining en.va.exp0.tr

As far as I know, step 5 should create four files: shapetable, inttemp, pffmtable, and normproto. However, only the shapetable file is created. Because of that, step 6 also doesn't work (I think it simply does nothing).
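
To make the missing-file symptom easy to reproduce and report, the sketch below lists which of the four files named above exist after steps 5 and 6. The function name `check_training_outputs` is a hypothetical helper. Note that the Tesseract 3.x training documentation appears to attribute normproto to cntraining (step 6) rather than mftraining, so its absence after step 5 alone may be expected.

```shell
# Report which of the expected training outputs exist in a directory.
# The four file names come from the question. Prints one "ok"/"missing"
# line per file and returns non-zero if any file is missing or empty.
check_training_outputs() {
    dir=${1:-.}
    rc=0
    for name in shapetable inttemp pffmtable normproto; do
        if [ -s "$dir/$name" ]; then
            echo "ok       $name"
        else
            echo "missing  $name"
            rc=1
        fi
    done
    return "$rc"
}
```

Running `check_training_outputs .` in the training directory after each step shows exactly which command failed to write its output.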

Materials:

explorer-screenshot-before.jpg

explorer-screenshot-after.jpg

cmd-screenshot.jpg

en.va.exp0.tif

If more explanation or material is needed, I'll add it. Thanks in advance.


把昨日还给我 2025-01-30 01:55:08

Try running Tesseract 4 instead of Tesseract 5.
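
Before switching versions, it can help to confirm which major version is actually on the PATH. This is a minimal sketch, assuming the first line of `tesseract --version` looks like `tesseract 5.3.0` or `tesseract v4.1.0...`; the function name `tesseract_major_version` is a hypothetical helper.

```shell
# Extract the major version number from the text printed by
# `tesseract --version`, assuming its first line starts with "tesseract"
# followed by a dotted version such as 5.3.0 or v4.1.0.
# Prints nothing if the first line does not match that shape.
tesseract_major_version() {
    printf '%s\n' "$1" | sed -n '1s/^tesseract[^0-9]*\([0-9][0-9]*\)\..*/\1/p'
}
```

A typical call is `tesseract_major_version "$(tesseract --version 2>&1)"`; the `2>&1` is there because some builds print the version banner to stderr.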
