Is it possible to train tesseract v5 to OCR Egyptian licence plates?
I'm working on a project to OCR Egyptian licence plates, which are written in the Arabic alphabet and Arabic-Indic numerals. The traineddata from https://github.com/Shreeshrii/tessdata_arabic gives an accuracy of 60% for letters and 70% for numbers. I'm guessing the poor accuracy is because the font on the plates is different. Also, the letters are written separately on the plates (أ هـ ج)(ل ل ص), while they are usually connected in textbooks (أهج)(للص). The detected plates also have different lighting conditions, and the letters may not be clear, since a plate can be dirty or distorted.
Here's a sample that is recognised with an extra apostrophe at the beginning ('ل ل ص ٦٢٩) after preprocessing the image to grayscale and then to black and white. The correct characters are (ل ل ص ٦٢٩).
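To illustrate the stray apostrophe, here is a minimal sketch of a post-filter that keeps only Arabic letters and Arabic-Indic digits in the OCR output. The function names `is_plate_char` and `clean_plate_text` are made up for this example, and the assumption that a plate contains nothing but those characters is mine. It only hides extra symbols; it does nothing for misread letters.

```python
import unicodedata


def is_plate_char(ch):
    """Return True for Arabic letters and Arabic-Indic digits (U+0660..U+0669)."""
    if 0x0660 <= ord(ch) <= 0x0669:
        return True
    # Arabic letters are category "Lo" with names like "ARABIC LETTER LAM".
    # The tatweel (U+0640) is category "Lm", so it is dropped as well.
    return (unicodedata.category(ch) == "Lo"
            and unicodedata.name(ch, "").startswith("ARABIC LETTER"))


def clean_plate_text(raw):
    """Drop everything except plate characters and collapse whitespace."""
    kept = []
    for ch in raw:
        if is_plate_char(ch):
            kept.append(ch)
        elif ch.isspace():
            kept.append(" ")
    return " ".join("".join(kept).split())


# The first sample, written with escapes to avoid right-to-left display issues:
# apostrophe, LAM, LAM, SAD, then the digits 6 2 9.
raw = "'\u0644 \u0644 \u0635 \u0666\u0662\u0669"
print(clean_plate_text(raw))
```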
Here's another sample of the plates I am trying to recognise, also with black-and-white preprocessing. This one fails: it's recognised as (ط ئ ؤ د ١٢), but the characters on the plate are (ط ج د ١٢٦٤).
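For reference, this is roughly how separate letter and digit accuracy could be measured. It is only a sketch: the alignment via `difflib.SequenceMatcher` and the helper name `class_accuracy` are choices made for this example, not necessarily how the figures quoted above were obtained.

```python
from difflib import SequenceMatcher


def is_arabic_indic_digit(ch):
    """Arabic-Indic digits occupy U+0660..U+0669."""
    return 0x0660 <= ord(ch) <= 0x0669


def class_accuracy(pairs):
    """Per-class character accuracy over (ground_truth, ocr_output) pairs.

    Spaces are ignored. A ground-truth character counts as correct when the
    sequence alignment matches it to an identical character in the output.
    """
    totals = {"letters": 0, "digits": 0}
    hits = {"letters": 0, "digits": 0}
    for truth, predicted in pairs:
        t = truth.replace(" ", "")
        p = predicted.replace(" ", "")
        matched = set()
        matcher = SequenceMatcher(None, t, p, autojunk=False)
        for block in matcher.get_matching_blocks():
            matched.update(range(block.a, block.a + block.size))
        for i, ch in enumerate(t):
            key = "digits" if is_arabic_indic_digit(ch) else "letters"
            totals[key] += 1
            hits[key] += i in matched
    return {k: (hits[k] / totals[k] if totals[k] else None) for k in totals}


# The failing sample, written with escapes: truth is TAH JEEM DAL 1264,
# the OCR output is TAH, YEH WITH HAMZA, WAW WITH HAMZA, DAL, 12.
truth = "\u0637 \u062c \u062f \u0661\u0662\u0666\u0664"
output = "\u0637 \u0626 \u0624 \u062f \u0661\u0662"
print(class_accuracy([(truth, output)]))
```

On this single plate, two of the three letters and two of the four digits are matched.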
Should I try different preprocessing? Or should I retrain the existing traineddata for the different font (I searched for the font name but couldn't find it)? Or should I train from scratch, since the plate images have a lot of noise and differ in brightness/contrast?
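To make the preprocessing step concrete, here is a dependency-free sketch of grayscale conversion followed by Otsu thresholding, which picks the black/white cutoff per image instead of using a fixed value. Otsu's method is just one candidate for differing brightness and contrast, not something I have confirmed helps on these plates. The helper names and the nested-list image format are assumptions for illustration; in practice an imaging library would do this.

```python
def to_gray(rgb_rows):
    """Convert rows of (r, g, b) tuples to 0-255 luminance (BT.601 weights)."""
    return [[int(round(0.299 * r + 0.587 * g + 0.114 * b)) for r, g, b in row]
            for row in rgb_rows]


def otsu_threshold(gray_rows):
    """Return the cutoff t that maximises the between-class variance."""
    hist = [0] * 256
    for row in gray_rows:
        for value in row:
            hist[value] += 1
    total = sum(hist)
    sum_all = sum(i * count for i, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    weight_dark, sum_dark = 0, 0
    for t in range(256):
        weight_dark += hist[t]
        sum_dark += t * hist[t]
        if weight_dark == 0:
            continue  # no pixels at or below t yet
        weight_bright = total - weight_dark
        if weight_bright == 0:
            break  # every pixel is already in the dark class
        mean_dark = sum_dark / weight_dark
        mean_bright = (sum_all - sum_dark) / weight_bright
        variance = weight_dark * weight_bright * (mean_dark - mean_bright) ** 2
        if variance > best_var:
            best_var, best_t = variance, t
    return best_t


def binarize(gray_rows, t):
    """Pixels brighter than t become white (255), the rest black (0)."""
    return [[255 if value > t else 0 for value in row] for row in gray_rows]


# Tiny synthetic image: three dark pixels and three bright ones.
gray = [[10, 12, 200], [11, 198, 202]]
t = otsu_threshold(gray)
print(t, binarize(gray, t))
```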