OCR for a known font
I'm searching for an OCR library that can be parameterized with a font,
because I always know which font is used, and I believe the recognition results would be much better this way.
Does anyone know of one?
Answers (3)
Most OCR engines handle this situation quite well. In fact, in my experience OCR engines get confused less often when there is only one font to recognise on a page. Strange, but true.
If an OCR engine can read your font in the first place, I would just use it and not worry about selecting the font. There are better options for improving recognition.
Many OCR engines let you set recognition parameters that help, such as fixed-width vs. proportional, serif vs. sans-serif, and machine print vs. hand print. You can also restrict recognition to a subset of characters, such as uppercase only or numeric only, which can improve results considerably: if you only have numeric characters, the 0 (zero) character can never be confused with 'O', 'o' or 'Ø'. You will find these hints more effective than being able to choose the exact font type to OCR.
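To illustrate why a character subset helps, here is a minimal sketch assuming a recognizer that produces a confidence score for each candidate character. The function `best_char` and the scores are hypothetical and made up for illustration; real engines expose this setting in their own way (Tesseract, for instance, has a `tessedit_char_whitelist` variable). Restricting the candidates simply removes the confusable letters before the best match is picked.

```python
from typing import Dict, Optional


def best_char(scores: Dict[str, float], whitelist: Optional[str] = None) -> str:
    """Pick the highest-scoring candidate, ignoring characters not in the whitelist."""
    candidates = {c: s for c, s in scores.items() if whitelist is None or c in whitelist}
    if not candidates:
        raise ValueError("no candidate character is in the whitelist")
    return max(candidates, key=candidates.get)


DIGITS = "0123456789"

# Hypothetical scores for one round, ambiguous glyph (illustration only).
glyph_scores = {"O": 0.41, "0": 0.38, "o": 0.15, "Ø": 0.06}

print(best_char(glyph_scores))          # O
print(best_char(glyph_scores, DIGITS))  # 0
```

Without the whitelist the letter 'O' narrowly wins; with a numeric-only whitelist the zero is the only candidate left.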
Other engines let you train them on new fonts, which helps considerably if you have an unusual font.
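As a rough idea of what "training on a font" can mean in the simplest case, the sketch below builds one template per character from a few labeled sample bitmaps (per-pixel majority vote) and then classifies a new glyph by the nearest template. This is a toy nearest-template classifier, not how any particular engine trains; the tiny 3x3 "T"/"L" glyphs and the function names are hypothetical.

```python
from collections import Counter


def train_templates(samples):
    """Build one template per character by per-pixel majority vote.

    samples: list of (char, bitmap) pairs; a bitmap is a list of equal-length
    strings where "#" is ink and "." is background.
    """
    by_char = {}
    for ch, bmp in samples:
        by_char.setdefault(ch, []).append(bmp)
    templates = {}
    for ch, bmps in by_char.items():
        h, w = len(bmps[0]), len(bmps[0][0])
        templates[ch] = [
            "".join(Counter(b[r][c] for b in bmps).most_common(1)[0][0] for c in range(w))
            for r in range(h)
        ]
    return templates


def classify(bmp, templates):
    """Return the character whose template differs from bmp in the fewest pixels."""
    def dist(t):
        return sum(a != b for rt, rb in zip(t, bmp) for a, b in zip(rt, rb))
    return min(templates, key=lambda ch: dist(templates[ch]))


# Three slightly noisy samples per character (hypothetical data).
samples = [
    ("T", ["###", ".#.", ".#."]), ("T", ["##.", ".#.", ".#."]), ("T", ["###", ".#.", "##."]),
    ("L", ["#..", "#..", "###"]), ("L", ["#..", "#..", "##."]), ("L", ["#..", "##.", "###"]),
]
templates = train_templates(samples)
print(templates["T"])                              # ['###', '.#.', '.#.']
print(classify(["###", ".#.", "..."], templates))  # T
```

An odd number of samples per character avoids ties in the vote; real engines use far richer features, but the principle of learning from samples of the actual font is the same.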
If your image quality is good and your fonts are clean and of a decent size, I would recommend Tesseract OCR from Google, or OCRopus as suggested by Michael Mior. Both are free and work well on clean, clear text. If the text is more difficult, there are definitely better OCR engines, such as ABBYY, Prime Recognition, OmniPage and many others, although they cost money.
Check out OCRopus. It's open source and sponsored by Google :) I'm not sure whether it lets you pick a particular font, but it seems to produce good results regardless.
It's apparently Windows-only and not primarily focused on OCR, but Simba's OCR has methods that require knowledge of the font being used.
See http://docs.villavu.com/simba/scriptref/ocr.html
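To show how knowing the font can be exploited directly, here is a minimal template-matching sketch. It is not Simba's API; the 3x5 fixed-width `FONT`, the spacing constants and the function names are hypothetical. Because the glyph size and spacing are known in advance, the line can be cut at a fixed pitch, and each cell is matched to the glyph with the fewest differing pixels (Hamming distance), which tolerates a little noise.

```python
# Hypothetical 3x5 fixed-width font: "#" is ink, "." is background.
FONT = {
    "0": ["###", "#.#", "#.#", "#.#", "###"],
    "1": [".#.", "##.", ".#.", ".#.", "###"],
    "7": ["###", "..#", ".#.", ".#.", ".#."],
}
WIDTH, HEIGHT, GAP = 3, 5, 1  # glyph size and spacing, known because the font is known


def hamming(a, b):
    """Count the pixels that differ between two equally sized bitmaps."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def render(text):
    """Draw text in FONT with GAP blank columns between glyphs (used to build test input)."""
    return [("." * GAP).join(FONT[ch][r] for ch in text) for r in range(HEIGHT)]


def read_line(rows):
    """Cut a text line at the known pitch and match each cell to the closest glyph."""
    if len(rows) != HEIGHT:
        raise ValueError("line height does not match the font")
    out = []
    for x in range(0, len(rows[0]), WIDTH + GAP):
        cell = [row[x:x + WIDTH] for row in rows]
        out.append(min(FONT, key=lambda ch: hamming(FONT[ch], cell)))
    return "".join(out)


noisy = render("710")
flipped = list(noisy[2])
flipped[1] = "."  # erase one ink pixel in the middle of the "7"
noisy[2] = "".join(flipped)

print(read_line(render("710")))  # 710
print(read_line(noisy))          # 710
```

Real screen text adds complications such as anti-aliasing, colour and variable spacing, but when the exact font is known, this kind of direct matching can be very reliable.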