Simple OCR problem in .NET C#
I am doing some OCR stuff and screen scraping. I end up with lots of files that look like this.
All I need to do is some very basic OCR in C# on these files. I've been pulling my hair out trying to get different libraries to work (Tessnet2, Puma, MODI), and I keep running into problems getting them to even run from within C#.
What do you guys recommend for something this simple?
Thanks!
Comments (2)
OCR programs are not designed to read low-resolution screenshots. Even some of the best commercial OCR engines have trouble reading screenshots.
Tesseract needs good, clean images even under normal circumstances to get decent results. There could be several reasons why you are getting poor results. If you post some sample images and the output you get, we may be better able to explain it. Common problems include colored backgrounds, text zoning errors, small characters, and artifacts.
Apparently Tesseract will get much better results if you train it using the fonts that you want to read.
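Two of the problems listed above (small characters and colored backgrounds) can often be reduced by preprocessing the screenshot before it goes to the OCR engine. The following is only a minimal sketch of that idea, assuming System.Drawing is available (it ships with .NET Framework; on newer .NET it needs the System.Drawing.Common package on Windows). The class name `OcrPreprocess`, the scale factor of 3, and the threshold of 128 are illustrative assumptions, not values from any of the libraries mentioned, and will likely need tuning for your images.

```csharp
using System;
using System.Drawing;
using System.Drawing.Drawing2D;
using System.Drawing.Imaging;

// Hypothetical helper for illustration; not part of Tessnet2, Puma, or MODI.
public static class OcrPreprocess
{
    // Enlarge the image so small screen fonts get more pixels per character.
    public static Bitmap Upscale(Bitmap source, int factor)
    {
        var result = new Bitmap(source.Width * factor, source.Height * factor,
                                PixelFormat.Format24bppRgb);
        using (Graphics g = Graphics.FromImage(result))
        {
            g.InterpolationMode = InterpolationMode.HighQualityBicubic;
            g.DrawImage(source, 0, 0, result.Width, result.Height);
        }
        return result;
    }

    // Replace every pixel with pure black or pure white based on its luminance,
    // which flattens colored backgrounds. GetPixel/SetPixel are slow but simple.
    public static void BinarizeInPlace(Bitmap image, int threshold)
    {
        for (int y = 0; y < image.Height; y++)
        {
            for (int x = 0; x < image.Width; x++)
            {
                Color c = image.GetPixel(x, y);
                int luminance = (int)(0.299 * c.R + 0.587 * c.G + 0.114 * c.B);
                image.SetPixel(x, y, luminance < threshold ? Color.Black : Color.White);
            }
        }
    }

    // Assumed defaults: 3x enlargement and a mid-range threshold.
    public static Bitmap Prepare(Bitmap screenshot, int factor = 3, int threshold = 128)
    {
        Bitmap enlarged = Upscale(screenshot, factor);
        BinarizeInPlace(enlarged, threshold);
        return enlarged;
    }
}
```

To use it, load a capture with `new Bitmap(path)`, call `OcrPreprocess.Prepare(bitmap)`, and save or pass the returned bitmap to whichever OCR library you end up with. A fixed threshold is the crudest option; if your screenshots have light text on a dark background you may also need to invert the result.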
There's a web-based OCR API that you can try. Here's a C# example of how to use it: http://snipt.org/lOgh/ (you'll first need to register for an API key at http://www.wisetrend.com/wisetrend_ocr_cloud.shtml; look for the "Sign Up Free" button).
Disclaimer: WiseTrend is my company's customer.