C# OCR fails to recognize digits (Tesseract 2)
I'm trying to extract digits from the following:
It fails: I get a ~ in return. I'm using Google's Tesseract 2 from C# (through an open-source C# wrapper), and now I'm wondering whether this image is simply too poor to be used for OCR.
Because IMHO the digits are perfectly clear.
Do you have any other OCR engine in mind that would nail this?
EDIT
I've also tried Asprise OCR (http://asprise.com/product/ocr/selector.php), but it fails to parse the image too...
Comments (2)
I suggest resizing. I zoomed this page to 200% in IE, took a screenshot, printed it to PDF, and imported it into my program that uses tessnet. Tess nailed it! Unless I read the #s wrong :-)
Although confidence = 140 (under 100 is preferred, if you wondered). Of course, when I tried the original size, I didn't get ~; I got about half the #s right, a bunch of letters, and other garbage. Not good enough, but better.
t2 seems to like images of a certain size.
My program does processing to get that to work. I suggest using .NET GDI+ to convert to 32-bit and resizing with interpolation mode High Quality Bicubic. This seems to 'fill in the gaps' a bit.
Play with sizes until you find ones that work. I have found that when the image is too big or too small, tesseract performs differently.
Both issues are preprocessing. That's easy, and you'd think tesseract would try it itself; however, I know how to resize and interpolate, and I don't know how to OCR! So I am willing to settle.
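For what it's worth, the GDI+ step described above could look roughly like the sketch below. The class and method names (`OcrPreprocess`, `Upscale`) are made up for illustration and are not from my actual program; the white background fill is also just an assumption that suits dark digits on a light background.

```csharp
using System;
using System.Drawing;
using System.Drawing.Drawing2D;
using System.Drawing.Imaging;

public static class OcrPreprocess
{
    // Hypothetical helper: returns a new 32-bit ARGB copy of 'source',
    // scaled by 'factor' with high-quality bicubic interpolation.
    public static Bitmap Upscale(Image source, double factor)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (factor <= 0) throw new ArgumentOutOfRangeException(nameof(factor));

        int width = Math.Max(1, (int)Math.Round(source.Width * factor));
        int height = Math.Max(1, (int)Math.Round(source.Height * factor));

        var result = new Bitmap(width, height, PixelFormat.Format32bppArgb);
        using (Graphics g = Graphics.FromImage(result))
        {
            g.InterpolationMode = InterpolationMode.HighQualityBicubic;
            g.PixelOffsetMode = PixelOffsetMode.HighQuality;
            // Assumption: a white backdrop is wanted behind any transparent pixels.
            g.Clear(Color.White);
            // Stretch the source image over the whole destination rectangle.
            g.DrawImage(source, new Rectangle(0, 0, width, height));
        }
        return result;
    }
}
```

Something like `OcrPreprocess.Upscale(original, 2.0)` would then correspond to the 200% zoom that worked for me, and the returned bitmap is what gets handed to the OCR call.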
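If you want to automate the "play with sizes" part, one option is a small sweep like the sketch below. Everything here is hypothetical naming (`ScaleSweep`, `BestScale`), and the `scoreAtScale` delegate is a placeholder for your own code that resizes the image at that scale, runs OCR, and returns the confidence value, where lower is treated as better in line with the "under 100 is preferred" note above.

```csharp
using System;
using System.Collections.Generic;

public static class ScaleSweep
{
    // Hypothetical helper: evaluates each candidate scale with the supplied
    // scoring delegate and returns the scale with the lowest score.
    public static double BestScale(IEnumerable<double> scales,
                                   Func<double, double> scoreAtScale)
    {
        if (scales == null) throw new ArgumentNullException(nameof(scales));
        if (scoreAtScale == null) throw new ArgumentNullException(nameof(scoreAtScale));

        bool found = false;
        double bestScale = 0;
        double bestScore = 0;

        foreach (double scale in scales)
        {
            double score = scoreAtScale(scale);
            // Keep the first candidate, then any candidate that scores lower.
            if (!found || score < bestScore)
            {
                found = true;
                bestScale = scale;
                bestScore = score;
            }
        }

        if (!found) throw new ArgumentException("No scales supplied.", nameof(scales));
        return bestScale;
    }
}
```

A call might look like `ScaleSweep.BestScale(new[] { 1.0, 1.5, 2.0, 3.0 }, s => MyOcrConfidenceAt(s))`, where `MyOcrConfidenceAt` is whatever wrapper you write around your OCR engine.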
Your image's resolution is too low: it is 96 DPI, so perhaps it is a screenshot. Rescale it to 300 DPI, and tessnet2 should be able to recognize it.
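As a rough sketch of that rescale: going from 96 DPI to 300 DPI means multiplying the pixel dimensions by 300 / 96 = 3.125. The helper below is illustrative only; `DpiRescale`, `ScaleFactor`, and `ToDpi` are made-up names, and it assumes the bitmap's `HorizontalResolution` metadata is accurate, which is not always the case for screenshots.

```csharp
using System;
using System.Drawing;
using System.Drawing.Drawing2D;
using System.Drawing.Imaging;

public static class DpiRescale
{
    // Factor by which pixel dimensions must grow to go from one DPI to another.
    public static double ScaleFactor(double sourceDpi, double targetDpi)
    {
        if (sourceDpi <= 0) throw new ArgumentOutOfRangeException(nameof(sourceDpi));
        if (targetDpi <= 0) throw new ArgumentOutOfRangeException(nameof(targetDpi));
        return targetDpi / sourceDpi;
    }

    // Hypothetical helper: resamples 'source' so that it has the pixel
    // dimensions it would have had if captured at 'targetDpi'.
    public static Bitmap ToDpi(Bitmap source, float targetDpi = 300f)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));

        double factor = ScaleFactor(source.HorizontalResolution, targetDpi);
        int width = Math.Max(1, (int)Math.Round(source.Width * factor));
        int height = Math.Max(1, (int)Math.Round(source.Height * factor));

        var result = new Bitmap(width, height, PixelFormat.Format32bppArgb);
        using (Graphics g = Graphics.FromImage(result))
        {
            g.InterpolationMode = InterpolationMode.HighQualityBicubic;
            g.DrawImage(source, new Rectangle(0, 0, width, height));
        }
        // Record the new resolution in the bitmap's metadata after drawing.
        result.SetResolution(targetDpi, targetDpi);
        return result;
    }
}
```

With a 96 DPI source, `DpiRescale.ToDpi(image)` produces an image 3.125 times larger in each dimension and tagged as 300 DPI.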