Parsing a picture of text into text in VB.Net
I am wondering whether there are any DLLs or features in VB.Net 2008 that I could use to parse a picture of text (for example, a screenshot) into text, assuming the text is in a very recognizable format (i.e., not CAPTCHA-style text).
Comments (2)
If it is a highly readable, unaltered, pure screenshot, then the easiest (but probably slowest) way is to draw each candidate letter (using `Graphics.DrawString`) onto a bitmap and compare that bitmap, pixel by pixel, against the screenshot. This could be reasonably quick considering how slow OCR is, and it would almost certainly give a 100% accuracy rate. It is even better if you only need to recognize text in a certain area, since reducing the search area increases speed several times over. Better still, if the text is in a fixed-width format and you know the font size (or can figure it out by searching a small area), you can skip the entire block once a letter is recognized.

If you don't know how to do this type of image manipulation, that's OK. Look at `GetPixel` and `SetPixel` on MSDN to start out, then move on to speed and look for examples using `LockBits`.
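The pixel-comparison approach from the first answer (render each candidate character with `Graphics.DrawString`, then compare it against the screenshot with `GetPixel`) could look roughly like the sketch below. It is only an illustration under several assumptions: the text is fixed-width, the screenshot was rendered with exactly the same font, size, and rendering settings as the templates, and the cell size is known. The names `RenderGlyph`, `MatchesAt`, and `ReadFixedWidthLine`, and the 16 x 24 cell size, are hypothetical and not from the original post. The "screenshot" is simulated by copying rendered glyphs with `SetPixel` so that the example is self-contained. The project needs a reference to System.Drawing.

```vbnet
Imports System
Imports System.Drawing
Imports System.Drawing.Text
Imports System.Text

Module TemplateMatchSketch

    ' Hypothetical helper: draws one character in black on white into a
    ' cellW x cellH bitmap.
    Function RenderGlyph(ByVal ch As Char, ByVal f As Font, _
                         ByVal cellW As Integer, ByVal cellH As Integer) As Bitmap
        Dim bmp As New Bitmap(cellW, cellH)
        Using g As Graphics = Graphics.FromImage(bmp)
            g.Clear(Color.White)
            ' Assumption: the text being read was drawn without anti-aliasing.
            g.TextRenderingHint = TextRenderingHint.SingleBitPerPixelGridFit
            g.DrawString(ch.ToString(), f, Brushes.Black, 0, 0)
        End Using
        Return bmp
    End Function

    ' True if every glyph pixel equals the screenshot pixel at the same
    ' offset from (x0, y0).
    Function MatchesAt(ByVal shot As Bitmap, ByVal glyph As Bitmap, _
                       ByVal x0 As Integer, ByVal y0 As Integer) As Boolean
        If x0 + glyph.Width > shot.Width Then Return False
        If y0 + glyph.Height > shot.Height Then Return False
        For y As Integer = 0 To glyph.Height - 1
            For x As Integer = 0 To glyph.Width - 1
                If shot.GetPixel(x0 + x, y0 + y).ToArgb() <> _
                   glyph.GetPixel(x, y).ToArgb() Then
                    Return False
                End If
            Next
        Next
        Return True
    End Function

    ' Reads one row of fixed-width cells starting at (x0, y0).
    ' Cells that match no template are reported as "?".
    Function ReadFixedWidthLine(ByVal shot As Bitmap, ByVal alphabet As String, _
                                ByVal f As Font, ByVal x0 As Integer, ByVal y0 As Integer, _
                                ByVal cellW As Integer, ByVal cellH As Integer) As String
        ' Render every candidate character once, up front.
        Dim glyphs(alphabet.Length - 1) As Bitmap
        For i As Integer = 0 To alphabet.Length - 1
            glyphs(i) = RenderGlyph(alphabet(i), f, cellW, cellH)
        Next

        Dim result As New StringBuilder()
        Dim posX As Integer = x0
        While posX + cellW <= shot.Width
            Dim found As Char = "?"c
            For i As Integer = 0 To alphabet.Length - 1
                If MatchesAt(shot, glyphs(i), posX, y0) Then
                    found = alphabet(i)
                    Exit For
                End If
            Next
            result.Append(found)
            posX += cellW ' fixed-width text: skip the whole cell
        End While

        For Each b As Bitmap In glyphs
            b.Dispose()
        Next
        Return result.ToString()
    End Function

    Sub Main()
        Const CellWidth As Integer = 16  ' assumed cell size in pixels
        Const CellHeight As Integer = 24
        Dim sample As String = "HELLO42"

        Using f As New Font(FontFamily.GenericMonospace, 12)
            ' Stand-in for a real screenshot: copy rendered glyphs side by side.
            Using shot As New Bitmap(CellWidth * sample.Length, CellHeight)
                For i As Integer = 0 To sample.Length - 1
                    Using glyph As Bitmap = RenderGlyph(sample(i), f, CellWidth, CellHeight)
                        For y As Integer = 0 To CellHeight - 1
                            For x As Integer = 0 To CellWidth - 1
                                shot.SetPixel(i * CellWidth + x, y, glyph.GetPixel(x, y))
                            Next
                        Next
                    End Using
                Next

                Dim alphabet As String = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
                Console.WriteLine(ReadFixedWidthLine(shot, alphabet, f, 0, 0, _
                                                     CellWidth, CellHeight))
            End Using
        End Using
    End Sub

End Module
```

With a real screenshot the templates only match if the rendering is truly pixel-identical, so the font, size, colors, and anti-aliasing setting would need to be adjusted to the source. `GetPixel` is slow on large images. Once the logic works, the `LockBits` route mentioned in the answer replaces the per-pixel calls with one bulk copy of the bitmap data.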
By far your best bet on this one is to buy some OCR software to do it for you. Here's another option, although you'll have to wait:
http://www.labnol.org/software/convert-scanned-pdf-images-to-text-with-google-ocr/5158/