Designing an open-source OCR engine specifically for rendered text (screenshots)
So my current personal project is to be able to automatically grab screenshots out of a game, OCR the text, and count the number of occurrences of given words.
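Once the OCR step produces text, the counting step is straightforward. The sketch below assumes the OCR output arrives as one plain string and that matching should ignore case. `count_words` is a hypothetical helper and the sample text is made up.

```python
import re
from collections import Counter

def count_words(ocr_text, targets):
    """Count case-insensitive occurrences of each target word in OCR output."""
    # Treat runs of letters, digits and apostrophes as words; everything else splits.
    words = re.findall(r"[a-z0-9']+", ocr_text.lower())
    counts = Counter(words)
    # Counter returns 0 for words that never appeared.
    return {word: counts[word.lower()] for word in targets}

text = "Critical hit! The goblin misses. Critical hit!"
print(count_words(text, ["critical", "goblin", "dragon"]))
# → {'critical': 2, 'goblin': 1, 'dragon': 0}
```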
Having spent all evening looking around at different OCR solutions, I've come to realize that the majority of OCR packages out there are designed for scanned text. If there are any packages that can read screen text reliably, they're well outside this hobbyist's budget.
I've been reading through some other questions, and the closest I found was the question "OCR engines designed for screen-reading".
It seems to me that reading rendered text should be much easier than reading printed and scanned text. Lines are always straight, and any given letter will always appear with the exact same pixel representation (mostly, anyway). Also, why not use the actual font file (if you have it) as a cheat sheet for recognizing characters? We might actually reach 100% accuracy with a system like this.
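If every glyph really is pixel-identical, recognition can be a plain dictionary lookup from bitmap to character. This sketch assumes the text has already been thresholded to two colours and is rendered without anti-aliasing at a fixed size. The 3x5 glyphs are hypothetical stand-ins for bitmaps you would render from the game's font file at the game's exact size and weight, for example with Pillow's `ImageFont.truetype`.

```python
# Hypothetical 3x5 glyphs standing in for bitmaps rendered from the real font file.
# '#' = text pixel, '.' = background.
FONT_SHEET = {
    "I": ["###",
          ".#.",
          ".#.",
          ".#.",
          "###"],
    "L": ["#..",
          "#..",
          "#..",
          "#..",
          "###"],
    "T": ["###",
          ".#.",
          ".#.",
          ".#.",
          ".#."],
}

def to_key(rows):
    """Turn a glyph bitmap into a hashable key (a tuple of row strings)."""
    return tuple(rows)

# Rendered glyphs are assumed pixel-identical, so an exact lookup is enough.
LOOKUP = {to_key(bitmap): char for char, bitmap in FONT_SHEET.items()}

def recognize_glyph(rows):
    """Return the character for an exact bitmap match, or None if it is unknown."""
    return LOOKUP.get(to_key(rows))

print(recognize_glyph(["#..", "#..", "#..", "#..", "###"]))  # → L
print(recognize_glyph(["#.#", "#.#", "###", "#.#", "#.#"]))  # → None
```

Exact matching is brittle. Anti-aliasing, subpixel rendering, or any scaling of the screenshot changes the bitmaps, so the cheat sheet must be rendered under the same conditions as the game.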
Assuming you have the font file for a cheat sheet and your source image is perfectly square and has no noise, how would you go about recognizing characters from the screen?
(Problems I can foresee are UI lines and images that could confuse any crude attempt at pixel-guessing.)
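One crude answer to both questions is sketched below. It blanks out long horizontal UI rules, splits the text line into glyphs at empty columns, and looks each glyph up in the cheat sheet. Several assumptions apply, and every name and parameter is hypothetical:

- the image is already binarized and holds a single line of text;
- glyphs are separated by at least one blank column;
- templates are rendered at the full line height;
- a gap of `space_gap` blank columns means a word break;
- any row run of at least `min_run` pixels is a UI rule rather than text.

Touching (kerned) glyphs, vertical UI lines and embedded images are not handled.

```python
# Hypothetical 3x5 templates; '#' = text pixel, '.' = background.
FONT_SHEET = {
    "I": ("###", ".#.", ".#.", ".#.", "###"),
    "L": ("#..", "#..", "#..", "#..", "###"),
    "T": ("###", ".#.", ".#.", ".#.", ".#."),
}
LOOKUP = {bitmap: char for char, bitmap in FONT_SHEET.items()}

def remove_ui_lines(rows, min_run):
    """Blank out any row containing a horizontal run of text pixels at least min_run long."""
    cleaned = []
    for row in rows:
        longest = max(len(run) for run in row.split("."))
        cleaned.append("." * len(row) if longest >= min_run else row)
    return cleaned

def trim_blank_rows(rows):
    """Drop empty rows above and below the text so glyphs match the template height."""
    inked = [i for i, row in enumerate(rows) if "#" in row]
    return rows[inked[0]:inked[-1] + 1] if inked else []

def read_line(rows, min_run=6, space_gap=2):
    """Segment one text line at blank columns and look up each glyph ('?' if unknown)."""
    rows = trim_blank_rows(remove_ui_lines(rows, min_run))
    if not rows:
        return ""
    width = len(rows[0])
    text, start, gap = [], None, 0
    for x in range(width + 1):
        # Treat x == width as a sentinel blank column so the last glyph is flushed.
        blank = x == width or all(row[x] == "." for row in rows)
        if not blank:
            if start is None:
                if text and gap >= space_gap:
                    text.append(" ")
                start = x
            gap = 0
        else:
            if start is not None:
                glyph = tuple(row[start:x] for row in rows)
                text.append(LOOKUP.get(glyph, "?"))
                start = None
            gap += 1
    return "".join(text)

screen = [
    "##############",  # a UI border above the text
    "..............",
    ".###.###..#...",
    "..#...#...#...",
    "..#...#...#...",
    "..#...#...#...",
    ".###..#...###.",
]
print(read_line(screen))  # → IT L
```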
If you already know of a free/open-source OCR package designed for screen-reading, please let me know. I kind of doubt that's going to show up though, as no other askers seem to have gotten a lead either.
A Python interface is preferred, but beggars can't be choosers.
EDIT:
To clarify, I'm looking for design suggestions for an OCR solution that is specifically designed to read text from screenshots. Popular tools like tesseract (mentioned in the question I linked) are hard to use at best because they are not designed for this kind of source file.
Comments (3)
So I've been thinking about it and I feel that the best approach will be to count the number of pixels in each blob/glyph/character. This should really cut down on the number of tests I need to do to differentiate between glyphs.
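The pixel-count idea works well as a bucketing step. Index the cheat sheet by the number of text pixels in each glyph, then compare a blob bitmap-for-bitmap only against glyphs with the same count. Counts alone are not unique: in this hypothetical 3x5 set, "L" and "T" both have 7 pixels, so the final exact comparison is still needed. The glyphs and the names `ink`, `BY_COUNT` and `classify` are illustrative assumptions.

```python
from collections import defaultdict

# Hypothetical glyph bitmaps from a font "cheat sheet"; '#' = text pixel.
FONT_SHEET = {
    "I": ("###", ".#.", ".#.", ".#.", "###"),
    "L": ("#..", "#..", "#..", "#..", "###"),
    "T": ("###", ".#.", ".#.", ".#.", ".#."),
    "E": ("###", "#..", "###", "#..", "###"),
}

def ink(bitmap):
    """Number of text pixels in a glyph bitmap."""
    return sum(row.count("#") for row in bitmap)

# Bucket glyphs by pixel count so each blob is tested only against same-count glyphs.
BY_COUNT = defaultdict(list)
for char, bitmap in FONT_SHEET.items():
    BY_COUNT[ink(bitmap)].append(char)

def classify(blob):
    """Return the matching character, or None if no glyph with this count matches."""
    for char in BY_COUNT.get(ink(blob), []):
        if FONT_SHEET[char] == tuple(blob):
            return char
    return None

print(dict(BY_COUNT))  # → {9: ['I'], 7: ['L', 'T'], 11: ['E']}
print(classify(("###", ".#.", ".#.", ".#.", ".#.")))  # → T
```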
Regrettably, I'll have to be very specific about fonts. The software will only be able to recognize fonts at the right dpi, for the right font face and weight, etc.
It isn't ideal, and I'd still like to see someone who knows more about this stuff design OCR for rendered text; but it will work for my limited case.
If your goal is to count occurrences of certain events in a game, OCR is really not the right way to go about it. That said, if you are determined to use OCR, then tesseract-OCR is a well-known open-source package for performing optical character recognition. I'm not really sure what you are getting at with respect to scanned vs. rendered text, but tesseract will probably do as good a job as any open-source package that is available. OCR is still a tricky art, so I wouldn't expect 100% accuracy.
This isn't exactly what you want, but you may want to look at Sikuli.