Finding the bounding box of text in a JPG image
My question is similar to this one, but is more specific in scope.
In my card game application, I would like for users to be able to click on words located in a scanned jpeg image. Please see this sample Pokemon trading card.
In this case, the user should be able to hover his mouse over the text "Scratch", upon which a pulsing rectangular border will appear around the text, indicating that it is clickable. The problem is how to detect the border of the text. There will be an array of words KNOWN BEFOREHAND that the user may click on (these will be retrieved from a database on a card-by-card basis). To continue our example, the array in this case will be ["Scratch", "Live Coal"]. Once the user clicks on "Scratch", the application must know via a call-back that "Scratch" was chosen instead of "Live Coal".
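Once each clickable word has a bounding box (however it was detected), the hover/click part reduces to a hit test. Below is a minimal Python sketch. It assumes the boxes are stored in the same order as the word array. `Box`, `word_at`, and `on_click` are hypothetical names, and the coordinates are invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Box:
    """Axis-aligned rectangle in image pixels; right and bottom are exclusive."""
    left: int
    top: int
    right: int
    bottom: int

    def contains(self, x: int, y: int) -> bool:
        return self.left <= x < self.right and self.top <= y < self.bottom


def word_at(x: int, y: int, words: List[str], boxes: List[Box]) -> Optional[str]:
    """Return the clickable word whose box contains (x, y), or None if there is none."""
    for word, box in zip(words, boxes):
        if box.contains(x, y):
            return word
    return None


def on_click(x: int, y: int, words: List[str], boxes: List[Box],
             callback: Callable[[str], None]) -> None:
    """Invoke callback with the chosen word; clicks outside every box are ignored."""
    word = word_at(x, y, words, boxes)
    if word is not None:
        callback(word)


# Hypothetical boxes for the sample card; real ones would come from detection.
words = ["Scratch", "Live Coal"]
boxes = [Box(40, 300, 120, 320), Box(40, 360, 150, 380)]
on_click(60, 310, words, boxes, lambda w: print("clicked:", w))  # prints "clicked: Scratch"
```

The same `word_at` lookup can drive the hover effect: call it on mouse-move and draw the pulsing border around the returned word's box.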
I was thinking of using optical character recognition libraries to solve this problem, but the open-source options for this are poor in quality (e.g. GOCR) and/or not well-tested on multiple platforms (e.g. Tesseract). I only care about Windows and Mac compatibility. Am I missing an obvious/simpler solution/algorithm that does not require OCR? I cannot simply hand-code in bounding boxes for each card, as there will be thousands of scanned cards in my database. The user may also upload his own custom card scans with an accompanying array of clickable text.
Text color is not always black. See this panorama of different card and text styles that will be permitted. The black cards have white text, and the third-to-last card (Zekrom) has black text with a white outline.
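Because the ink may be darker or lighter than the card, one option is to classify pixels by their contrast with the dominant tone of the text region instead of by absolute blackness. Below is a minimal sketch. It assumes the image has been converted to grayscale (a list of rows, 0 = black, 255 = white) and cropped to a region that is mostly background. `ink_mask` and the threshold of 80 are hypothetical choices, not tuned values.

```python
from statistics import median
from typing import List

Gray = List[List[int]]  # img[y][x], 0 = black, 255 = white


def ink_mask(img: Gray, threshold: int = 80) -> List[List[bool]]:
    """Mark pixels whose gray level differs from the region's median by more than
    `threshold`. Comparing against the background rather than against black
    treats dark-on-light and light-on-dark text the same way."""
    background = median(v for row in img for v in row)
    return [[abs(v - background) > threshold for v in row] for row in img]
```

For outlined text such as the Zekrom card, on a mid-tone background both the black core and the white outline would differ from the median. Both would then be marked as ink, which is acceptable when only the outer box is needed.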
Solutions in any programming language are appreciated. However, please note that I am looking for open-source algorithms and/or libraries. If there is a solution in Ruby or Java, even better, as my code is primarily in these two languages.
EDIT: I forgot to mention that the order of the words/phrases in the array will be the same as on the card. Thus, the array will be ["Scratch", "Live Coal"] instead of ["Live Coal", "Scratch"]. I am mentioning this because it can potentially simplify the task. Thus, for this example, I can simply look for black pixels (though I have to watch out for the black star in the white circle). However, there will be more difficult cases where there is descriptive text under the attack name in a smaller font (again, see the panorama for examples).
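Since the words appear on the card in the same order as in the array, one way to exploit this is a horizontal projection. The approach finds bands of rows that contain ink, orders them top to bottom, and pairs them with the words. Below is a minimal sketch that operates on a boolean ink mask, for example dark pixels after thresholding. All names are hypothetical. `min_height` is an assumed knob for skipping the smaller description text, and it only helps if attack names are set in a taller font.

```python
from typing import Dict, List, Tuple

Mask = List[List[bool]]          # mask[y][x] is True where there is ink
Box = Tuple[int, int, int, int]  # (left, top, right, bottom); right/bottom exclusive


def text_bands(mask: Mask, max_gap: int = 2) -> List[Tuple[int, int]]:
    """Group rows containing ink into (top, bottom) bands, bridging up to
    `max_gap` blank rows so that small gaps (e.g. above the dot of an i)
    do not split one line of text into two bands."""
    bands = []
    start = last_ink = None
    for y, row in enumerate(mask):
        if not any(row):
            continue
        if start is None:
            start = y
        elif y - last_ink - 1 > max_gap:
            bands.append((start, last_ink + 1))
            start = y
        last_ink = y
    if start is not None:
        bands.append((start, last_ink + 1))
    return bands


def band_box(mask: Mask, top: int, bottom: int) -> Box:
    """Tight box around the ink inside rows [top, bottom)."""
    cols = [x for y in range(top, bottom) for x, ink in enumerate(mask[y]) if ink]
    return (min(cols), top, max(cols) + 1, bottom)


def assign_words(mask: Mask, words: List[str], min_height: int = 1,
                 max_gap: int = 2) -> Dict[str, Box]:
    """Pair the known words, in card order, with ink bands ordered top to bottom.
    Bands shorter than `min_height` rows (e.g. smaller description text) are skipped."""
    bands = [b for b in text_bands(mask, max_gap) if b[1] - b[0] >= min_height]
    if len(bands) < len(words):
        raise ValueError("found fewer text bands than clickable words")
    return {word: band_box(mask, top, bottom) for word, (top, bottom) in zip(words, bands)}
```

This only works if the mask is cropped to the attack area. Damage numbers on the same line would widen a box, and a description band as tall as an attack name would shift the pairing.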
Answers (2)
For simplicity, I would just write a program that lets you visually draw a bounding box around your text. However, you could also do this by detecting differences in pixel color. Since the text is black, you could locate the upper-left-most black pixel in the bottom half of the card that isn't preceded by a large indent.
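This heuristic can be sketched as a raster scan over the bottom half of the card that returns the topmost, then leftmost, dark pixel. The sketch assumes a grayscale image stored as a list of rows, and the darkness cutoff of 60 is an assumption. `max_indent` is a hypothetical stand-in for "without large indents": pixels to the right of that column are ignored.

```python
from typing import List, Optional, Tuple

Gray = List[List[int]]  # img[y][x], 0 = black, 255 = white


def first_dark_pixel(img: Gray, dark: int = 60,
                     max_indent: Optional[int] = None) -> Optional[Tuple[int, int]]:
    """Return (x, y) of the topmost, then leftmost, dark pixel in the bottom half
    of the card. If max_indent is set, columns to the right of it are ignored."""
    height = len(img)
    for y in range(height // 2, height):
        for x, value in enumerate(img[y]):
            if max_indent is not None and x > max_indent:
                break  # rest of this row is "too indented"; try the next row
            if value <= dark:
                return (x, y)
    return None
```

The returned pixel is only a starting corner. The rest of the box still has to be grown from it, for example by scanning right and down until the ink ends.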
When the cursor is stationary, check whether there is a black pixel underneath the cursor or within 4 pixels of it. If there is, scan outward to the left, right, top, and bottom until you hit three consecutive non-black pixels in each direction. Three are needed rather than one because there may be non-black pixels between the letters. Use these four positions to draw a rectangle. You can use OpenCV for this.
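The cursor-driven approach can be sketched in plain Python, without OpenCV, under these assumptions:

* The image is grayscale, stored as a list of rows.
* "Black" means a value at or below 60.
* "Within 4 pixels" means a square window of radius 4.
* The outward scans start from the nearest dark pixel found, not from the cursor itself.

All function names are hypothetical.

```python
from typing import List, Optional, Tuple

Gray = List[List[int]]  # img[y][x], 0 = black, 255 = white


def is_dark(img: Gray, x: int, y: int, dark: int) -> bool:
    """True if (x, y) lies inside the image and is at most `dark`."""
    return 0 <= y < len(img) and 0 <= x < len(img[0]) and img[y][x] <= dark


def find_dark_near(img: Gray, cx: int, cy: int, radius: int, dark: int) -> Optional[Tuple[int, int]]:
    """Return the dark pixel closest to the cursor within a square window, or None."""
    offsets = sorted(
        ((dx, dy) for dy in range(-radius, radius + 1) for dx in range(-radius, radius + 1)),
        key=lambda d: d[0] * d[0] + d[1] * d[1],
    )
    for dx, dy in offsets:
        if is_dark(img, cx + dx, cy + dy, dark):
            return (cx + dx, cy + dy)
    return None


def scan_edge(img: Gray, x: int, y: int, step_x: int, step_y: int, gap: int, dark: int) -> Tuple[int, int]:
    """Walk one direction from (x, y); return the last dark pixel seen before
    `gap` consecutive non-dark pixels (or the image border) end the walk."""
    last_dark = (x, y)
    misses = 0
    while misses < gap:
        x += step_x
        y += step_y
        if not (0 <= y < len(img) and 0 <= x < len(img[0])):
            break
        if img[y][x] <= dark:
            last_dark = (x, y)
            misses = 0
        else:
            misses += 1
    return last_dark


def box_under_cursor(img: Gray, cx: int, cy: int, radius: int = 4, gap: int = 3,
                     dark: int = 60) -> Optional[Tuple[int, int, int, int]]:
    """Return (left, top, right, bottom), right/bottom exclusive, or None if no text is near."""
    seed = find_dark_near(img, cx, cy, radius, dark)
    if seed is None:
        return None
    sx, sy = seed
    left = scan_edge(img, sx, sy, -1, 0, gap, dark)[0]
    right = scan_edge(img, sx, sy, 1, 0, gap, dark)[0]
    top = scan_edge(img, sx, sy, 0, -1, gap, dark)[1]
    bottom = scan_edge(img, sx, sy, 0, 1, gap, dark)[1]
    return (left, top, right + 1, bottom + 1)
```

Each scan follows a single row or column, so the box height reflects only the letter under the seed pixel. A wide space between words also stops the horizontal scan. With OpenCV, `cv2.connectedComponentsWithStats` on a thresholded image is a sturdier way to get per-glyph boxes that can then be merged into word boxes.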