Find the bounding box of text in a JPG image

Posted on 2024-11-19 21:18:13

My question is similar to this one, but is more specific in scope.

In my card game application, I would like users to be able to click on words in a scanned JPEG image. Please see this sample Pokemon trading card.

In this case, the user should be able to hover his mouse over the text "Scratch", upon which a pulsing rectangular border will appear around the text, indicating that it is clickable. The problem is how to detect the bounding box of the text. There will be an array of words KNOWN BEFOREHAND that the user may click on (these will be retrieved from a database on a card-by-card basis). To continue our example, the array in this case will be ["Scratch", "Live Coal"]. Once the user clicks on "Scratch", the application must know via a callback that "Scratch" was chosen instead of "Live Coal".
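Once a bounding box is known for each phrase, the hover and click behaviour described above is plain hit-testing. The following is a minimal sketch, assuming the boxes have already been detected and are expressed in the image's pixel coordinates; `ClickableRegions` and its methods are hypothetical names, built only on the standard `java.awt.Rectangle` and `java.util.function.Consumer` types.

```java
import java.awt.Rectangle;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Consumer;

/** Hypothetical helper: maps known clickable phrases to boxes and reports clicks via a callback. */
final class ClickableRegions {
    private final List<String> labels = new ArrayList<>();
    private final List<Rectangle> boxes = new ArrayList<>();
    private final Consumer<String> onClick;

    ClickableRegions(Consumer<String> onClick) {
        this.onClick = onClick;
    }

    /** Registers one phrase (e.g. "Scratch") with the box detected for it (assumed to exist already). */
    void add(String label, Rectangle box) {
        labels.add(label);
        boxes.add(new Rectangle(box)); // defensive copy
    }

    /** Returns the box under the cursor, if any, so the UI can draw the pulsing border around it. */
    Optional<Rectangle> hover(int x, int y) {
        int i = indexAt(x, y);
        return i < 0 ? Optional.empty() : Optional.of(new Rectangle(boxes.get(i)));
    }

    /** Fires the callback with the phrase under the click; returns false if no phrase was hit. */
    boolean click(int x, int y) {
        int i = indexAt(x, y);
        if (i < 0) {
            return false;
        }
        onClick.accept(labels.get(i));
        return true;
    }

    /** Linear scan is fine here: a card has only a handful of clickable phrases. */
    private int indexAt(int x, int y) {
        for (int i = 0; i < boxes.size(); i++) {
            if (boxes.get(i).contains(x, y)) {
                return i;
            }
        }
        return -1;
    }
}
```

Keeping detection separate from hit-testing means the same class works whether the boxes come from OCR, a pixel heuristic, or boxes uploaded together with a custom card scan.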

I was thinking of using optical character recognition libraries to solve this problem, but the open-source options for this are poor in quality (e.g. GOCR) and/or not well-tested on multiple platforms (e.g. Tesseract). I only care about Windows and Mac compatibility. Am I missing an obvious/simpler solution/algorithm that does not require OCR? I cannot simply hand-code bounding boxes for each card, as there will be thousands of scanned cards in my database. The user may also upload his own custom card scans with an accompanying array of clickable text.

Text color is not always black. See this panorama of different card and text styles that will be permitted. The black cards have white text, and the third-to-last card (Zekrom) has black text with a white outline.
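Because text may be dark on light or light on dark, a fixed "black pixel" test will not work on every card. One possible approach (not from the original post) is to binarize the cropped text area with Otsu's threshold on luminance and then treat whichever class covers less area as ink. This rests on the assumption that text occupies less of the crop than the background; `InkMask` is a hypothetical name, and outlined text such as the Zekrom card may still need extra tuning.

```java
import java.awt.image.BufferedImage;

/** Hypothetical helper: builds an "ink" mask without assuming the text is black. */
final class InkMask {
    /** Rec. 601 luma of a packed RGB pixel, in 0..255 (alpha is ignored). */
    static int luma(int rgb) {
        int r = (rgb >> 16) & 0xFF, g = (rgb >> 8) & 0xFF, b = rgb & 0xFF;
        return (299 * r + 587 * g + 114 * b) / 1000;
    }

    /** Otsu's method: returns t such that splitting at luma <= t maximizes between-class variance. */
    static int otsuThreshold(int[] hist, int total) {
        long sumAll = 0;
        for (int t = 0; t < 256; t++) sumAll += (long) t * hist[t];
        long sumBelow = 0;
        int countBelow = 0, best = 0;
        double bestVar = -1;
        for (int t = 0; t < 256; t++) {
            countBelow += hist[t];
            if (countBelow == 0) continue;
            int countAbove = total - countBelow;
            if (countAbove == 0) break;
            sumBelow += (long) t * hist[t];
            double meanBelow = (double) sumBelow / countBelow;
            double meanAbove = (double) (sumAll - sumBelow) / countAbove;
            double diff = meanBelow - meanAbove;
            double var = (double) countBelow * countAbove * diff * diff;
            if (var > bestVar) {
                bestVar = var;
                best = t;
            }
        }
        return best;
    }

    /** Marks the minority luminance class as ink, so white-on-black text is handled like black-on-white. */
    static boolean[][] build(BufferedImage img) {
        int w = img.getWidth(), h = img.getHeight();
        int[][] lum = new int[h][w];
        int[] hist = new int[256];
        for (int row = 0; row < h; row++) {
            for (int col = 0; col < w; col++) {
                lum[row][col] = luma(img.getRGB(col, row));
                hist[lum[row][col]]++;
            }
        }
        int t = otsuThreshold(hist, w * h);
        int dark = 0;
        for (int v = 0; v <= t; v++) dark += hist[v];
        // Assumption: the text covers less area than the background in the crop.
        boolean inkIsDark = dark <= w * h - dark;
        boolean[][] ink = new boolean[h][w];
        for (int row = 0; row < h; row++) {
            for (int col = 0; col < w; col++) {
                ink[row][col] = (lum[row][col] <= t) == inkIsDark;
            }
        }
        return ink;
    }
}
```

Choosing polarity by area rather than by colour is what lets the same code handle the black cards with white text; a uniform crop yields an all-false mask instead of spurious ink.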

Solutions in any programming language are appreciated. However, please note that I am looking for open-source algorithms and/or libraries. If there is a solution in Ruby or Java, even better, as my code is primarily in these two languages.

EDIT: I forgot to mention that the order of the words/phrases in the array will be the same as on the card. Thus, the array will be ["Scratch", "Live Coal"] instead of ["Live Coal", "Scratch"]. I am mentioning this because it can potentially simplify the task. For this example, then, I can simply look for black pixels (though I have to watch out for the black star in the white circle). However, there will be more difficult cases where there is descriptive text under the attack name in a smaller font (again, see the panorama for examples).
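With the ordering guarantee, detection can reduce to a horizontal projection profile: find runs of rows that contain ink, treat each run as one line of text, and pair the runs with the phrases in order. This is a sketch under several assumptions: the boolean mask (true = text pixel) is already cropped to the attack area so the energy symbols such as the black star are excluded; `maxGap` tolerates small blank gaps, such as the space between an i-dot and its stem; and `minHeight` is a hypothetical cutoff meant to drop the smaller description text. `OrderedTextBands` is likewise a hypothetical name.

```java
import java.awt.Rectangle;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: finds text bands by horizontal projection and maps them, in order, to known phrases. */
final class OrderedTextBands {
    /** Groups ink rows into bands, merging across at most maxGap blank rows; returns boxes top to bottom. */
    static List<Rectangle> bands(boolean[][] ink, int maxGap) {
        List<Rectangle> out = new ArrayList<>();
        int start = -1, lastInk = -1;
        for (int row = 0; row < ink.length; row++) {
            if (!rowHasInk(ink[row])) continue;
            if (start >= 0 && row - lastInk - 1 > maxGap) {
                out.add(box(ink, start, lastInk + 1)); // gap too large: close the previous band
                start = -1;
            }
            if (start < 0) start = row;
            lastInk = row;
        }
        if (start >= 0) out.add(box(ink, start, lastInk + 1));
        return out;
    }

    /** Keeps bands at least minHeight rows tall (assumed: attack names are taller than descriptions). */
    static List<Rectangle> atLeast(List<Rectangle> bands, int minHeight) {
        List<Rectangle> kept = new ArrayList<>();
        for (Rectangle r : bands) {
            if (r.height >= minHeight) kept.add(r);
        }
        return kept;
    }

    /** Pairs the i-th phrase with the i-th band, relying on the top-to-bottom ordering of the array. */
    static Map<String, Rectangle> assign(List<String> phrases, List<Rectangle> bands) {
        if (phrases.size() != bands.size()) {
            throw new IllegalArgumentException(
                    phrases.size() + " phrases but " + bands.size() + " text bands");
        }
        Map<String, Rectangle> result = new LinkedHashMap<>();
        for (int i = 0; i < phrases.size(); i++) {
            result.put(phrases.get(i), bands.get(i));
        }
        return result;
    }

    private static boolean rowHasInk(boolean[] row) {
        for (boolean b : row) {
            if (b) return true;
        }
        return false;
    }

    /** Tight box around all ink in rows [top, bottomExclusive). */
    private static Rectangle box(boolean[][] ink, int top, int bottomExclusive) {
        int left = Integer.MAX_VALUE, right = -1;
        for (int row = top; row < bottomExclusive; row++) {
            for (int col = 0; col < ink[row].length; col++) {
                if (ink[row][col]) {
                    left = Math.min(left, col);
                    right = Math.max(right, col);
                }
            }
        }
        return new Rectangle(left, top, right - left + 1, bottomExclusive - top);
    }
}
```

The mask can come from any binarization that marks text pixels as true. Attack names that wrap onto two lines, or description text as tall as the names, would break the one-band-per-phrase assumption, which is why `assign` throws instead of guessing when the counts differ.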