How do I extract subtitles from a screenshot with PHP?
I want to grab the subtitle text from a movie screenshot.
An example:
[Image: example movie screenshot]
It should grab:
Hey, why don't we all just relax, huh?
This has nothing to do with a subtitle file; the input is a screenshot. Since the text is a subtitle, we know the font type, size, etc., if that makes it easier to grab.
I know most of you will suggest a PHP OCR library, but since the background is always different, it looks like that won't work.
Comments (1)
The background being different shouldn't be a problem: you can use an image library to remove anything that isn't the text colour.
Here's a quick example that gives a decent idea of what I mean. It replaces any colour lower than #f5f5f5 with #000000.
[Image: the screenshot after thresholding]
You can probably chop most of the top part off, since you know the subtitles will be at the bottom. Then just run the result through an OCR library.
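The thresholding and cropping described above could be sketched with PHP's GD extension roughly as follows. This is only a sketch under some assumptions: GD is enabled, the subtitle text is close to pure white, "lower than #f5f5f5" is read as "any of the red, green, or blue channels is below 0xF5", and the subtitles sit in the bottom 30% of the frame. The function name `isolateSubtitle`, its parameters, and the file names in the usage comment are hypothetical.

```php
<?php
// Sketch only: keep near-white pixels (likely subtitle text) and paint
// everything else black, looking only at the bottom strip of the frame.
// isolateSubtitle() and its parameters are hypothetical names.

function isolateSubtitle($src, float $bottomFraction = 0.3, int $threshold = 0xF5)
{
    $width  = imagesx($src);
    $height = imagesy($src);

    // Assumption: subtitles sit in the bottom part of the frame.
    $stripHeight = max(1, (int) round($height * $bottomFraction));
    $top = $height - $stripHeight;

    $out   = imagecreatetruecolor($width, $stripHeight);
    $black = imagecolorallocate($out, 0, 0, 0);

    for ($y = 0; $y < $stripHeight; $y++) {
        for ($x = 0; $x < $width; $x++) {
            // imagecolorsforindex() works for both palette and truecolor images.
            $c = imagecolorsforindex($src, imagecolorat($src, $x, $top + $y));

            $isTextColour = $c['red'] >= $threshold
                && $c['green'] >= $threshold
                && $c['blue'] >= $threshold;

            // Keep near-white pixels as they are; replace the rest with black.
            $colour = $isTextColour
                ? (($c['red'] << 16) | ($c['green'] << 8) | $c['blue'])
                : $black;

            imagesetpixel($out, $x, $y, $colour);
        }
    }

    return $out;
}

// Possible usage (file names are placeholders):
//   $frame = imagecreatefrompng('screenshot.png');
//   imagepng(isolateSubtitle($frame), 'subtitle.png');
```

Looping over pixels in PHP is slow for large frames, so cropping first keeps the work small. The threshold and strip height would likely need tuning per video, and bright background areas will survive the filter.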
It's probably better to use an external OCR library or command-line tool and call it from PHP. For external tools, there's tesseract and ocropus (I believe ocropus is sponsored by Google too).
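Calling an external tool from PHP might look like the sketch below. It assumes the `tesseract` binary is installed and on the PATH, and that the cleaned-up image has already been written to disk. The helper names `buildTesseractCommand` and `ocrSubtitle` are hypothetical. Passing `stdout` as the output base asks tesseract to print the recognised text instead of writing a file.

```php
<?php
// Sketch only: shell out to the tesseract command-line tool.
// Assumes `tesseract` is installed and reachable via PATH.

/** Hypothetical helper: builds the shell command for one image. */
function buildTesseractCommand(string $imagePath, string $lang = 'eng'): string
{
    // escapeshellarg() guards against spaces and shell metacharacters.
    return sprintf(
        'tesseract %s stdout -l %s',
        escapeshellarg($imagePath),
        escapeshellarg($lang)
    );
}

/** Hypothetical helper: returns the recognised text, or null on failure. */
function ocrSubtitle(string $imagePath, string $lang = 'eng'): ?string
{
    $output = [];
    $status = 0;
    exec(buildTesseractCommand($imagePath, $lang), $output, $status);

    if ($status !== 0) {
        return null; // tesseract missing, unreadable image, etc.
    }

    return trim(implode("\n", $output));
}

// Possible usage (file name is a placeholder):
//   $text = ocrSubtitle('subtitle.png');
```

Recognition quality depends heavily on the preprocessing, so it is worth inspecting the thresholded image before blaming the OCR step.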