Extracting or cropping an image from a TIFF
I need to extract/crop the logotype (BEAVER) from the middle of a TIFF file that looks like this: http://i41.tinypic.com/2i7rbie.jpg
And then I need to automate the process so that it can be repeated about 9 million times...
My guess is that I would have to use some OCR software. But can such software "crop anything that starts below this point and ends above this point"?
Thoughts?
Typically, OCR software only extracts text from images and converts it into some text-specific format. It does not crop. However, you can use OCR technologies to accomplish your task. I would recommend the following:
The real challenge is the amount of text you would like to process. You have to be very careful when defining your "smart rules" to make sure they don't produce false positives, and always send suspicious images to a separate queue that you later review manually, updating your rules where needed.
In general, it may look like this:
Most likely you will encounter some strange images that either contradict the existing rules or are simply wrong. You don't always have to update your rules to accommodate them. It may turn out that there are only a dozen such images in your whole collection of 9 million. It might be better to leave them in the exceptions queue for manual processing rather than risk the stability of your magic rules.
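To make the "rules plus exceptions queue" idea concrete, here is a minimal sketch in Python. It is not the method described above, only one possible illustration under loud assumptions: each page is already loaded as a grayscale grid of pixel values (for example with an imaging library such as Pillow), ink is darker than paper, and a normal page has exactly three horizontal bands of ink with the logotype in the middle one. The function names, the threshold, and the three-band rule are all hypothetical. It uses a simple row-by-row ink count rather than real OCR.

```python
from typing import Dict, List, Optional, Tuple

# Assumption: a page is a list of rows, each row a list of grayscale
# values where 0 is black ink and 255 is white paper.
Image = List[List[int]]


def find_text_bands(image: Image, ink_threshold: int = 128,
                    min_ink_pixels: int = 1) -> List[Tuple[int, int]]:
    """Return (top, bottom) row ranges, bottom exclusive, that contain ink."""
    bands: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for y, row in enumerate(image):
        ink = sum(1 for pixel in row if pixel < ink_threshold)
        if ink >= min_ink_pixels:
            if start is None:
                start = y          # a new band begins on this row
        elif start is not None:
            bands.append((start, y))  # the band ended on the previous row
            start = None
    if start is not None:
        bands.append((start, len(image)))  # band runs to the last row
    return bands


def crop_rows(image: Image, top: int, bottom: int) -> Image:
    """Crop everything from row `top` up to, but not including, `bottom`."""
    return [row[:] for row in image[top:bottom]]


def extract_middle_band(image: Image,
                        expected_bands: int = 3) -> Optional[Image]:
    """Hypothetical rule: accept a page only if it has the expected number
    of bands, then return the middle one. Return None for suspicious pages."""
    bands = find_text_bands(image)
    if len(bands) != expected_bands:
        return None
    top, bottom = bands[expected_bands // 2]
    return crop_rows(image, top, bottom)


def process(images: Dict[str, Image], expected_bands: int = 3
            ) -> Tuple[Dict[str, Image], List[str]]:
    """Split a batch into successful crops and an exceptions queue."""
    crops: Dict[str, Image] = {}
    exceptions: List[str] = []
    for name, image in images.items():
        crop = extract_middle_band(image, expected_bands)
        if crop is None:
            exceptions.append(name)  # left for manual review
        else:
            crops[name] = crop
    return crops, exceptions


if __name__ == "__main__":
    W, B = 255, 0
    page = [
        [W, W, W, W],
        [B, W, W, W],
        [W, W, W, W],
        [W, B, B, W],
        [W, B, W, W],
        [W, W, W, W],
        [W, W, W, B],
    ]
    blank = [[W, W, W, W] for _ in range(3)]
    crops, exceptions = process({"page.tif": page, "blank.tif": blank})
    print(sorted(crops), exceptions)
```

The point of the design is that a page which does not match the rule is never cropped on a guess. It is only recorded in the exceptions list, so the rule itself can stay simple and stable.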