Recognizing text at a specific position using the iPhone camera

Posted on 2024-11-03 10:23:42

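I want to develop an application that can recognize some numbers on a computer-printed card (located at a fixed position on the card) and then send them to a web service.

I know I should use OCR, but I am not sure which product fits my needs. It would be great if you could recommend any API or product on the market that could help me with this project (open source is not a must, but would be very welcome :).

Apart from that, I have another technical question: would you implement the OCR recognition on the device, or implement it in a web service and pass the picture to it? What are the pros and cons of the two models?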


末が日狂欢 2024-11-10 10:23:42

If you need a solution that locates specific fields on an image, then it is not just OCR but a Data Capture task. There are several ways to approach it: write your own field detection on top of the OCR output, as suggested in another answer, or use a toolkit that is designed for this and offers visual tools for defining the layout structure.

The first way requires more programming but is cheaper in terms of licensing. You can choose not only commercial but also open source OCR libraries such as Tesseract, which may not be perfect but, with some tweaking and font training, can be good enough for many tasks.
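To make the first approach a bit more concrete, here is a minimal sketch of field detection on top of OCR output. It assumes the photo has already been cropped and straightened to the card, and that your OCR library gives you each recognized word with a pixel bounding box. The `Word` and `Region` types, the function names, and the region coordinates are all made up for illustration; they are not part of Tesseract or any other library.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One recognized word with its bounding box in pixels (hypothetical shape)."""
    text: str
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height


@dataclass
class Region:
    """Field location as fractions of the card width and height (0.0 to 1.0)."""
    left: float
    top: float
    right: float
    bottom: float


def words_in_region(words, region, card_w, card_h):
    """Return the words whose center falls inside the region, left to right."""
    hits = []
    for word in words:
        cx = (word.x + word.w / 2) / card_w
        cy = (word.y + word.h / 2) / card_h
        if region.left <= cx <= region.right and region.top <= cy <= region.bottom:
            hits.append(word)
    return sorted(hits, key=lambda word: word.x)


def extract_digits(words, region, card_w, card_h):
    """Join the words found in the region and keep only ASCII digits."""
    text = "".join(word.text for word in words_in_region(words, region, card_w, card_h))
    return "".join(ch for ch in text if "0" <= ch <= "9")
```

Using fractions of the card size rather than absolute pixels means the same region definition works for photos taken at different resolutions, as long as the card fills the cropped image.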

When dealing with low-quality images (and a significant portion of images taken by a phone camera will be low quality), your field location logic has to handle cases where some parts of the image were not recognized or were recognized wrongly, and still be able to locate the fields you want. You may also want to cross-check several recognition variants to arrive at a plausible combination.
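One simple way to cross-check variants is to recognize the same field several times (for example from several camera frames) and vote per character. The sketch below is only an illustration under assumptions: the field is a fixed-length digit string, and the confusion table, the function names, and the 60% agreement threshold are hypothetical choices you would tune for your own cards.

```python
from collections import Counter

# Hypothetical table of letters that OCR commonly confuses with digits.
CONFUSIONS = str.maketrans({"O": "0", "o": "0", "I": "1", "l": "1", "S": "5", "B": "8"})


def normalize(reading, expected_len):
    """Clean one reading; return None if it cannot be a valid digit field."""
    cleaned = reading.replace(" ", "").translate(CONFUSIONS)
    if len(cleaned) != expected_len:
        return None
    if not all("0" <= ch <= "9" for ch in cleaned):
        return None
    return cleaned


def vote(readings, expected_len, min_agreement=0.6):
    """Majority vote per position; return None when the readings disagree too much."""
    candidates = [c for c in (normalize(r, expected_len) for r in readings) if c is not None]
    if not candidates:
        return None
    result = []
    for position in range(expected_len):
        counts = Counter(candidate[position] for candidate in candidates)
        digit, count = counts.most_common(1)[0]
        if count / len(candidates) < min_agreement:
            return None  # not enough agreement: better to ask for another photo
        result.append(digit)
    return "".join(result)
```

Returning `None` instead of a best guess is deliberate here: when the value is going to be sent to a web service, rejecting an uncertain reading and retaking the picture is usually safer than submitting a wrong number.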

This is not trivial, and it will take some time to get it working reliably. It is still doable, provided your documents are not very complicated and there is just one, very predictable layout. And since you own the code, it can run both on the server and on the phone.

If you are dealing with slightly more complex documents and a variety of layout variants, maintaining this logic in pure code can become too difficult. In that case it is better to look at more advanced Data Capture technologies. There are quite a number of Data Capture products out there, but I know of just one that is offered in the form of an API: http://www.abbyy.com/flexicapture_engine/

It has two components. One is a visual tool for creating and debugging document descriptions. You just describe the logic of the field locations on the document, and the technology takes care of the rest: voting between different variants, handling recognition mistakes, and so on. You can define several alternative document structures, as well as rules that check whether one value corresponds to another in the document layout. Those rules also influence the selection of the best recognition variants.
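To illustrate the general idea of a rule that ties one value to another and steers the choice of recognition variant, here is a plain Python sketch. It is not the FlexiCapture API and does not show how that product works internally. It assumes a hypothetical card where a one-digit check field equals the sum of the serial number's digits modulo 10, and that each field comes with a list of recognition variants ordered from most to least likely.

```python
from itertools import product


def digit_sum_rule(serial, check):
    """Hypothetical rule: the check field is the digit sum of the serial, modulo 10."""
    if not serial.isdigit() or not check.isdigit():
        return False
    return sum(int(ch) for ch in serial) % 10 == int(check)


def pick_consistent(serial_variants, check_variants, rule):
    """Return the first (serial, check) pair that satisfies the rule, or None.

    Variants are assumed to be ordered best first, so earlier serial
    variants are preferred over later ones.
    """
    for serial, check in product(serial_variants, check_variants):
        if rule(serial, check):
            return serial, check
    return None
```

For example, `pick_consistent(["1284", "1234"], ["0", "8"], digit_sum_rule)` skips the top-ranked serial `"1284"` because no check variant matches it, and settles on `("1234", "0")`. This is how a cross-field rule can override the recognizer's first choice.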

The second component is the API itself. You just plug it into your application and load the document template description. In a mobile recognition scenario it can only be used for server back-end processing, since it is too heavy to fit on a mobile device. The bright side is that you don't have to port it to every mobile OS, and it uses full-featured OCR technology as opposed to the restricted versions that fit within mobile resources. The toolkit also includes some advanced image processing technologies that make it work better on images captured by a phone.

Disclaimer: I work for ABBYY.
