Invoking via command line versus JNI
I need to invoke Tesseract OCR (an open-source C++ library that does optical character recognition) from a Java application server. Right now it's easy enough to run the executable using Runtime.exec(). The basic logic would be:
- Save the image that is currently held in memory to a file (a .tif).
- Pass the image file name to the tesseract command-line program.
- Read the output text file from Java using FileReader.
How much of a performance improvement am I likely to get by writing a JNI wrapper for Tesseract? Unfortunately, there is no open-source JNI wrapper that works on Linux. I would have to write it myself, and I'm wondering whether the benefit is worth the development cost.
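The three steps listed in the question can be sketched roughly as follows. This is only an illustration under some assumptions: the class and method names (`TesseractCli`, `buildCommand`, `recognize`) are made up for this example, a `tesseract` binary is assumed to be on the PATH and invoked as `tesseract <image> <outputbase>` (which writes `<outputbase>.txt`), the output is assumed to be UTF-8, and Java 11 or later is assumed (for the built-in TIFF writer and `Files.readString`). It uses `ProcessBuilder` instead of `Runtime.exec()`, which is a substitution, not what the question describes.

```java
import java.awt.image.BufferedImage;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import javax.imageio.ImageIO;

// Hypothetical helper: runs the tesseract command-line program on an in-memory image.
final class TesseractCli {

    // Builds "tesseract <image> <outputBase>"; tesseract appends ".txt" to the output base.
    static List<String> buildCommand(Path image, Path outputBase) {
        return List.of("tesseract", image.toString(), outputBase.toString());
    }

    static String recognize(BufferedImage image) throws IOException, InterruptedException {
        Path dir = Files.createTempDirectory("ocr");
        Path input = dir.resolve("page.tif");
        Path outputBase = dir.resolve("page");
        Path output = dir.resolve("page.txt");
        try {
            // Step 1: save the in-memory image to a .tif file.
            if (!ImageIO.write(image, "tiff", input.toFile())) {
                throw new IOException("No TIFF writer available in this JRE");
            }
            // Step 2: pass the file name to the command-line program and wait for it.
            Process process = new ProcessBuilder(buildCommand(input, outputBase))
                    .redirectErrorStream(true)
                    .redirectOutput(ProcessBuilder.Redirect.DISCARD) // avoid blocking on unread output
                    .start();
            int exitCode = process.waitFor();
            if (exitCode != 0) {
                throw new IOException("tesseract exited with code " + exitCode);
            }
            // Step 3: read the text file that tesseract produced.
            return Files.readString(output, StandardCharsets.UTF_8);
        } finally {
            // Clean up the temporary files so the server's disk does not fill up.
            Files.deleteIfExists(input);
            Files.deleteIfExists(output);
            Files.deleteIfExists(dir);
        }
    }
}
```

Every call pays for one image write, one process start, and one text read, and that overhead is what a JNI wrapper would be competing against.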
Comments (3)
It's hard to say whether it would be worth it. If the OCR code runs in-process via JNI and can access the image data directly, without the image first being written to a file, then that would certainly eliminate the disk I/O overhead.
I'd recommend going with the simpler approach and only undertaking the JNI option if performance turns out to be unacceptable. At least then you'll be able to do some benchmarking and estimate the performance gains you might be able to realize.
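A minimal sketch of the kind of benchmarking meant here, assuming you wrap one complete OCR call in a task and time it repeatedly. The names `OcrBenchmark`, `Task`, and `medianNanos` are invented for this example, and a serious measurement would use a proper harness and realistic images; this only gives a first estimate of what the command-line round trip costs.

```java
import java.util.Arrays;

// Hypothetical timing helper for comparing OCR invocation strategies.
final class OcrBenchmark {

    // One unit of work to time, for example a single OCR call; may throw.
    interface Task {
        void run() throws Exception;
    }

    // Runs the task 'warmup' times untimed, then 'runs' times timed, and returns
    // the median wall-clock duration in nanoseconds (upper median for even counts).
    static long medianNanos(Task task, int warmup, int runs) throws Exception {
        if (runs <= 0) {
            throw new IllegalArgumentException("runs must be positive");
        }
        for (int i = 0; i < warmup; i++) {
            task.run(); // let the JIT and the file system cache settle
        }
        long[] samples = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            samples[i] = System.nanoTime() - start;
        }
        Arrays.sort(samples);
        return samples[runs / 2];
    }
}
```

For instance, `OcrBenchmark.medianNanos(() -> runOcrOnce(), 3, 20)` would time a placeholder `runOcrOnce()` method of your own. Timing the file write, the process run, and the file read separately shows how much of the total a JNI wrapper could remove at best.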
If you do pursue your own wrapper, I recommend you check out JNA. It lets you call most native libraries while writing only Java code, and it gives you more help than raw JNI does with doing so safely. JNA is available for most platforms.
I agree with tweakt. Don't use JNI unless there is a performance reason to do so. JNI calls can also endanger your application's stability: a memory leak or even a crash in your JNI layer, or in the OCR library itself, happens inside your own process. That can't happen if you use the command-line interface: all memory is released when the program exits, and any abnormal program termination can be detected in the calling code.
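To illustrate checking for abnormal termination in the calling code, here is a rough sketch. `SupervisedProcess`, `describeExit`, and `run` are hypothetical names. The interpretation of exit codes above 128 as "128 + signal number" is a Unix convention and an assumption here, not something Tesseract itself guarantees.

```java
import java.io.IOException;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical wrapper that isolates the server from a misbehaving child process.
final class SupervisedProcess {

    // Turns an exit code into a readable description (Unix convention assumed).
    static String describeExit(int code) {
        if (code == 0) {
            return "ok";
        }
        if (code > 128) {
            return "killed by signal " + (code - 128); // e.g. 139 is SIGSEGV (11)
        }
        return "failed with exit code " + code;
    }

    // Runs the command, kills it if it exceeds the timeout, and reports any failure
    // as an ordinary exception that the caller can catch.
    static void run(List<String> command, long timeoutSeconds)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command)
                .inheritIO() // child output goes to the server's own stdout/stderr
                .start();
        if (!process.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
            process.destroyForcibly(); // a hung child cannot hang the server
            process.waitFor();         // reap the killed child before returning
            throw new IOException(command.get(0) + " timed out after " + timeoutSeconds + " s");
        }
        int code = process.exitValue();
        if (code != 0) {
            throw new IOException(command.get(0) + " " + describeExit(code));
        }
    }
}
```

A crash or hang in the OCR program surfaces as an `IOException` for that one request, while the application server keeps running.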