Tesseract and PHP OCR
I am looking to convert a large number of image files into text using Tesseract.
I have looked at their documentation but have no idea how it relates to PHP or how my PHP script would interact with Tesseract OCR. I have seen other questions suggesting that PHP's exec() might be the way.
$img = myimage.png;
$text = exec($img,'tesseract');
I have downloaded and installed Tesseract.
I am using Windows 7 with a recent version of XAMPP installed.
I have beginner to intermediate knowledge of PHP.
What knowledge am I missing?
Update: I now have it working in PowerShell and cmd with
tesseract.exe D:\Documents\Web_Development\Sandbox\php\images\23.png D:\Documents\Web_Development\Sandbox\php\images\23
But when I try to run it through exec() like this:
<?php
exec('tesseract.exe D:\Documents\Web_Development\Sandbox\images\23.png D:\Documents\Web_Development\Sandbox\images\23');
?>
I get a popup from Windows that says tesseract.exe has stopped working. Here are the error details, in case they mean anything to anyone.
Problem signature:
Problem Event Name: BEX
Application Name: tesseract.exe
Application Version: 0.0.0.0
Application Timestamp: 4ca507b3
Fault Module Name: MSVCR90.dll
Fault Module Version: 9.0.30729.4926
Fault Module Timestamp: 4a1743c1
Exception Offset: 0002f93e
Exception Code: c0000417
Exception Data: 00000000
OS Version: 6.1.7600.2.0.0.768.3
Locale ID: 1033
Additional Information 1: e958
Additional Information 2: e95831f9d00a16a326250da660e931c5
Additional Information 3: 040a
Additional Information 4: 040a259d27c5ccf749ee18722d5fbec0
Comments (1)
You should first get it working without PHP, that is, run it from the Windows command-line interface (the command prompt). After that, you simply take whatever you typed in the CLI and run it from PHP, via the shell or some other IPC mechanism, eventually parameterizing it with PHP variables.
For example, if in the CLI you would type a command such as ipconfig to get the IP configuration of the system, then in PHP you would simply pass that same command line to exec().
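A minimal sketch of that idea, assuming ipconfig is the command in question. The helper name runCommand() is hypothetical and only wraps PHP's exec() so that the captured output lines and the exit code can both be inspected:

```php
<?php
// Hypothetical helper: runs a command line exactly as typed in the CLI
// and returns its stdout lines together with its exit code.
function runCommand(string $command): array
{
    $lines = [];    // exec() appends each line of stdout to this array
    $exitCode = 0;  // exec() stores the command's exit status here
    exec($command, $lines, $exitCode);

    return ['exitCode' => $exitCode, 'lines' => $lines];
}

// Assumed example command: prints the IP configuration on Windows.
$result = runCommand('ipconfig');

if ($result['exitCode'] !== 0) {
    echo "Command failed with exit code {$result['exitCode']}\n";
} else {
    echo implode("\n", $result['lines']), "\n";
}
```

Checking the exit code matters because exec() on its own returns only the last line of output, which says little about whether the command succeeded.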
Back to your problem: if in the CLI you would issue the tesseract.exe command from your update, then in PHP you would pass that same command line to exec().
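A rough sketch of what that could look like for the command in the question. The helper names buildTesseractCommand() and ocrImage() are hypothetical, and the sketch assumes tesseract.exe is on the PATH of the user the web server runs as. It checks that the image exists first, because the path in the question's exec() call omits the php directory that appears in the command that worked in cmd. It also appends 2>&1 so that any error text is captured rather than lost. None of this is claimed to explain the crash dialog; it only makes failures easier to see.

```php
<?php
// Hypothetical helper: builds the same command line that works in cmd.
// Paths are passed through escapeshellarg() so spaces cannot split them.
function buildTesseractCommand(string $image, string $outputBase): string
{
    // 2>&1 merges stderr into stdout so exec() can capture error messages.
    return 'tesseract.exe ' . escapeshellarg($image) . ' '
        . escapeshellarg($outputBase) . ' 2>&1';
}

// Hypothetical helper: runs tesseract and returns the recognized text,
// or null if the image is missing or the run did not produce a result.
function ocrImage(string $image, string $outputBase): ?string
{
    if (!is_file($image)) {
        return null; // rule out a wrong path before blaming tesseract
    }

    $output = [];
    $exitCode = 0;
    exec(buildTesseractCommand($image, $outputBase), $output, $exitCode);

    // tesseract writes its result to "<outputBase>.txt".
    $textFile = $outputBase . '.txt';
    if ($exitCode !== 0 || !is_file($textFile)) {
        return null;
    }

    $text = file_get_contents($textFile);
    return $text === false ? null : $text;
}

// Paths taken from the command that worked in cmd.
$base = 'D:\Documents\Web_Development\Sandbox\php\images\23';
$text = ocrImage($base . '.png', $base);

echo $text === null ? "OCR failed or image not found\n" : $text;
```

For a large number of images, the same ocrImage() call could be placed in a loop over the file list, one image per call.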
That's about it. It's not specific to tesseract; it works with any program that has a command-line interface.
If you need more control over the output or the input (as is the case when the user is asked for input while the program is running), you should use the proc_*() family of functions. See http://ch2.php.net/manual/en/function.exec.php
Good luck!
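As an illustration of the proc_*() approach, here is a minimal sketch using proc_open() and proc_close(). The helper name runWithPipes() is hypothetical. It feeds a string to the child's stdin and collects stdout and stderr separately. It reads the two output pipes one after the other, which is fine for short output but could stall if a program writes a large amount to stderr first.

```php
<?php
// Hypothetical helper: runs a command with explicit stdin/stdout/stderr pipes.
function runWithPipes(string $command, string $stdin = ''): array
{
    $descriptors = [
        0 => ['pipe', 'r'], // the child reads its stdin from this pipe
        1 => ['pipe', 'w'], // the child writes its stdout to this pipe
        2 => ['pipe', 'w'], // the child writes its stderr to this pipe
    ];

    $pipes = [];
    $process = proc_open($command, $descriptors, $pipes);
    if (!is_resource($process)) {
        throw new RuntimeException("Could not start: $command");
    }

    // Send any input, then close stdin so the child sees end-of-input.
    fwrite($pipes[0], $stdin);
    fclose($pipes[0]);

    $stdout = stream_get_contents($pipes[1]);
    fclose($pipes[1]);
    $stderr = stream_get_contents($pipes[2]);
    fclose($pipes[2]);

    // proc_close() waits for the process and returns its exit code.
    $exitCode = proc_close($process);

    return ['exitCode' => $exitCode, 'stdout' => $stdout, 'stderr' => $stderr];
}

$result = runWithPipes('echo hello');
echo $result['stdout'];
```

The same helper could be given the tesseract command line instead of echo; keeping stderr separate makes it easier to tell diagnostic messages apart from real output.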