Tesseract and PHP OCR

Posted 2024-10-12 13:10:41

I am looking to convert a large number of image files into text using Tesseract.

I have looked at their documentation but have no idea how it relates to PHP or how my PHP script would interact with Tesseract OCR. Other questions I have seen suggest that PHP's exec() might be the way.

$img = 'myimage.png';
$text = exec($img,'tesseract');

I have downloaded and installed Tesseract.
I am using Windows 7 with a recent version of XAMPP installed.
I have a beginner to intermediate knowledge of PHP.
What knowledge am I missing?

Update: I now have it working in PowerShell and cmd with:

tesseract.exe D:\Documents\Web_Development\Sandbox\php\images\23.png D:\Documents\Web_Development\Sandbox\php\images\23

But when I try to run it through exec() like this:

<?php 
exec('tesseract.exe D:\Documents\Web_Development\Sandbox\images\23.png D:\Documents\Web_Development\Sandbox\images\23');
?>

I get a popup from Windows saying that tesseract.exe has stopped working. Here are the error details, in case they mean anything to anyone:

Problem signature:
  Problem Event Name:   BEX
  Application Name: tesseract.exe
  Application Version:  0.0.0.0
  Application Timestamp:    4ca507b3
  Fault Module Name:    MSVCR90.dll
  Fault Module Version: 9.0.30729.4926
  Fault Module Timestamp:   4a1743c1
  Exception Offset: 0002f93e
  Exception Code:   c0000417
  Exception Data:   00000000
  OS Version:   6.1.7600.2.0.0.768.3
  Locale ID:    1033
  Additional Information 1: e958
  Additional Information 2: e95831f9d00a16a326250da660e931c5
  Additional Information 3: 040a
  Additional Information 4: 040a259d27c5ccf749ee18722d5fbec0
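When a command works in cmd but fails under exec(), as in the update above, it helps to see the exit code and any error text the process produced instead of only the Windows dialog. The following is a minimal diagnostic sketch, assuming the shell understands the 2>&1 redirection (cmd.exe and POSIX shells both do); run_and_report is a hypothetical name. It does not prevent the crash dialog, it only makes the failure visible to the script.

```php
<?php
// Hypothetical diagnostic helper: run a command through the shell, capturing
// stderr together with stdout (via "2>&1") plus the process exit code.
function run_and_report(string $command): array
{
    $output = [];
    $status = -1;
    // exec() fills $output with every line and $status with the exit code.
    exec($command . ' 2>&1', $output, $status);
    return ['status' => $status, 'output' => implode("\n", $output)];
}
```

Passing the same command string used with exec() above to run_and_report() and printing the returned status and output shows whether tesseract exited with an error and what, if anything, it printed before failing.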

叹梦 2024-10-19 13:10:41

You should first get it working without PHP, that is, by running it from the Windows command line (the MS-DOS prompt). Once that works, put exactly what you typed at the prompt into your PHP script, running it through exec() or some other process-execution mechanism, and parameterize it with PHP variables as needed.
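Parameterizing the command with PHP variables is where quoting matters: file names with spaces or shell metacharacters must be escaped before they reach the shell. A minimal sketch using PHP's built-in escapeshellarg(), assuming tesseract is on the PATH; the function name tesseract_command is hypothetical.

```php
<?php
// Hypothetical helper: build the same command line you would type at the
// prompt, with each PHP variable escaped so spaces and shell metacharacters
// in file names cannot break (or inject into) the command.
function tesseract_command(string $image, string $outputBase): string
{
    return 'tesseract ' . escapeshellarg($image) . ' ' . escapeshellarg($outputBase);
}

// Mirrors "tesseract document.tif result" typed at the prompt.
$command = tesseract_command('document.tif', 'result');
```

Passing $command to exec() then behaves like typing it at the prompt; on success tesseract writes the recognized text to result.txt.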

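For example, if at the command line you would type

ipconfig /all

to get the IP configuration of the system, then in PHP you'd simply use:

<?php
// exec() returns only the last output line; $output collects every line
exec('ipconfig /all', $output);
echo '<pre>';
echo htmlspecialchars(implode("\n", $output));
echo '</pre>';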

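Back to your problem: if at the command line you would issue

tesseract document.tif result

then in PHP you'd do:

<?php
// exec() returns only the last output line; tesseract writes the text to result.txt
exec('tesseract document.tif result', $output, $status);
echo '<pre>';
if ($status === 0) {
    echo htmlspecialchars(file_get_contents('result.txt'));
}
echo '</pre>';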
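Since the question is about converting a large number of images, the same call can be repeated for every file in a folder. A minimal sketch, assuming the images are PNG files in one directory and that tesseract is on the PATH (otherwise pass the full path to tesseract.exe); the function name ocr_directory is hypothetical.

```php
<?php
// Hypothetical batch helper: OCR every PNG file in $dir, one tesseract run
// per image, and return [image path => recognized text, or null on failure].
// Assumes tesseract is on the PATH; pass the full path to tesseract.exe otherwise.
function ocr_directory(string $dir, string $tesseract = 'tesseract'): array
{
    $results = [];
    foreach (glob($dir . DIRECTORY_SEPARATOR . '*.png') ?: [] as $image) {
        // Output base without the extension; tesseract appends ".txt" itself.
        $base = substr($image, 0, -strlen('.png'));
        $command = escapeshellarg($tesseract) . ' '
            . escapeshellarg($image) . ' '
            . escapeshellarg($base) . ' 2>&1';
        $output = [];
        exec($command, $output, $status);
        $textFile = $base . '.txt';
        // Keep the text only if tesseract exited cleanly and wrote the file.
        $results[$image] = ($status === 0 && is_file($textFile))
            ? file_get_contents($textFile)
            : null;
    }
    return $results;
}
```

Running one process per image keeps each failure isolated: a crash on one file yields null for that entry while the loop continues with the rest.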

That's about it. Nothing here is specific to Tesseract; the same approach works with any program that has a command-line interface.

If you need more control over the output or the input (as is the case when the program asks the user for input while it is running), use the proc_*() family of functions, such as proc_open(); see http://ch2.php.net/manual/en/function.exec.php
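For that finer control, proc_open() opens separate pipes for the child's standard input, output and error. A minimal sketch, assuming the program needs nothing on standard input; run_with_pipes is a hypothetical name.

```php
<?php
// Hypothetical helper: run a command with proc_open(), collecting stdout and
// stderr separately along with the exit code.
function run_with_pipes(string $command): array
{
    $descriptors = [
        0 => ['pipe', 'r'], // child's stdin
        1 => ['pipe', 'w'], // child's stdout
        2 => ['pipe', 'w'], // child's stderr
    ];
    $process = proc_open($command, $descriptors, $pipes);
    if (!is_resource($process)) {
        throw new RuntimeException("Could not start: $command");
    }
    fclose($pipes[0]); // nothing to send to the child in this sketch
    // Reading stdout fully, then stderr, is fine for short outputs; a child
    // that writes a lot to stderr could block here (stream_select() avoids that).
    $stdout = stream_get_contents($pipes[1]);
    $stderr = stream_get_contents($pipes[2]);
    fclose($pipes[1]);
    fclose($pipes[2]);
    $status = proc_close($process); // exit code of the child
    return ['stdout' => $stdout, 'stderr' => $stderr, 'status' => $status];
}
```

To feed input to an interactive program instead, write to $pipes[0] with fwrite() before closing it.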

Good luck!
