How do I use Tesseract OCR (or any other free OCR) in a small C++ project?

Posted 2024-10-18 11:52:15

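So, after some research I found that the only reliable free OCR options are Tesseract and CuneiForm.

Now, the Tesseract documentation is simply awful. All they give you is a bunch of Visual Studio code (for me, on Windows), and from there you are on your own in the sea of their API. All you can do is use the compiled exe and run it on a TIFF image.

I expected at least a short document showing how to call their API to run OCR in one small example, but no, there is nothing like that in their documentation.

CuneiForm: I downloaded it and, "great", everything is in Russian. :(

Is it really so hard for those people to give one small example? Instead they hand us a pile of unrelated information that maybe 90% of people will never get to use. How are you supposed to get there without starting from something small, when they explain nothing?

So I have a bunch of APIs, but without any explanation, how on earth am I supposed to use them? Maybe someone can offer me advice and a solution. I am not asking for a miracle, just a small example of how things work.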

无风消散 2024-10-25 11:52:15

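You may have given up already, but others may still be trying. So here is what you need to get started with Tesseract:

First, you should read all the documentation about Tesseract. You may find something useful in the wiki.

To start using the API (v3.0.1, currently in trunk; read the trunk as well), you should look at baseapi.h. The documentation on how to use the API is right there, in the comments above each function.

For starters:

Include baseapi.h and construct a TessBaseAPI object
Call Init()
Some optional steps, such as:
- Change some parameters with SetVariable(). You can see all the parameters and their values by printing them to a file with PrintVariables().
- Change the segmentation mode with SetPageSegMode(). This tells Tesseract what the image you want to OCR represents: a block or line of text, a word, or a character.
SetImage()
GetUTF8Text()

(Again, this is only for starters.)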
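The steps above can be sketched roughly as follows. This is a minimal illustration, not code from the Tesseract documentation: `recognizeDigits` and its parameters are hypothetical names, the input is assumed to be an 8-bit greyscale buffer, and the API object is a template parameter only so that the sketch compiles without the Tesseract headers. In a real build you would include baseapi.h and pass a `tesseract::TessBaseAPI`.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Runs the basic call sequence on an already-constructed API object.
// Api is a template parameter only so the sketch compiles without the
// Tesseract headers; in a real build pass a tesseract::TessBaseAPI.
// recognizeDigits and its parameter names are hypothetical.
template <typename Api>
std::string recognizeDigits(Api& api,
                            const unsigned char* pixels,  // 8-bit greyscale, row-major
                            int width, int height)
{
    assert(pixels != nullptr && width > 0 && height > 0);

    // Init(datapath, language) returns 0 on success; a null datapath
    // lets the library look for its language data in its default place.
    if (api.Init(nullptr, "eng") != 0) {
        return std::string();
    }

    // Optional: restrict recognition to digits and separators.
    api.SetVariable("tessedit_char_whitelist", "0123456789,.");

    // 1 byte per pixel, and each row is `width` bytes long.
    api.SetImage(pixels, width, height, 1, width);

    // GetUTF8Text() hands back a new[]-allocated buffer that the caller
    // must delete[]; unique_ptr<char[]> takes care of that.
    std::unique_ptr<char[]> text(api.GetUTF8Text());
    std::string result = text ? std::string(text.get()) : std::string();

    api.End();  // release the engine's resources
    return result;
}
```

The pixel buffer has to stay alive until GetUTF8Text() has returned, which is why it is owned by the caller here.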

You can check the Tesseract community for questions that have already been answered, or ask your own there.

好久不见√ 2024-10-25 11:52:15

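I am digging into it. So far I have generated Doxygen documentation for it, which helps. Still reading through all the docs, though.

Some links that have helped me:

The Google developer group, full of broken examples from desperate developers
A slightly old (v2.0) Tesseract hacking how-to

Anyway, I downloaded the svn code from Google: http://code.google.com/p/tesseract-ocr/

I built and installed it, then used doxygen to generate my own API reference docs. Very useful.

The way I did it:

I ran "make install", which put some headers in /usr/include/tesseract
I copied that directory to my home directory
doxygen -g doxygen.conf  # generate a doxygen config file
Went through the file it generated and set the output directory, project name and so on. I used "doxy-doc" as the output directory
doxygen doxygen.conf  # run doxygen with the edited config
chromium-browser doxy-doc/html/index.html

Hope that helps.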

酒中人 2024-10-25 11:52:15

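Marko, I tried to write a quick C++ app with Tesseract and ran into the same problems.

In short, I found it confusing, with few small examples or docs, but I don't fault the product. Heck, it's open source, and the contributors are probably more interested in improving it than in marketing it.

You could try reading the source code; with some time you may come to understand it, but I completely understand your frustration.

Good luck!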

↘人皮目录ツ 2024-10-25 11:52:15

I figured it out. If you are using Visual Studio 2010 with Windows Forms / the designer, you can add it easily this way with no issues.

Add the following projects to your solution (I am warning you once: do not add the Tesseract solution itself, or change any setting in the projects you add, unless you love to hate yourself):
ccmain
ccstruct
ccutil
classify
cube
cutil
dict
image
libtesseract
neural_networks
textord
viewer
wordrec

You can add the others, but you don't really want all of that built into your project, do you? Nah, build those separately.

Go to your project properties and add libtesseract as a reference; you can do that now that it is visible as a project. This keeps your project building fast without wading through the millions of warnings inside Tesseract. [Common Properties] -> [Add Reference]
Right-click your project in the Solution Explorer and click Project Dependencies. Make sure it depends on libtesseract, or even on all of them; it just means they build before your project.
The Tesseract 2010 Visual Studio projects contain a number of configurations (release, release.dll, debug, debug.dll), and it seems that the release.dll settings produce the right files. First, set the solution output to release.dll. Click your project properties, then click Configuration Manager. If that is not available, click the SOLUTION's properties in the solution tree and click the Configuration tab; you will see a list of projects and their associated configurations. You will notice your project is not set to release.dll even though the output is. If you took this second route you still need to click Configuration Manager. Then edit the settings: click New on your project's configuration, call it release.dll, exactly the same as the rest of them, and copy the settings from release. Do the same for debug, so that you have a debug.dll configuration copied from the debug settings. Whew... almost done.
Don't try to change Tesseract's settings to match yours. That won't work, and when a new release comes out you won't be able to just "throw it in" and go. Accept the fact that in this state your new modes are Release.dll and Debug.dll. Don't stress out; you can go back when it is finished and remove the projects from your solution.
Guess where the libraries and DLLs come out? In your project, so you may or may not need to add the library directories. Some people say to dump all the headers into a single folder so they only need to add one folder to the includes, but not me. I want to be able to delete the tesseract folder and reload it from the zips without extra work, and be fully ready to update in one move or restore it if I made a mess of the code. It's a bit of work, and you can do it with code instead of the settings, which is how I do it, but you should add all the folders that contain header files within the 2010 Tesseract project folder to your includes and leave them alone.
There is no need to add any files to your project, just these lines of code. I have included some additional code that converts from one foreign image format to a Tesseract-friendly one with no need to save or load a file. Aren't I nice?
Now you can fully debug in debug.dll and release.dll. Once you have successfully built it into your project even once, you can remove all the added projects and it will be perfect. No extra compiling or errors. Fully debuggable, all natural.
If I remember right, I could not get around having to copy the files in 2008/lib/ into my project's release folder... darn it.

In my project's "functions.h" I put:

#pragma comment (lib, "liblept.lib" )
#define _USE_TESSERACT_
#ifdef _USE_TESSERACT_
#pragma comment (lib, "libtesseract.lib" )
#include <baseapi.h>
#endif
#include <allheaders.h>

In my main project I put this in a class as a member:

tesseract::TessBaseAPI *readSomeNombers;

And of course I included "functions.h" somewhere.

Then I put this in my class's constructor:

readSomeNombers = new tesseract::TessBaseAPI();
readSomeNombers ->Init(NULL, "eng" );
readSomeNombers ->SetVariable( "tessedit_char_whitelist", "0123456789,." );

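Then I created this class member function, plus a class member to serve as the output. Don't hate; I don't like returning values, it's not my style. I originally believed the Pix did not need to be destroyed when used inside a member function like this, and my tests ran without problems, but pixCreate() allocates on the heap and GetUTF8Text() returns a buffer that the caller owns, so the version below releases both. But by all means, do whatever you like.

void Gaara::scanTheSpot()
{
    // 200x40 scratch image at 32 bits per pixel; Leptonica allocates it on the heap.
    Pix *someNewPix = pixCreate( 200, 40, 32 );
    // Copy the region whose top-left corner is (87, 42) out of the EasyBMP image.
    convertEasyBmpToPix( &scanImage, someNewPix, 87, 42 );

    readSomeNombers->SetImage( someNewPix );
    // GetUTF8Text() returns a new[]-allocated buffer (NULL on failure) owned by the caller.
    char *outText = readSomeNombers->GetUTF8Text();
    classMemeberVariable = outText ? outText : "";
    delete [] outText;
    //pixWrite( "test.bmp", someNewPix, IFF_BMP );  // debug dump of the converted image
    pixDestroy( &someNewPix );  // release the Pix once the text has been read
}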

The image I want to scan is already in memory, in the scanImage object (passed as &scanImage). It comes from the "EasyBMP" library, but that is not important.

I convert it with a function in "functions.h" / "functions.cpp".
By the way, I do a little extra processing while I am in the loop: thinning the characters, making the image black and white, and inverting black and white, which is unnecessary. At this stage of development I am still looking for ways to improve recognition, though for my purposes this has not yielded bad data yet. My plan is to use the default Tesseract data for simplicity. I am working heuristically on a very complex problem.

// True when all three colour channels are at least 50, i.e. the pixel is not dark.
bool isWhite( const RGBApixel &pixel )
{
    return pixel.Red >= 50 && pixel.Green >= 50 && pixel.Blue >= 50;
}

// Copies the region of sourceImage starting at (startX, startY) into outputImage,
// thresholding to pure black/white and inverting: light source pixels become black.
void convertEasyBmpToPix( BMP *sourceImage, PIX *outputImage, unsigned startX, unsigned startY )
{
    const int endX = startX + pixGetWidth( outputImage );
    const int endY = startY + pixGetHeight( outputImage );
    unsigned destinationY = 0;
    for( int yLoop = startY; yLoop < endY; yLoop++ )
    {
        unsigned destinationX = 0;
        for( int xLoop = startX; xLoop < endX; xLoop++ )
        {
            // Keep the pixel in a local: taking the address of the temporary
            // returned by GetPixel() is not portable.
            const RGBApixel pixel = sourceImage->GetPixel( xLoop, yLoop );
            if( isWhite( pixel ) )
            {
                pixSetRGBPixel( outputImage, destinationX, destinationY, 0, 0, 0 );
            }
            else
            {
                pixSetRGBPixel( outputImage, destinationX, destinationY, 255, 255, 255 );
            }
            destinationX++;
        }
        destinationY++;
    }
}

One thing I don't like is that I have to create the Pix, and so fix its size, outside the conversion function. When I tried to create it inside the function I got unexpected results: memory allocated inside seemed to be gone once I left.
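One plausible explanation (an assumption, since the failing version is not shown) is that the pointer was passed by value: a function that assigns a freshly created image to its own copy of the pointer leaves the caller's pointer unchanged. The sketch below uses a hypothetical `Image` struct in place of Leptonica's Pix and shows a version that allocates inside the function and hands ownership back to the caller.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for an image type such as Leptonica's Pix.
struct Image {
    int width;
    int height;
    std::vector<unsigned char> data;  // one byte per pixel
};

// Allocates inside the function and returns ownership to the caller,
// so the image outlives the call and is freed automatically later.
std::unique_ptr<Image> createImage(int width, int height)
{
    assert(width > 0 && height > 0);
    std::unique_ptr<Image> image(new Image);
    image->width = width;
    image->height = height;
    image->data.assign(static_cast<std::size_t>(width) * height, 255);  // start all white
    return image;
}

// By contrast, `void create(Image* out) { out = new Image; }` only changes
// the function's local copy of the pointer; the caller never sees it.
```

With Leptonica itself the equivalent would be to return the Pix* obtained from pixCreate() and let the caller call pixDestroy() when done.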

g m a i l
Certainly not my most elegant work, but I also gutted it heavily for simplicity. Why I bother to share this, I don't know. I should have kept it to myself.
What is my name? Kage.Sabaku.No.Gaara

Before I let you go, I should mention the subtle difference between my Windows Forms app and the default settings: I use the "Multi-Byte" character set (under project properties). Give a dog a bone, maybe a vote?

P.P.S. I hate to say it, but I made one change to host.c; if you build for 64-bit you can do the same. Otherwise you're on your own... but my reason was a bit insane, so you don't have to.

typedef unsigned int uinT32;
#if (_MSC_VER >= 1200)            //%%% vkr for VC 6.0
typedef _int64 inT64;
typedef unsigned _int64 uinT64;
#else
typedef long long int inT64;
typedef unsigned long long int uinT64;
#endif                           //%%% vkr for VC 6.0
typedef float FLOAT32;
typedef double FLOAT64;
typedef unsigned char BOOL8;

温柔嚣张 2024-10-25 11:52:15

If you are using Windows 10, there is a built-in OCR API, so there is no need to install anything.

It is hard to get right, and the documentation was not easy to work with.

But I got it working.

Here is a simple function that uses the Windows 10 OCR engine API:


// For the Windows 10 OCR API
#include "winrt/Windows.Storage.Streams.h"
#include "winrt/Windows.Graphics.Imaging.h"
#include "winrt/Windows.Media.Ocr.h"
#include "winrt/Windows.Networking.Sockets.h"
#include "winrt/Windows.Globalization.h"
#pragma comment(lib, "pathcch")
#pragma comment(lib,"windowsapp.lib")

std::string ExtractTextFromImage(byte* pixels, int xSize, int ySize)
{
    using namespace winrt;

    Windows::Globalization::Language lang = Windows::Globalization::Language(L"en");
    Windows::Media::Ocr::OcrEngine engine = Windows::Media::Ocr::OcrEngine::TryCreateFromLanguage(lang);
    //OcrEngine engine = OcrEngine::TryCreateFromUserProfileLanguages();


    int pixels_size = xSize * ySize * 4;

    Windows::Storage::Streams::InMemoryRandomAccessStream stream = { 0 };
    Windows::Storage::Streams::DataWriter writer(stream);


    array_view<const byte> bytes(pixels, pixels + pixels_size);

    writer.WriteBytes(winrt::array_view<const byte>(bytes));

    Windows::Storage::Streams::IBuffer buffer = writer.DetachBuffer();



    Windows::Graphics::Imaging::SoftwareBitmap bitmap = Windows::Graphics::Imaging::SoftwareBitmap::CreateCopyFromBuffer
    (
        buffer,
        Windows::Graphics::Imaging::BitmapPixelFormat::Bgra8,
        xSize,
        ySize
    );

    Windows::Media::Ocr::OcrResult result = engine.RecognizeAsync(bitmap).get();
    std::string output = winrt::to_string(result.Text());

    bitmap.Close();
    writer.Close();



    return output;
}
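
The function above expects xSize * ySize * 4 bytes in Bgra8 order. If the source pixels are tightly packed 24-bit RGB (an assumption; the answer does not say where its buffer comes from), they could be repacked along these lines. `rgbToBgra` is a hypothetical helper name, not part of any Windows API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Repacks tightly packed RGB (3 bytes per pixel) into BGRA (4 bytes per
// pixel, alpha forced to opaque), the byte order that Bgra8 describes.
std::vector<std::uint8_t> rgbToBgra(const std::vector<std::uint8_t>& rgb,
                                    int width, int height)
{
    assert(width > 0 && height > 0);
    const std::size_t pixelCount = static_cast<std::size_t>(width) * height;
    assert(rgb.size() == pixelCount * 3);

    std::vector<std::uint8_t> bgra(pixelCount * 4);
    for (std::size_t i = 0; i < pixelCount; ++i) {
        bgra[i * 4 + 0] = rgb[i * 3 + 2];  // blue
        bgra[i * 4 + 1] = rgb[i * 3 + 1];  // green
        bgra[i * 4 + 2] = rgb[i * 3 + 0];  // red
        bgra[i * 4 + 3] = 255;             // alpha: fully opaque
    }
    return bgra;
}
```

The result's data() pointer, together with the same width and height, can then be passed to ExtractTextFromImage.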
