OCR using the Tesseract interface
How do you OCR a TIFF file using Tesseract's interface in C#?
Currently I only know how to do it using the executable.
Disclaimer: I work for Atalasoft.
Our OCR module supports Tesseract, and if that proves not to be good enough, you can upgrade to a better engine by changing just one line of code (we provide a common interface to multiple OCR engines).
I discovered today that EMGU now includes a Tesseract wrapper. While the number of unmanaged DLLs in the OpenCV library might seem a little daunting, it's nothing that a quick copy to your output directory won't cure. From there, the actual OCR process takes about three lines of code.
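The three lines themselves are missing from this copy of the answer. As a rough sketch of what the call sequence might look like, assuming Emgu CV 3.x or later (older releases expose different method names) and hypothetical paths for the tessdata folder and the image:

```csharp
using System;
using Emgu.CV;
using Emgu.CV.CvEnum;
using Emgu.CV.OCR;

class EmguOcrSketch
{
    static void Main()
    {
        // Hypothetical paths: a folder containing eng.traineddata, and the TIFF to read.
        const string tessdataDir = @".\tessdata";
        const string imagePath = @".\scan.tif";

        // Assumption: the Emgu CV 3.x+ API (SetImage / Recognize / GetUTF8Text).
        using (var ocr = new Tesseract(tessdataDir, "eng", OcrEngineMode.Default))
        using (Mat image = CvInvoke.Imread(imagePath, ImreadModes.Grayscale))
        {
            ocr.SetImage(image);                   // hand the pixels to the engine
            ocr.Recognize();                       // run recognition
            Console.WriteLine(ocr.GetUTF8Text());  // fetch the recognized text
        }
    }
}
```

Note that `CvInvoke.Imread` loads a single image, so a multi-page TIFF would need its pages split out some other way before being passed in.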
"robomatics" put together a very nice youtube video that demonstrates a simple but effective solution.
The C# program launches tesseract.exe and then reads the output file that tesseract.exe writes.
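For reference, launching the executable and reading its output can be sketched as follows. This is not the code from the video; the class name, helper names, and the executable path in the usage line are illustrative assumptions. It relies only on the standard command-line form `tesseract <image> <outputbase> -l <lang>`, where tesseract itself appends `.txt` to the output base.

```csharp
using System;
using System.Diagnostics;
using System.IO;

static class TesseractCli
{
    // Builds "<image> <outputBase> -l <lang>", quoting the two paths.
    public static string BuildArguments(string imagePath, string outputBase, string language)
    {
        return $"\"{imagePath}\" \"{outputBase}\" -l {language}";
    }

    // Runs tesseract.exe on one image and returns the recognized text.
    public static string Recognize(string exePath, string imagePath, string language = "eng")
    {
        // Unique output base in the temp folder; tesseract writes <outputBase>.txt.
        string outputBase = Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString("N"));
        var startInfo = new ProcessStartInfo
        {
            FileName = exePath,
            Arguments = BuildArguments(imagePath, outputBase, language),
            UseShellExecute = false,
            CreateNoWindow = true
        };

        using (Process process = Process.Start(startInfo))
        {
            process.WaitForExit();
            if (process.ExitCode != 0)
                throw new InvalidOperationException("tesseract exited with code " + process.ExitCode);
        }

        string outputFile = outputBase + ".txt";
        try { return File.ReadAllText(outputFile); }
        finally { File.Delete(outputFile); }  // clean up the temporary result file
    }
}
```

A call might look like `string text = TesseractCli.Recognize(@"C:\Program Files\Tesseract-OCR\tesseract.exe", "scan.tif");`, where the install path is only an example.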
The source code seems to be geared toward building an executable, so you might need to rewire things a bit to build it as a DLL instead. I don't have much experience with Visual C++, but I think it shouldn't be too hard with some research. My guess is that someone has already made a library version, so you should try Google.
Once you have the tesseract-ocr code in a DLL file, you can import the file into your C# project via Visual Studio and have it create wrapper classes and do all the marshaling for you. If Visual Studio can't import it (which is typically the case for a plain native DLL), DllImport will let you call the functions in the DLL from C# code.
Then you can take a look at the original executable to find clues about which functions to call to properly OCR a TIFF image.
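To illustrate the DllImport route, here is a minimal sketch. It assumes a Tesseract build that exports the C API declared in `capi.h`, plus a Leptonica DLL that exports `pixRead` and `pixDestroy`. The library names `libtesseract` and `liblept`, the class name, and the helper names are assumptions; the real file names depend on how the DLLs were built.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Text;

static class NativeTesseract
{
    // Assumed library names; adjust to the actual DLL file names.
    const string TessLib = "libtesseract";
    const string LeptLib = "liblept";

    [DllImport(TessLib, CallingConvention = CallingConvention.Cdecl)]
    static extern IntPtr TessBaseAPICreate();
    [DllImport(TessLib, CallingConvention = CallingConvention.Cdecl)]
    static extern int TessBaseAPIInit3(IntPtr handle, string dataPath, string language);
    [DllImport(TessLib, CallingConvention = CallingConvention.Cdecl)]
    static extern void TessBaseAPISetImage2(IntPtr handle, IntPtr pix);
    [DllImport(TessLib, CallingConvention = CallingConvention.Cdecl)]
    static extern IntPtr TessBaseAPIGetUTF8Text(IntPtr handle);
    [DllImport(TessLib, CallingConvention = CallingConvention.Cdecl)]
    static extern void TessDeleteText(IntPtr text);
    [DllImport(TessLib, CallingConvention = CallingConvention.Cdecl)]
    static extern void TessBaseAPIEnd(IntPtr handle);
    [DllImport(TessLib, CallingConvention = CallingConvention.Cdecl)]
    static extern void TessBaseAPIDelete(IntPtr handle);
    [DllImport(LeptLib, CallingConvention = CallingConvention.Cdecl)]
    static extern IntPtr pixRead(string fileName);
    [DllImport(LeptLib, CallingConvention = CallingConvention.Cdecl)]
    static extern void pixDestroy(ref IntPtr pix);

    // Copies a null-terminated UTF-8 string out of native memory.
    public static string Utf8PtrToString(IntPtr ptr)
    {
        if (ptr == IntPtr.Zero) return null;
        int length = 0;
        while (Marshal.ReadByte(ptr, length) != 0) length++;
        var bytes = new byte[length];
        Marshal.Copy(ptr, bytes, 0, length);
        return Encoding.UTF8.GetString(bytes);
    }

    // Reads one image with Leptonica and returns the text Tesseract recognizes in it.
    public static string Recognize(string tessdataDir, string imagePath)
    {
        IntPtr api = TessBaseAPICreate();
        IntPtr pix = IntPtr.Zero;
        try
        {
            if (TessBaseAPIInit3(api, tessdataDir, "eng") != 0)
                throw new InvalidOperationException("Tesseract initialization failed.");
            pix = pixRead(imagePath);
            if (pix == IntPtr.Zero)
                throw new InvalidOperationException("Could not read " + imagePath);
            TessBaseAPISetImage2(api, pix);
            IntPtr text = TessBaseAPIGetUTF8Text(api);
            try { return Utf8PtrToString(text); }
            finally { if (text != IntPtr.Zero) TessDeleteText(text); }  // free the native buffer
        }
        finally
        {
            if (pix != IntPtr.Zero) pixDestroy(ref pix);
            TessBaseAPIEnd(api);
            TessBaseAPIDelete(api);
        }
    }
}
```

The extern declarations are resolved only when first called, so the class compiles without the DLLs present; they must sit next to the executable (or on the library search path) at run time.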
Take a look at tessnet (NuGet packages: https://www.nuget.org/packages/TesserNet/ https://www.nuget.org/packages/NuGet.Tessnet2 )