Python OCR module for Linux?
I want to find an easy-to-use Python OCR module for Linux. I found pytesser (http://code.google.com/p/pytesser/), but it ships with a .exe executable.
I tried changing the code to run it through wine, and it does work, but it's too slow and really not a good idea.
Are there any Linux alternatives that are as easy to use?
Comments (5)
You can just wrap `tesseract` in a function. If you want document segmentation and more advanced features, try out OCRopus.
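A minimal sketch of such a wrapper, assuming the `tesseract` binary is installed and on `PATH`, and that it is a version that accepts `stdout` as the output base. The function names `tesseract_command` and `image_to_text` are illustrative and not taken from the original answer:

```python
import subprocess


def tesseract_command(image_path, lang="eng", tesseract_cmd="tesseract"):
    """Build the argument list for one tesseract CLI call."""
    # Assumption: "stdout" as the output base makes tesseract print the
    # recognized text instead of writing <outputbase>.txt.
    # -l selects the recognition language.
    return [tesseract_cmd, str(image_path), "stdout", "-l", lang]


def image_to_text(image_path, lang="eng", tesseract_cmd="tesseract"):
    """Run tesseract on an image file and return the recognized text."""
    result = subprocess.run(
        tesseract_command(image_path, lang, tesseract_cmd),
        capture_output=True,  # collect stdout/stderr instead of printing
        text=True,            # decode bytes to str
        check=True,           # raise CalledProcessError on a non-zero exit
    )
    return result.stdout.strip()
```

Calling `image_to_text("scan.png")` would then return the recognized text as a string. If the binary is missing, `subprocess.run` raises `FileNotFoundError`, and a failed run raises `subprocess.CalledProcessError`.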
In addition to Blender's answer, which just runs the Tesseract executable, I would like to add that there are other OCR alternatives that can also be called as an external process.
ABBYY command-line OCR utility: http://ocr4linux.com/en:start
It is not free, so it is worth considering only if Tesseract's accuracy is not good enough for your task, if you need more sophisticated layout analysis, or if you need to export to PDF, Word, and other formats.
Update: here's a comparison of ABBYY and Tesseract accuracy: http://www.splitbrain.org/blog/2010-06/15-linux_ocr_software_comparison
Disclaimer: I work for ABBYY
python tesseract
http://code.google.com/p/python-tesseract
You should try the excellent scikits.learn machine learning library. You can find two ready-to-run code examples here and here.
You have a bunch of options here.
One way, as others have pointed out, is to use tesseract. There are quite a few wrappers for it by now, so the best approach is to do a quick PyPI search for it. The most used ones these days are:
Another useful site for finding similar engines is alternative.to. A few Linux-based systems, according to them, are: