How do I use tessdata_fast with pytesseract (Python)?
I am currently trying to use the Tesseract OCR engine from Python on macOS to detect the orientation of text (using image_to_osd).
Detecting the orientation currently takes a long time (300 ms), so my aim is to reduce it. I am trying to use the tessdata_fast models, as I believe they would reduce the time, and I am not too concerned about accuracy.
I downloaded eng.traineddata and osd.traineddata from https://github.com/tesseract-ocr/tessdata_fast into a tessdata_fast folder and added that folder to the tesseract folder. I then tried to customise the configuration as custom_config = r'--oem 1 --tessdata-dir /usr/local/Cellar/tesseract/5.0.1/share/tessdata_fast --psm 0'. However, the time taken does not seem to decrease, so I am unsure whether my configuration is running tessdata_fast or the tessdata I downloaded previously.
I ran the command tesseract --list-langs and it seems to be reading the original tessdata folder:
"/usr/local/share/tessdata/" (2):
eng
osd
I tried deleting the previously downloaded tessdata and running the command again, but the result is "/usr/local/share/tessdata/" (0):
Does anyone know where I am going wrong, or what steps I should take to run pytesseract with tessdata_fast?
Thank you!
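One way to check which models a given folder exposes is to pass --tessdata-dir together with --list-langs, since --list-langs on its own reports only the default directory. The sketch below assumes the tesseract binary is on PATH; the helper names (list_langs_command, parse_list_langs, available_langs) are hypothetical and chosen for illustration only.

```python
import subprocess


def list_langs_command(tessdata_dir):
    """Build the argument list that asks Tesseract to list models in tessdata_dir."""
    return ["tesseract", "--tessdata-dir", tessdata_dir, "--list-langs"]


def parse_list_langs(output):
    """Extract language names from --list-langs output.

    The first line is a header that reports the directory and ends with ':';
    every other non-empty line is treated as a language name.
    """
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    return [line for line in lines if not line.endswith(":")]


def available_langs(tessdata_dir):
    """Run Tesseract and return the languages found in tessdata_dir."""
    result = subprocess.run(
        list_langs_command(tessdata_dir),
        capture_output=True,
        text=True,
        check=True,  # raise if tesseract exits with an error
    )
    return parse_list_langs(result.stdout)
```

If available_langs("/usr/local/Cellar/tesseract/5.0.1/share/tessdata_fast") does not include osd, the folder passed in the config is probably not the one holding the downloaded files.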
Comments (1)
According to the pytesseract documentation, you can pass Tesseract's --tessdata-dir argument to specify the path of your data. Add it to the config string that you pass to pytesseract. For more details, see https://pypi.org/project/pytesseract/.
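A minimal sketch of that config, assuming pytesseract and Pillow are installed and that the tessdata_fast folder contains osd.traineddata. The helper names build_osd_config and detect_orientation are hypothetical, and the path in the usage note is the one from the question.

```python
import shlex
from pathlib import Path


def build_osd_config(tessdata_dir):
    """Return a pytesseract config string that points Tesseract at tessdata_dir."""
    # Quote the path so a directory containing spaces is still passed as one argument.
    return f"--tessdata-dir {shlex.quote(tessdata_dir)} --psm 0"


def detect_orientation(image_path, tessdata_dir):
    """Run orientation and script detection with models from tessdata_dir."""
    # Imported lazily so build_osd_config works without pytesseract or Pillow.
    import pytesseract
    from PIL import Image

    # Fail early if the OSD model is missing, rather than letting Tesseract
    # report a less obvious error.
    if not (Path(tessdata_dir) / "osd.traineddata").is_file():
        raise FileNotFoundError(f"osd.traineddata not found in {tessdata_dir}")

    with Image.open(image_path) as image:
        return pytesseract.image_to_osd(image, config=build_osd_config(tessdata_dir))
```

For example, detect_orientation("page.png", "/usr/local/Cellar/tesseract/5.0.1/share/tessdata_fast") would run OSD against the folder from the question, where "page.png" stands in for your own image.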