How to use tessdata_fast with pytesseract (Python)?

Posted on 2025-01-12 03:33:15

I am trying to use the Tesseract OCR engine from Python on macOS to detect the orientation of text (using image_to_osd).

Detecting the orientation currently takes a long time (300 ms), so my aim is to reduce it. I am trying to use the tessdata_fast models, as I believe they would help reduce the time, and I am not too concerned about accuracy.

I used https://github.com/tesseract-ocr/tessdata_fast to download eng.traineddata and osd.traineddata into a tessdata_fast folder, which I added to the tesseract folder. I then set the configuration to custom_config = r'--oem 1 --tessdata-dir /usr/local/Cellar/tesseract/5.0.1/share/tessdata_fast --psm 0'. However, the time taken does not seem to decrease, so I am unsure whether my configuration is using tessdata_fast or the previously downloaded tessdata.

I ran the command tesseract --list-langs, and it appears to be reading the original tessdata directory:

"/usr/local/share/tessdata/" (2):
eng
osd

I tried deleting the previously downloaded tessdata and running the command again, but the result was "/usr/local/share/tessdata/" (0):
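
One way to see what a specific directory offers, independent of the default location, is to list its .traineddata files directly and to pass --tessdata-dir alongside --list-langs. The following is only a sketch: the helper names list_traineddata and list_langs_command are made up for illustration, and the directory is the one quoted in the question.

```python
import shutil
import subprocess
from pathlib import Path


def list_traineddata(tessdata_dir):
    """Return the sorted language names (file stems) of *.traineddata files."""
    return sorted(p.stem for p in Path(tessdata_dir).glob("*.traineddata"))


def list_langs_command(tessdata_dir):
    """Build the argument list for `tesseract --list-langs` against one directory."""
    return ["tesseract", "--tessdata-dir", str(tessdata_dir), "--list-langs"]


if __name__ == "__main__":
    # Path taken from the question; adjust to wherever the fast models were saved.
    fast_dir = "/usr/local/Cellar/tesseract/5.0.1/share/tessdata_fast"
    print(list_traineddata(fast_dir))
    # Only call the CLI if a tesseract binary is actually on PATH.
    if shutil.which("tesseract"):
        result = subprocess.run(
            list_langs_command(fast_dir), capture_output=True, text=True
        )
        print(result.stdout or result.stderr)
```

If the directory printed by the CLI is still /usr/local/share/tessdata/, the --tessdata-dir value is probably not reaching tesseract, which would be worth ruling out before comparing timings.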

Does anyone know where I am going wrong? Or what steps should I be taking to run pytesseract with tessdata_fast?

Thank you!

Comments (1)

美煞众生 2025-01-19 03:33:15

According to the pytesseract documentation, tesseract accepts a --tessdata-dir argument that specifies the path to your data. Pass it through pytesseract's config parameter, as follows:

import pytesseract
from PIL import Image

image = Image.open('<replace_with_your_image_path>')  # placeholder path

# Example config: r'--tessdata-dir "C:\Program Files (x86)\Tesseract-OCR\tessdata"'
# It's important to add double quotes around the dir path.
tessdata_dir_config = r'--tessdata-dir "<replace_with_your_tessdata_dir_path>"'
pytesseract.image_to_string(image, lang='chi_sim', config=tessdata_dir_config)
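
Applied to the orientation-detection case from the question, the same idea might look like the sketch below. The helpers build_osd_config and detect_orientation are hypothetical names, not part of pytesseract; the only pytesseract call used is image_to_osd with its config parameter. The check for osd.traineddata is an added assumption so that a wrong path fails early with a clear message.

```python
from pathlib import Path


def build_osd_config(tessdata_dir):
    """Return a pytesseract config string for orientation detection (--psm 0).

    Raises FileNotFoundError if osd.traineddata is not in the directory.
    """
    tessdata_dir = Path(tessdata_dir)
    if not (tessdata_dir / "osd.traineddata").is_file():
        raise FileNotFoundError(f"osd.traineddata not found in {tessdata_dir}")
    # Double quotes around the path, as the pytesseract example recommends.
    return f'--tessdata-dir "{tessdata_dir}" --psm 0'


def detect_orientation(image, tessdata_dir):
    """Run OSD on a PIL image using the models found in tessdata_dir."""
    import pytesseract  # imported lazily so build_osd_config works without it

    return pytesseract.image_to_osd(image, config=build_osd_config(tessdata_dir))
```

With the directory from the question, the call would be detect_orientation(image, "/usr/local/Cellar/tesseract/5.0.1/share/tessdata_fast").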

For more details see https://pypi.org/project/pytesseract/.
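
To judge whether switching directories changes anything, it may help to time the call the same way for each configuration. This is a rough sketch using only the standard library; median_runtime_ms is a hypothetical helper, and the repeat count is an arbitrary choice.

```python
import statistics
import time


def median_runtime_ms(func, repeats=10):
    """Call func() `repeats` times and return the median wall-clock time in ms."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)
```

For example, median_runtime_ms(lambda: pytesseract.image_to_osd(image, config=tessdata_dir_config)) could be run once per tessdata directory and the two medians compared. The median is used here so that a single slow first call does not dominate the result.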
