Is there any way to improve tesseract OCR with small fonts?

Posted 2024-10-16 04:45:50

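I'm trying to use tesseract-OCR via python-tesseract to read a low-resolution font that looks like this:

[image]

Unfortunately, that image returns

ZIJZHZI

I think the resolution is too low, and that is causing the problem. I've tried magnifying the image and cropping it down to individual characters, but neither provided much improvement. Is there anything else I should consider, preferably something that can be done with the Python Imaging Library (PIL)? Or should I just give up and train tesseract?

For what it's worth, PIL has the following built-in filters:

BLUR, CONTOUR, DETAIL, EDGE_ENHANCE,
EDGE_ENHANCE_MORE, EMBOSS, FIND_EDGES,
SMOOTH, SMOOTH_MORE, and SHARPEN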
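
To make the attempts described above concrete (magnifying the image and applying one of the built-in filters), here is a minimal sketch of a preprocessing step using Pillow. The function names `preprocess_for_ocr` and `ocr_single_line`, the `scale` and `threshold` defaults, and the use of the `pytesseract` wrapper with `--psm 7` (treat the image as a single text line) are all assumptions made for illustration. Nothing here is tuned for this particular font, and it may not be enough to fix the recognition.

```python
from PIL import Image, ImageFilter, ImageOps


def preprocess_for_ocr(img, scale=4, threshold=128):
    """Upscale, sharpen, and binarize a small text image before OCR.

    `scale` and `threshold` are illustrative defaults, not tuned values.
    """
    gray = ImageOps.grayscale(img)       # drop color information
    gray = ImageOps.autocontrast(gray)   # stretch the contrast range
    big = gray.resize(
        (gray.width * scale, gray.height * scale),
        Image.BICUBIC,                   # smooth interpolation when enlarging
    )
    sharp = big.filter(ImageFilter.SHARPEN)  # one of the built-in PIL filters
    # Binarize: pixels above the threshold become white, the rest black.
    return sharp.point(lambda p: 255 if p > threshold else 0)


def ocr_single_line(img):
    """Run tesseract on a preprocessed image (assumes pytesseract is installed)."""
    import pytesseract  # imported lazily so preprocessing works without it
    # "--psm 7" asks tesseract to treat the image as one line of text.
    return pytesseract.image_to_string(img, config="--psm 7").strip()
```

A possible usage would be `ocr_single_line(preprocess_for_ocr(Image.open("sample.png")))`, where `sample.png` is a hypothetical file name. Swapping `SHARPEN` for another filter from the list, or changing `scale`, is an easy way to compare results.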
