Is there any way to improve Tesseract OCR with small fonts?
I'm trying to use tesseract-OCR via python-tesseract to read a low-resolution font that looks like this:
Unfortunately, running OCR on that image returns
ZIJZHZI
I think the resolution is too low and that this is causing the problem. I've tried magnifying the image and cropping it down to individual characters, but neither provides much improvement. Is there anything else I should consider doing, preferably something that can be done with the Python Imaging Library (PIL)? Or should I just give up, or train Tesseract?
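Beyond magnifying and cropping, two preprocessing steps often suggested for Tesseract are binarizing the image (pure black text on pure white) and leaving a blank margin around the text so glyphs do not touch the image edge. A minimal PIL sketch, assuming dark text on a light background; the function name `binarize_and_pad` and its default `threshold` and `border` values are hypothetical and not tuned for this font:

```python
from PIL import Image, ImageOps


def binarize_and_pad(img: Image.Image, threshold: int = 128, border: int = 10) -> Image.Image:
    """Convert to pure black/white and surround the text with a white margin.

    threshold and border are illustrative defaults (assumptions), not tuned values.
    Assumes dark text on a light background.
    """
    gray = img.convert("L")  # 8-bit grayscale
    # Pixels darker than the threshold become black (0); everything else white (255).
    bw = gray.point(lambda p: 0 if p < threshold else 255)
    # Add a white frame so no glyph touches the image edge.
    return ImageOps.expand(bw, border=border, fill=255)
```

For light text on a dark background, `ImageOps.invert` on the grayscale image before thresholding would be needed; the right threshold depends on the actual image.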
For what it's worth, PIL has the following built-in filters:
BLUR, CONTOUR, DETAIL, EDGE_ENHANCE,
EDGE_ENHANCE_MORE, EMBOSS, FIND_EDGES,
SMOOTH, SMOOTH_MORE, and SHARPEN
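One way to find out whether any of these filters helps is to upscale once, produce one variant per filter, and OCR each variant. A sketch under stated assumptions: the `filter_variants` helper and its `scale=4` default are hypothetical; the filter names are the `PIL.ImageFilter` constants listed above.

```python
from PIL import Image, ImageFilter

# The built-in PIL filters listed in the question.
FILTER_NAMES = [
    "BLUR", "CONTOUR", "DETAIL", "EDGE_ENHANCE", "EDGE_ENHANCE_MORE",
    "EMBOSS", "FIND_EDGES", "SMOOTH", "SMOOTH_MORE", "SHARPEN",
]


def filter_variants(img: Image.Image, scale: int = 4) -> dict:
    """Upscale once (grayscale), then return one filtered copy per built-in filter.

    The unfiltered upscaled image is included under the key "NONE".
    scale=4 is an assumed value, not a recommendation.
    """
    w, h = img.size
    big = img.convert("L").resize((w * scale, h * scale), Image.LANCZOS)
    variants = {"NONE": big}
    for name in FILTER_NAMES:
        # Look up the filter object by its constant name, e.g. ImageFilter.SHARPEN.
        variants[name] = big.filter(getattr(ImageFilter, name))
    return variants
```

Each variant can then be saved and passed to the OCR engine, keeping whichever filter yields the most plausible text. Edge-style filters such as `FIND_EDGES` and `EMBOSS` change the image's appearance substantially, so they are less likely to help than `SHARPEN` or `SMOOTH`.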
Answer:
I've tried to magnify the image with:
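The exact magnification command from this answer was not preserved. A comparable step done in PIL (upscale, convert to grayscale) might look like the sketch below; the `magnify_for_ocr` name and the `scale=4` default are assumptions, not the answerer's settings.

```python
from PIL import Image


def magnify_for_ocr(img: Image.Image, scale: int = 4) -> Image.Image:
    """Upscale an image by an integer factor and convert it to grayscale.

    scale=4 is an assumed value; the original command is not preserved.
    """
    w, h = img.size
    # Bicubic interpolation smooths the enlarged glyph edges instead of
    # producing blocky nearest-neighbour pixels.
    return img.convert("L").resize((w * scale, h * scale), Image.BICUBIC)
```

Usage, with hypothetical file names: `magnify_for_ocr(Image.open("small.png")).save("input.tif")`. PIL picks the output format from the file extension.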
And then read it:
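The command used to read the image was also not preserved. As a sketch, the `tesseract` command-line tool can be called from Python with `subprocess`. This assumes the `tesseract` binary is on `PATH`, that the installed version accepts `stdout` as the output base (recent versions do), and that the `--psm` spelling applies (Tesseract 4 or later). The helper names are hypothetical.

```python
import subprocess
from typing import List, Optional


def tesseract_command(image_path: str, psm: Optional[int] = None) -> List[str]:
    """Build a tesseract CLI call that prints the recognized text to stdout."""
    cmd = ["tesseract", image_path, "stdout"]
    if psm is not None:
        # Page segmentation mode; e.g. psm=7 treats the image as a single text line.
        cmd += ["--psm", str(psm)]
    return cmd


def read_text(image_path: str, psm: Optional[int] = None) -> str:
    """Run tesseract and return its output with surrounding whitespace stripped."""
    result = subprocess.run(
        tesseract_command(image_path, psm),
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()
```

For a single short word such as the one in the question, trying `psm=7` can be worthwhile, because the default mode expects a full page of text.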
The result is correct: