Improving OCR accuracy on scanned documents

Posted on 2024-10-11 07:09:35

I'm scanning a lot of A3 documents using a standard Brother A3 multifunction printer and then using FineReader Pro to OCR the images.

However, I'm getting a lot of misrecognized characters, as well as lots of strange non-alphanumeric characters.

Can anyone give me tips for programmatically improving OCR accuracy, either by pre-processing the scanned images or by post-processing the recognized text?


Edit: A sample PDF is attached. It includes some of the images that give me the poorest results.

千と千尋 2024-10-18 07:09:35

Do you have a sample image you can post somewhere? Then we can quickly tell you what is causing most of your problems. FineReader is one of the better OCR engines out there, so there are definitely reasons why you are getting poor results.

It could be related to poor contrast and threshold settings, image skew, dirty rollers in the scanner, complex or coloured backgrounds, dithered backgrounds, font sizes that are too small, a scanning resolution (dpi) that is too low, etc.
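The "font too small / dpi too low" point is simple arithmetic: a typographic point is 1/72 inch, so a font's em height in pixels is point size × dpi / 72. A minimal sketch, where `em_height_px` is a hypothetical helper name; it does not encode any particular engine's minimum character height, which should be checked against FineReader's own scanning recommendations.

```python
POINTS_PER_INCH = 72  # 1 typographic point = 1/72 inch


def em_height_px(point_size: float, dpi: int) -> float:
    """Height in pixels of a font's em square at a given scan resolution.

    The point size of a font is its em height, so
    pixels = point_size / 72 * dpi. Capitals and lowercase letters are
    smaller than the em, so treat the result as an upper bound.
    """
    return point_size * dpi / POINTS_PER_INCH


for dpi in (200, 300, 400):
    # 8 pt text: 22.2, 33.3 and 44.4 px respectively
    print(f"8 pt text at {dpi} dpi: {em_height_px(8, dpi):.1f} px em height")
```

Doubling the resolution doubles the pixels available per character, which is why small newspaper print benefits from rescanning at a higher dpi more than from any later processing.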

After looking at the attached sample, I can see a few issues:

  1. There are lots of dirty specks on the page background. FineReader seems to do a reasonable job with these on your images.
  2. There is some slight skew, but that is not causing any problems.
  3. FineReader is getting confused by the tall, bold Arial font used for the column headers.
  4. A big problem seems to be the bottom region of the pages, where the contrast is poor and the image is fuzzy. This looks like a scanner problem, but it could also be due to the printing.
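For faded regions like the bottom of these pages, one common pre-processing step is a linear contrast stretch. A minimal sketch in pure Python, assuming the page has already been loaded as a list of grayscale rows (0 = black, 255 = white); `stretch_contrast` and its percentile defaults are illustrative, not tuned. It only spreads the grey levels that are already there: it cannot undo the fuzziness, so rescanning remains the better fix.

```python
from typing import List

Image = List[List[int]]  # grayscale rows, 0 = black, 255 = white


def stretch_contrast(img: Image, low_pct: float = 1.0, high_pct: float = 99.0) -> Image:
    """Percentile-based linear contrast stretch.

    Pixels at or below the low percentile map to 0, pixels at or above
    the high percentile map to 255, and values in between are scaled
    linearly. Clipping a small percentage at each end stops a few stray
    specks from defining the whole range.
    """
    values = sorted(v for row in img for v in row)
    n = len(values)
    lo = values[min(n - 1, int(n * low_pct / 100))]
    hi = values[min(n - 1, int(n * high_pct / 100))]
    if hi <= lo:  # flat region: nothing to stretch
        return [row[:] for row in img]
    scale = 255 / (hi - lo)
    return [[max(0, min(255, round((v - lo) * scale))) for v in row] for row in img]


# A washed-out patch whose values only span 100..160 is expanded to 0..255.
print(stretch_contrast([[100, 120], [140, 160]]))  # [[0, 85], [170, 255]]
```

To treat only the bottom of a page, apply it to a slice of rows, e.g. `page[:split] + stretch_contrast(page[split:])`, where `split` is a hypothetical row index chosen by inspection.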

The print quality is quite poor, and I am guessing these are scans of a newspaper. Most of your errors are due to scanning issues, so it would be hard to improve the results programmatically.
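Post-processing cannot recover characters the scan has lost, but it can at least flag likely garbage, such as the strange non-alphanumeric characters in the question, for manual review. A minimal sketch, assuming plain-text OCR output; `suspicious_tokens`, the punctuation whitelist and the 0.3 ratio are all illustrative assumptions, not a FineReader feature.

```python
import re
from typing import List, Tuple

TOKEN = re.compile(r"\S+")
# Punctuation that commonly appears in legitimate text (an assumed whitelist).
OK_PUNCT = set(".,;:!?'\"()-%/&$")


def suspicious_tokens(text: str, max_junk_ratio: float = 0.3) -> List[Tuple[int, str]]:
    """Return (offset, token) pairs for tokens that look like OCR garbage.

    A token is flagged when more than `max_junk_ratio` of its characters
    are neither letters, digits, nor whitelisted punctuation. Nothing is
    corrected; the offsets let a reviewer jump to each suspect token.
    """
    flagged = []
    for m in TOKEN.finditer(text):
        tok = m.group()
        junk = sum(1 for c in tok if not (c.isalnum() or c in OK_PUNCT))
        if junk / len(tok) > max_junk_ratio:
            flagged.append((m.start(), tok))
    return flagged


print(suspicious_tokens("Total 1,234 ~^|¦ items"))  # [(12, '~^|¦')]
```

Flagging rather than deleting is deliberate: a token made of symbols may still correspond to real text that needs re-keying from the original page.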

Firstly, I would try scanning the image in grayscale at a slightly higher resolution and see if that helps. FineReader works well with grayscale images. If you must have black-and-white images, check whether the scanner driver has a dynamic thresholding setting and turn it on.
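Dynamic thresholding is a form of local (adaptive) binarization: each pixel is compared with its own neighbourhood rather than with one global cut-off. If the driver lacks it and black-and-white output is unavoidable, the idea can be applied in software. A minimal sketch using a local-mean threshold; it is not the algorithm any particular scanner driver uses, and the `window` and `offset` defaults are assumptions to tune. Pure Python is far too slow for full A3 pages; OpenCV's `cv2.adaptiveThreshold` with `ADAPTIVE_THRESH_MEAN_C` implements a comparable local-mean method.

```python
from typing import List

Image = List[List[int]]  # grayscale rows, 0 = black, 255 = white


def adaptive_threshold(img: Image, window: int = 15, offset: int = 10) -> Image:
    """Binarize with a local-mean threshold.

    A pixel becomes black (0) if it is darker than the mean of the
    surrounding window minus `offset`, otherwise white (255). Windows
    are clipped at the image border.
    """
    h, w = len(img), len(img[0])
    # Integral image with an extra zero row/column: integ[y][x] is the
    # sum of all pixels above and to the left of (y, x).
    integ = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            integ[y + 1][x + 1] = integ[y][x + 1] + row_sum
    r = window // 2
    out = []
    for y in range(h):
        y0, y1 = max(0, y - r), min(h, y + r + 1)
        out_row = []
        for x in range(w):
            x0, x1 = max(0, x - r), min(w, x + r + 1)
            # Window sum in O(1) from four integral-image lookups.
            total = integ[y1][x1] - integ[y0][x1] - integ[y1][x0] + integ[y0][x0]
            mean = total / ((y1 - y0) * (x1 - x0))
            out_row.append(0 if img[y][x] < mean - offset else 255)
        out.append(out_row)
    return out


# A dim grey (120) background with one darker ink pixel (70) in the middle.
page = [[120] * 5 for _ in range(5)]
page[2][2] = 70
binary = adaptive_threshold(page, window=3, offset=10)
```

Here only the ink pixel turns black, whereas a single global threshold of 128 would have turned the whole dim background black. Given that FineReader handles grayscale well, feeding it the grayscale scan directly is still the first thing to try.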

Your images would be a difficult task for any OCR engine. You will get better results if you can improve the scanning. Page 3 has a lot of noise in the bottom-right corner.
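For localized noise like that corner, a median filter is a standard despeckling step. A minimal sketch in pure Python, assuming a list of grayscale rows; `median_filter` and its radius are illustrative. Since FineReader already copes reasonably with specks on these pages, this is mainly worth trying on the noisiest regions, and it has a cost: with a 3×3 window, strokes only one pixel wide can be erased along with the specks.

```python
from statistics import median_low
from typing import List

Image = List[List[int]]  # grayscale rows, 0 = black, 255 = white


def median_filter(img: Image, radius: int = 1) -> Image:
    """Replace each pixel with the median of its (2*radius+1)^2 neighbourhood.

    Isolated specks are removed because they are outvoted by the
    surrounding background, while solid shapes survive (their corners
    get slightly rounded). Windows are clipped at the image border;
    median_low keeps results as existing pixel values.
    """
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [img[yy][xx]
                      for yy in range(max(0, y - radius), min(h, y + radius + 1))
                      for xx in range(max(0, x - radius), min(w, x + radius + 1))]
            row.append(median_low(window))
        out.append(row)
    return out


# White page with a solid 3x3 black block and a single black speck.
page = [[255] * 7 for _ in range(7)]
for yy in range(2, 5):
    for xx in range(2, 5):
        page[yy][xx] = 0
page[0][6] = 0  # the speck
clean = median_filter(page)
```

After filtering, the speck at the top-right is gone while the centre of the block stays black.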

What version of FineReader are you using? FR10 would probably give better results than previous versions.
