Detecting horizontal and vertical text with Tesseract

Posted on 2025-01-16 08:15:03

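I have some images that contain symbols along with horizontal and vertical text, and I am trying to detect all of the text using Python and Tesseract OCR. I did some preprocessing, and the result is shown in this example image, where Tesseract's output is drawn with its bounding boxes, the captured text, and the confidence.

As you can see, the script does a fairly good job, but only for horizontal text. Is there a simple way, or any Tesseract parameter, that would help me find both horizontal and vertical text in the same image? The only parameter I have set so far is psm = 11 (sparse text).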


意中人 2025-01-23 08:15:03

Your image needs further pre-processing: convert the green lines to white, because they interfere with Tesseract's page segmentation. Then run Tesseract twice: once on the image as it is, and once after rotating it by 90 degrees. The rotation is not a problem, because you can still use image_to_data (from pytesseract) to get the bounding box of each piece of detected text in both passes (boxes from the second pass are expressed in the rotated image's coordinates). I suggest this because Tesseract's page segmentation does not handle text in multiple directions well. If you wish, you can have a look at the PSMs.

Page segmentation modes:
  0    Orientation and script detection (OSD) only.
  1    Automatic page segmentation with OSD.
  2    Automatic page segmentation, but no OSD, or OCR. (not implemented)
  3    Fully automatic page segmentation, but no OSD. (Default)
  4    Assume a single column of text of variable sizes.
  5    Assume a single uniform block of vertically aligned text.
  6    Assume a single uniform block of text.
  7    Treat the image as a single text line.
  8    Treat the image as a single word.
  9    Treat the image as a single word in a circle.
 10    Treat the image as a single character.
 11    Sparse text. Find as much text as possible in no particular order.
 12    Sparse text with OSD.
 13    Raw line. Treat the image as a single text line,
       bypassing hacks that are Tesseract-specific.
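
The steps described above could be sketched roughly as follows. This is only an illustration under several assumptions, not a tested solution for your image: it assumes Pillow and pytesseract are installed, that the vertical text reads bottom-to-top (so the image is rotated 90 degrees clockwise for the second pass), and that "green" can be approximated by a simple channel-difference threshold. The helper names (is_greenish, green_to_white, unrotate_box, ocr_words, ocr_both_orientations) and the margin value of 40 are made up for this example.

```python
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, width, height)


def is_greenish(r: int, g: int, b: int, margin: int = 40) -> bool:
    """Heuristic (an assumption): green clearly dominates red and blue."""
    return g - max(r, b) > margin


def green_to_white(img):
    """Return an RGB copy of a PIL image with greenish pixels painted white."""
    out = img.convert("RGB")  # convert() returns a new image
    pixels = out.load()
    for y in range(out.height):
        for x in range(out.width):
            if is_greenish(*pixels[x, y]):
                pixels[x, y] = (255, 255, 255)
    return out


def unrotate_box(box: Box, orig_height: int) -> Box:
    """Map a box found in the image rotated 90 degrees clockwise
    back to the coordinate system of the original image."""
    left, top, width, height = box
    return (top, orig_height - left - width, height, width)


def ocr_words(img, psm: int = 11, min_conf: float = 0.0) -> List[Dict]:
    """Run Tesseract once and return words with confidence and box."""
    import pytesseract  # imported here so the pure helpers work without it

    data = pytesseract.image_to_data(
        img, config=f"--psm {psm}", output_type=pytesseract.Output.DICT
    )
    words = []
    for i, text in enumerate(data["text"]):
        conf = float(data["conf"][i])  # -1 marks non-word rows
        if text.strip() and conf >= min_conf:
            box = (data["left"][i], data["top"][i],
                   data["width"][i], data["height"][i])
            words.append({"text": text, "conf": conf, "box": box})
    return words


def ocr_both_orientations(img, psm: int = 11) -> List[Dict]:
    """First pass on the image as-is, second pass on the rotated image."""
    clean = green_to_white(img)
    horizontal = ocr_words(clean, psm)
    rotated = clean.rotate(-90, expand=True)  # negative angle = clockwise
    vertical = ocr_words(rotated, psm)
    for word in vertical:
        word["box"] = unrotate_box(word["box"], clean.height)
    return horizontal + vertical
```

To try it, open the image with Pillow (for example `Image.open("your_image.png")`, where the file name is a placeholder) and pass it to `ocr_both_orientations`. Each pass will probably also return low-confidence junk for the text that is sideways in that pass, so raising `min_conf` or comparing confidences between the two passes may be needed.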
