Image processing - rotating a scanned document to align the text
I have an OCR C# project where I get a scanned document with text in it, and I need to return the text in the document.
I already have the solution for parsing the text, but we are stuck on the part where the scanned document is rotated (to the right or to the left).
Assuming there is no noise in the image (all pixels are either white or black), can anyone help us with an algorithm to rotate the image at runtime (without a human eye)?
Thanks
2 Answers
Use the Hough Transform to detect the strongest line orientation, which should be the horizontal text orientation. The basic premise of the Hough Transform is to convert x-y coordinates into an r-theta coordinate system, where r is the distance from the origin and theta is the orientation.
Once the image is transformed, bin the votes by theta to find the strongest orientation.
Because this method uses voting over discrete r and theta bins, the angular resolution is only as good as the number of bins used. So instead of sweeping -180 to +180 degrees in one-degree increments, you may want to restrict the range, in exchange for either a more accurate angle or more speed.
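A minimal C# sketch of this idea, assuming the page has already been binarized into a 2D bool array (true = black pixel). The narrow angle range, the step size, and the method name EstimateSkewAngle are illustrative assumptions, not something stated in the answer above:

```csharp
using System;

static class SkewDetector
{
    // Estimates the skew angle (in degrees) of near-horizontal text lines by
    // voting in Hough (r, theta) space and taking the theta of the single
    // strongest line. The sign convention depends on the image's y-axis direction.
    public static double EstimateSkewAngle(bool[,] blackPixels,
                                           double maxAngleDeg = 15.0,
                                           double stepDeg = 0.25)
    {
        int height = blackPixels.GetLength(0);
        int width = blackPixels.GetLength(1);

        int angleBins = (int)Math.Round(2 * maxAngleDeg / stepDeg) + 1;
        int maxR = (int)Math.Ceiling(Math.Sqrt(width * width + height * height));
        // One row per theta bin, one column per r bin (r is offset by maxR
        // because it can be negative).
        int[,] accumulator = new int[angleBins, 2 * maxR + 1];

        // Precompute sin/cos. Horizontal lines have a Hough normal angle near
        // 90 degrees, so we sweep 90 +/- maxAngleDeg instead of the full range.
        double[] sinT = new double[angleBins];
        double[] cosT = new double[angleBins];
        for (int a = 0; a < angleBins; a++)
        {
            double thetaRad = (90.0 - maxAngleDeg + a * stepDeg) * Math.PI / 180.0;
            sinT[a] = Math.Sin(thetaRad);
            cosT[a] = Math.Cos(thetaRad);
        }

        // Voting: every black pixel votes once for each candidate angle.
        for (int y = 0; y < height; y++)
            for (int x = 0; x < width; x++)
            {
                if (!blackPixels[y, x]) continue;
                for (int a = 0; a < angleBins; a++)
                {
                    int r = (int)Math.Round(x * cosT[a] + y * sinT[a]) + maxR;
                    accumulator[a, r]++;
                }
            }

        // The strongest single (theta, r) cell marks the dominant line; its
        // theta, minus 90 degrees, is the skew of the text.
        int bestBin = 0, bestVotes = -1;
        for (int a = 0; a < angleBins; a++)
            for (int r = 0; r <= 2 * maxR; r++)
                if (accumulator[a, r] > bestVotes)
                {
                    bestVotes = accumulator[a, r];
                    bestBin = a;
                }

        return -maxAngleDeg + bestBin * stepDeg;
    }
}
```

Once the angle is known, the page can be rotated back by the negative of that angle (for example with Graphics.RotateTransform from System.Drawing) before it is handed to the OCR step.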
(I'm not an expert, but I'm writing this post out of curiosity.)
IMHO, this problem can be solved cost-effectively with a brute-force trial-and-error approach, because there cannot be too many wrong orientations.
I think you can easily determine the bounding box of the text. This bounding box can have the wrong orientation in only two ways: rotated clockwise or rotated counterclockwise. So with at most two rotations of the image (the rotation that makes the bounding box upright), you can find the correct orientation.
That is, you could find the correct document orientation without further processing of the image to determine the text alignment, and determining the text alignment would be rather heavy processing, I think.
UPDATE
I'm suggesting that we don't have to find the exact rotation angle. If the bounding box is upright, the page is either at the right angle or rotated by 180 degrees (see the sketch at the end of this answer):
1) Make the bounding box upright.
2) Run OCR and check the result; if it is OK, you're done.
3) Otherwise, rotate the image 180 degrees.
4) Run OCR again; this time it must be at the right angle.
If we really have to find the exact rotation angle, I think it has to start with finding the possible shapes of the characters 'o', 'c', or 'm' (excluding italic fonts), or finding the relative location of the period ('.'). That will require complicated operations, I think.
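A hedged C# sketch of the trial loop above, using System.Drawing for the 180-degree rotation. The OCR engine and the "is the result OK" check are passed in as delegates (runOcr, looksValid), since those parts are whatever the project already uses; none of the names below come from the answer:

```csharp
using System;
using System.Drawing;

static class OrientationFixer
{
    // Steps 2-4 of the answer: try the upright page, and if the OCR result
    // does not look valid, flip the page 180 degrees and try once more.
    // Step 1 (making the bounding box upright) is assumed to have been done
    // already, e.g. with a Hough-based deskew as in the other answer.
    public static string RecognizeWithOrientationTrial(
        Bitmap uprightPage,
        Func<Bitmap, string> runOcr,
        Func<string, bool> looksValid)
    {
        // Step 2: run OCR on the upright image and check the result.
        string text = runOcr(uprightPage);
        if (looksValid(text))
            return text;

        // Steps 3-4: rotate 180 degrees and run OCR again.
        using (Bitmap flipped = (Bitmap)uprightPage.Clone())
        {
            flipped.RotateFlip(RotateFlipType.Rotate180FlipNone);
            return runOcr(flipped);
        }
    }
}
```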