A basic approach is to build histograms of black pixels (projection profiles). First, project all pixels onto the vertical axis, i.e. count the black pixels in each pixel row. The deep valleys in that histogram mark the gaps between text lines (try different angles if the paper might be tilted). Then, per text line (or per page, if you know the font is monospaced), project the pixels onto the horizontal axis to get a histogram of black pixels per column. This gives you a strong indication of the inter-character spaces. At a minimum, it gives you values for the average character height and width that will help in the next steps.
After that, you need to take care of kerning (where characters overlap). Find the connected pixels, possibly after first applying dilation or erosion to the image to compensate for scanning artifacts.
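A minimal sketch of that step, again in pure Python on a 0/1 image. The 3x3 dilation, the 8-connectivity, and the names `dilate`, `connected_components` and `scan` are my own choices for illustration, not the only way to do it:

```python
from collections import deque


def dilate(image):
    """3x3 binary dilation: a pixel becomes 1 if it or any 8-neighbour is 1."""
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if image[y][x]:
                for ny in range(max(0, y - 1), min(h, y + 2)):
                    for nx in range(max(0, x - 1), min(w, x + 2)):
                        out[ny][nx] = 1
    return out


def connected_components(image):
    """Bounding boxes (top, left, bottom, right), inclusive, of the
    8-connected groups of black pixels, found by breadth-first search."""
    h, w = len(image), len(image[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if not image[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(y, x)])
            top, left, bottom, right = y, x, y, x
            while queue:
                cy, cx = queue.popleft()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for ny in range(max(0, cy - 1), min(h, cy + 2)):
                    for nx in range(max(0, cx - 1), min(w, cx + 2)):
                        if image[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


# Toy scan: a stroke broken in two by a one-pixel gap, plus a separate blob.
scan = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
]

print(len(connected_components(scan)))          # pieces in the raw scan
print(len(connected_components(dilate(scan))))  # pieces after dilation
```

Dilation closes small breaks so a broken stroke counts as one component, but it also grows every box by a pixel on each side and can merge characters that sit very close together. Erosion does the opposite, so which one helps depends on the scan.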
Depending on the quality of the scanned image, you may have to use more advanced techniques, but this will get you going.
This doesn't sound like artificial intelligence; it sounds like you're talking about OCR:
http://en.wikipedia.org/wiki/Optical_character_recognition
See Google's Tesseract:
http://code.google.com/p/tesseract-ocr/
EDIT: The unedited question was asking about artificial intelligence.
To me, the question per se does not seem clear.
Since it talks about OCR, I will leave a couple of articles here that may help (they helped me, at least):
Also, as mentioned above, Tesseract is a good open-source OCR engine (the one that I personally use as well); it is not itself a Python library, but you can call it from Python through wrappers such as pytesseract. Another approach you could take is to use sklearn.
You may also want to check this Stack Overflow post.
I am also pretty sure that you can use ResearchGate to look for papers on the topic (I found some, but I am not sure whether they are what you need).
I think that the above generic answer suits the generic question.