Separating background and foreground layers in scanned documents
I need to automatically remove the mildly colored background of a scanned document image for OCR.
ScanTailor is an open-source, GUI-based C++ app that does background separation among other things, but I cannot figure out how to run only the last step, the one that actually removes the background.
Ideally, I could find the code that does this and either:
- Port that part to C#
- Modify the C++ code to run from the command line, performing only that step on a given image
Can you help me understand how I can do either?
Or do you know of other libraries that can do this? (Any language or platform is acceptable.)
You are referring to Thresholding, Despeckling and Noise Removal techniques which are necessary in OCR applications.
The quality of the results depends heavily on many different factors:
- Print quality of the original
- Scan quality
- Image resolution
- Background colours and patterns used
- Noise and other marks
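As a minimal sketch of the thresholding step for a mildly colored background (plain Python, no imaging library; the cutoff of 180 is an arbitrary assumption you would tune per scan): convert each RGB pixel to luminance and force anything lighter than the cutoff to pure white, leaving the darker ink as it was.

```python
def remove_light_background(pixels, threshold=180):
    """Whiten every pixel whose luminance is above `threshold`.

    `pixels` is a list of rows, each row a list of (r, g, b) tuples with
    0-255 channels. Returns a new grayscale image (0-255 ints) in which
    light (background) pixels are 255 and darker (ink) pixels keep their
    gray value. The default threshold is an assumption, not a standard.
    """
    out = []
    for row in pixels:
        new_row = []
        for r, g, b in row:
            # ITU-R BT.601 luma weights, the common RGB-to-gray conversion
            y = round(0.299 * r + 0.587 * g + 0.114 * b)
            new_row.append(255 if y > threshold else y)
        out.append(new_row)
    return out


# Usage: a light beige and a light blue pixel become white, ink survives.
img = [[(240, 230, 200), (20, 20, 20)], [(200, 200, 255), (100, 90, 80)]]
print(remove_light_background(img))  # [[255, 20], [255, 92]]
```

An imaging library such as Pillow can supply the `(r, g, b)` values (for example through `Image.getpixel` on an image converted to `"RGB"`). A single global cutoff works only when the background is fairly uniform; despeckling and noise removal would normally follow.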
You may find the IEvolution.NET library at http://www.hi-components.com/nievolution.asp useful. It has many image processing functions to play with.
There are many commercial engines available. There is no single perfect function that solves every image processing problem; you must adapt the functions and parameters to match your images. http://www.recogniform.com/thresholding.htm
A Google search will turn up lots of results.
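One standard way to adapt the threshold to each image instead of hand-tuning it is Otsu's method, sketched below in plain Python. It is offered as an illustrative technique, not as what the libraries or pages mentioned above necessarily use. It builds a histogram of 0-255 gray values and picks the cutoff that maximizes the between-class variance of "ink" (at or below the cutoff) and "background" (above it).

```python
def otsu_threshold(gray_pixels):
    """Return the threshold t (0-255) that maximizes between-class variance.

    `gray_pixels` is a flat list of ints in 0..255. Pixels <= t form one
    class (ink) and pixels > t the other (background).
    """
    hist = [0] * 256
    for v in gray_pixels:
        hist[v] += 1
    total = len(gray_pixels)
    sum_all = sum(i * hist[i] for i in range(256))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels with value <= t
    sum0 = 0  # sum of the values <= t
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, so this split is meaningless
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        # Proportional to the between-class variance (the 1/total^2 factor
        # is dropped because it does not change which t is best).
        between = w0 * w1 * (mean0 - mean1) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


# Usage: dark ink at 30 and light background at 200 split at 30.
print(otsu_threshold([30] * 50 + [200] * 50))  # 30
```

Pixels above the returned value would then be treated as background and set to white. A single global threshold can still fail on unevenly lit scans; local (adaptive) methods such as Sauvola thresholding compute a separate cutoff per neighborhood for that case.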
Maybe the algorithm is, approximately:
If it's a high-resolution low-color-depth (e.g. black-and-white) image, then you need to apply this algorithm to groups of pixels.
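A hypothetical illustration of applying a rule to groups of pixels rather than single pixels: in a black-and-white scan, a light colored background often comes out as sparse dithered dots, whereas text strokes are dense. The sketch below tiles the image into `block_size` x `block_size` blocks and clears any block whose fraction of black pixels is below `min_density`. Both parameter names and their defaults are assumptions chosen for illustration.

```python
def clear_sparse_blocks(bw, block_size=4, min_density=0.25):
    """Clear low-density blocks in a black-and-white image.

    `bw` is a list of equal-length rows of 0/1 values (1 = black).
    Each block_size x block_size tile (edge tiles may be smaller) whose
    fraction of black pixels is below `min_density` is treated as
    background dithering or speckle and set to white (0).
    Returns a new image; the input is not modified.
    """
    height = len(bw)
    width = len(bw[0]) if height else 0
    out = [row[:] for row in bw]
    for top in range(0, height, block_size):
        for left in range(0, width, block_size):
            rows = range(top, min(top + block_size, height))
            cols = range(left, min(left + block_size, width))
            black = sum(bw[y][x] for y in rows for x in cols)
            area = len(rows) * len(cols)
            if black / area < min_density:
                for y in rows:
                    for x in cols:
                        out[y][x] = 0
    return out


# Usage: the stray dot in the left tile is removed; the stroke on the right stays.
bw = [
    [0, 0, 0, 0, 1, 1, 0, 0],
    [0, 1, 0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 1, 1, 0, 0],
]
for row in clear_sparse_blocks(bw):
    print(row)
```

Because the tile grid is fixed, a tile that only clips the edge of a thin stroke can be erased along with the noise; a sliding window centered on each pixel avoids that at a higher computational cost.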