Plot digitization - scraping sample values from an image of a graph
This isn't really "OCR", since it's not recognizing characters, but it's the same idea applied to curves. Does anyone know of an image-processing library or established algorithm for retrieving the values from a (raster) plot image? For instance, in this graph, it's hard for me to read exact values by eye because there are such large gaps between the gridlines.
I can use a straight edge or whatever, but it's still going to be error-prone. It would be great if there were software that could just take a screenshot of any old graph and automatically convert it into a table of values or a function that could be queried.
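A minimal sketch of the manual-calibration step most digitizers use, assuming linear axes and that the curve's pixel positions are already known: two labelled ticks per axis define a linear pixel-to-value map, and interpolating between the digitized samples gives a function that can be queried at any x. The helper names (`axis_map`, `query`) and all pixel and value numbers are hypothetical.

```python
import numpy as np

def axis_map(p1, v1, p2, v2):
    """Return a function mapping a pixel coordinate on one axis to a data value,
    given two reference ticks: pixel p1 shows value v1, pixel p2 shows value v2."""
    scale = (v2 - v1) / (p2 - p1)
    return lambda p: v1 + (np.asarray(p, dtype=float) - p1) * scale

# Hypothetical calibration, read off two labelled ticks per axis by hand.
x_of = axis_map(50, 0.0, 450, 10.0)    # column 50 -> x = 0, column 450 -> x = 10
y_of = axis_map(400, 0.0, 100, 100.0)  # row 400 -> y = 0, row 100 -> y = 100 (rows grow downward)

# Hypothetical pixel positions (column, row) of points on the curve.
cols = np.array([50, 150, 250, 350, 450])
rows = np.array([400, 380, 340, 250, 100])
xs, ys = x_of(cols), y_of(rows)

def query(x):
    """Linearly interpolate the digitized curve at any x (xs must be increasing)."""
    return np.interp(x, xs, ys)

print(query(3.7))
```

Note that `np.interp` does not extrapolate: outside the sampled x range it returns the first or last y value.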
Seems to be called "curve recognition"? Could also be used for extracting data from the curves in scientific papers for which the underlying data is not published.
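One very simple form of curve recognition is a column-by-column scan: threshold the image and, in each pixel column, take the centre of the dark pixels as the curve's position. This sketch assumes a single dark curve on a light background that crosses each column at most once, with gridlines, axes and labels already removed or lighter than the threshold; real scanned plots usually need colour filtering or gridline removal first. `trace_curve` is a hypothetical helper and the test image is synthetic.

```python
import numpy as np

def trace_curve(img, threshold=128):
    """Trace one dark curve on a light background, one column at a time.

    img: 2-D grayscale array (rows x cols), 0 = black, 255 = white.
    Returns (cols, rows): rows[i] is the mean row index of the dark pixels
    in column cols[i]. Columns with no dark pixel are skipped.
    """
    mask = img < threshold
    cols, rows = [], []
    for c in range(img.shape[1]):
        r = np.nonzero(mask[:, c])[0]   # row indices of dark pixels in this column
        if r.size:                      # column contains part of the curve
            cols.append(c)
            rows.append(r.mean())       # centre of the stroke, so thick lines are fine
    return np.array(cols), np.array(rows, dtype=float)

# Synthetic image: white canvas with a 3-pixel-thick straight line.
img = np.full((50, 60), 255, dtype=np.uint8)
for c in range(60):
    centre = int(round(40 - 0.5 * c))
    img[centre - 1:centre + 2, c] = 0

cols, rows = trace_curve(img)
```

The resulting `(cols, rows)` pixel positions still have to be converted to data values with an axis calibration.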
And it's OK to have some human guidance. There's no reason an OCR engine couldn't read the "100" and match it up with the line, for instance, but it's fine to have a human assign the lines numerical values after the machine has extracted the curve's path relative to the gridlines. I'm mostly interested in tracing the curve relative to the grid, even if the grid is tilted, rotated, or warped in a non-affine way.
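For a grid that is tilted, rotated, or seen in perspective (a photographed page, say), one common technique is a projective transform (homography) fitted to the four outer grid corners; it maps any traced pixel into grid coordinates. This is only a sketch: a homography covers tilt, rotation and perspective, but not arbitrary curved warps, which would need a denser set of grid intersections and a piecewise mapping. The function names and corner coordinates below are hypothetical; libraries such as OpenCV offer equivalent routines.

```python
import numpy as np

def fit_homography(src, dst):
    """Solve for the 3x3 projective transform H (with H[2, 2] = 1) mapping four
    src points (pixel coords) onto four dst points (grid/data coords).
    Each correspondence gives two linear equations from
    u = (h0 x + h1 y + h2) / (h6 x + h7 y + 1), and likewise for v."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y]); b.append(v)
    h = np.linalg.solve(np.array(A, dtype=float), np.array(b, dtype=float))
    return np.append(h, 1.0).reshape(3, 3)

def apply_homography(H, pts):
    """Map an (N, 2) array of pixel points through H, with perspective division."""
    pts = np.asarray(pts, dtype=float)
    homog = np.column_stack([pts, np.ones(len(pts))]) @ H.T
    return homog[:, :2] / homog[:, 2:3]

# Hypothetical pixel positions of the four outer grid corners in a skewed image,
# listed in the same order as the grid values they represent.
corners_px = [(102, 410), (498, 392), (510, 95), (90, 110)]
corners_data = [(0, 0), (10, 0), (10, 100), (0, 100)]
H = fit_homography(corners_px, corners_data)

# Traced curve pixels expressed in grid (data) coordinates.
curve_data = apply_homography(H, [(300, 250), (400, 200)])
```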
Update:
There is now a Wikipedia article called "Converting scanned graphs to data", with a bunch of software in its links. There is also some software listed on alternativeto.net. I guess the theory belongs on http://dsp.stackexchange.com now, while the software solutions belong on http://superuser.com?
Comments (7)
This is extremely hard and error-prone. (We do this sort of thing a lot in chemistry.) It depends critically on various parameters and conditions.
I'm sorry to be pessimistic. If you really want the info, it can be done with a lot of investment or by collaborating with groups that do this sort of thing.
Googling for "curve recognition software" suggests http://www.curveunscan.com/
http://www.digitizeit.de/ is a program for digitizing graphs.
There is also the related potrace, and its page in turn mentions other alternatives.
I don't know of any software that does what you're asking, but if you can get just a few points you can use some kind of regression to find the best function that fits those points. This particular graph looks like an exponential function. So you'd want to find an exponential regression calculator.
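As a sketch of the exponential-regression idea, assuming all y values are positive: taking logarithms turns y = a·e^(bx) into the straight line log y = log a + b·x, which an ordinary least-squares line fit (`np.polyfit` with degree 1) handles directly. The helper name and the sample points are hypothetical. Fitting in log space weights relative errors equally; a nonlinear fit on y itself (for example with `scipy.optimize.curve_fit`) would weight absolute errors instead.

```python
import numpy as np

def fit_exponential(x, y):
    """Fit y = a * exp(b * x) by least squares on log(y). Requires all y > 0."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    b, log_a = np.polyfit(x, np.log(y), 1)  # polyfit returns [slope, intercept]
    return np.exp(log_a), b

# A few points read off the plot (hypothetical values).
xs = [0, 1, 2, 3, 4]
ys = [2.0, 5.4, 14.8, 40.2, 109.2]
a, b = fit_exponential(xs, ys)
```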
I use im2graph to convert graph images to data, that is, numbers. im2graph is free and available for Linux and Windows. It is very smooth and requires very little effort on your part to generate results.
See http://www.im2graph.co.il
It is very difficult to scrape the values with the naked eye. But you can use graph digitizers that let you sample off-grid points. There are many such tools on the internet. Someone has already mentioned Digitizeit; however, it is not free.
Here are my preferred tools that I often use to extract data points from graphs and scanned documents.
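Sampling a point that falls between gridlines is hardest to do by eye on a logarithmic axis, where equal pixel distances correspond to equal ratios rather than equal differences. A minimal sketch of how such an axis can be calibrated, assuming two labelled ticks with positive values; `log_axis_map` and the pixel numbers are hypothetical.

```python
import math

def log_axis_map(p1, v1, p2, v2):
    """Map a pixel coordinate on a logarithmic axis to a data value, given two
    reference ticks (pixel p1 shows value v1, pixel p2 shows value v2, both > 0).
    Interpolation is linear in log10(value), not in the value itself."""
    l1, l2 = math.log10(v1), math.log10(v2)
    return lambda p: 10 ** (l1 + (p - p1) * (l2 - l1) / (p2 - p1))

# Hypothetical: on a log y-axis, row 400 is labelled 1 and row 100 is labelled 1000.
y_of = log_axis_map(400, 1.0, 100, 1000.0)
print(y_of(250))  # halfway in pixels -> 10**1.5, about 31.6
```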