Plot digitization - scraping sample values from images of graphs

Posted on 2024-08-09 19:50:32

This isn't really "OCR", since it's not recognizing characters, but it's the same idea applied to curves. Does anyone know of an image-processing library or established algorithm for retrieving the values from a (raster) plot image? For instance, in this graph, it's hard for me to read exact values by eye because there are such large gaps between gridlines:

[image of the graph]

I can use a straight edge or whatever, but it's still going to be error-prone. It would be great if there were software that could just take a screenshot of any old graph and automatically convert it into a table of values or a function that could be queried.

Seems to be called "curve recognition"? Could also be used for extracting data from the curves in scientific papers for which the underlying data is not published.

And it's OK to have some human guidance. There's no reason OCR couldn't read the "100" and match it up with the line, for instance, but it's fine to have a human give the lines numerical values after the machine has extracted the curve's path relative to the gridlines. I'm mostly interested in the function of tracing the curve relative to the grid, even if the grid is tilted, rotated, or warped in a non-affine way.

Update:

There is now a Wikipedia article called Converting scanned graphs to data with a bunch of software in the links. Also some software on alternativeto.net. I guess the theory belongs on http://dsp.stackexchange.com now, while the software solutions belong on http://superuser.com?
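For anyone who wants to experiment, the curve-tracing step described above can be sketched in a few lines. This is only a minimal illustration, not an established algorithm: it assumes the curve has already been separated from the grid and axes (for example by colour), that the image is a plain list of pixel rows, and that the curve is single-valued in x. The names `trace_curve` and `is_curve` are made up for this example.

```python
def trace_curve(image, is_curve):
    """Return one (column, row) sample per image column that contains curve pixels.

    image: list of rows, each row a list of pixel values.
    is_curve: predicate deciding whether a pixel value belongs to the curve.
    """
    samples = []
    width = len(image[0])
    for col in range(width):
        rows = [r for r, line in enumerate(image) if is_curve(line[col])]
        if rows:
            # Use the mean row so a curve several pixels thick yields its centre.
            samples.append((col, sum(rows) / len(rows)))
    return samples


# Tiny hypothetical 5x5 raster: 1 = curve ink, 0 = background.
image = [
    [0, 0, 0, 0, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
points = trace_curve(image, lambda v: v == 1)
print(points)
```

The result is in pixel coordinates (row 0 at the top); turning those into data values still needs the grid or axis calibration that a human or OCR would supply.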

Comments (7)

梦回梦里 2024-08-16 19:50:32

This is extremely hard and error-prone. (We do this sort of thing a lot in chemistry where we try to analyze chemistry.) It depends critically on various parameters and conditions.

  1. Is the image a bitmap (pixels only) or vectors (EMF, WMF, SVG, PS, PDF...)? Vectors are vastly better than pixels. We tackle vectors (including PDF) but don't touch pixels. Some of our collaborators will try to use pixels, but only on fairly recent documents.
  2. If you are stuck with pixels, are your images all from the same source? If so, you have a small chance of extracting font information. I am afraid your image is so poor that it would require a great deal of work. However, if you can work out the font and all the documents are from the same source, you have a chance of extracting text and numbers. You could use heuristics (rules such as where the numbers might be) or machine learning (a list of features on which the methods can be trained).
  3. Your image appears to have been scanned (as the axes are pixelated). That makes it even worse. What appears to the eye as a straight line is horrible for a machine. Is your image skewed on the page? You may have to deskew it.
  4. If you have a model for the lines and curves, then you may have a chance of fitting its expected parameters to the image. But it's not trivial.

I'm sorry to be pessimistic. If you really want the info then it can be done with a lot of investment or collaboration with groups which do this sort of thing.
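To make the deskewing point in item 3 concrete, here is a rough sketch of one way it could be done once some pixels along an axis have been picked out. It is an assumption-laden illustration rather than what any particular group uses: it takes points that should lie on a horizontal axis, estimates the tilt with an ordinary least-squares line fit, and rotates coordinates back. The function names `estimate_skew` and `rotate` are hypothetical.

```python
import math


def estimate_skew(points):
    """Least-squares angle (radians) of a line through points that should be horizontal."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return math.atan2(sxy, sxx)


def rotate(points, angle, origin=(0.0, 0.0)):
    """Rotate points by -angle about origin, undoing the estimated skew."""
    ox, oy = origin
    c, s = math.cos(-angle), math.sin(-angle)
    return [
        (ox + c * (x - ox) - s * (y - oy), oy + s * (x - ox) + c * (y - oy))
        for x, y in points
    ]


# Hypothetical axis pixels from a scan tilted by 2 degrees.
slope = math.tan(math.radians(2.0))
axis_pixels = [(x, 10.0 + slope * x) for x in (0.0, 50.0, 100.0, 150.0)]

angle = estimate_skew(axis_pixels)
straightened = rotate(axis_pixels, angle, origin=(0.0, 10.0))
print(math.degrees(angle))
```

This only handles a pure rotation. A grid warped in a non-affine way would need a richer model, which is part of why the problem is not trivial.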

鱼窥荷 2024-08-16 19:50:32

Googling "curve recognition software" suggests http://www.curveunscan.com/

青朷 2024-08-16 19:50:32

http://www.digitizeit.de/ is a program for digitizing graphs.

巡山小妖精 2024-08-16 19:50:32

There is also potrace, which is related, and its page in turn mentions other alternatives.

白日梦 2024-08-16 19:50:32

I don't know of any software that does what you're asking, but if you can get just a few points you can use some kind of regression to find the best function that fits those points. This particular graph looks like an exponential function. So you'd want to find an exponential regression calculator.
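As a rough illustration of the regression idea, an exponential y = a·exp(b·x) can be fitted without any special calculator by doing ordinary least squares on (x, ln y). The sketch below assumes all y values are positive and uses made-up sample points; `fit_exponential` is a name invented for this example.

```python
import math


def fit_exponential(points):
    """Fit y = a * exp(b * x) by linear least squares on (x, ln y). Requires y > 0."""
    n = len(points)
    xs = [x for x, _ in points]
    logs = [math.log(y) for _, y in points]
    mean_x = sum(xs) / n
    mean_l = sum(logs) / n
    b = sum((x - mean_x) * (l - mean_l) for x, l in zip(xs, logs)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    a = math.exp(mean_l - b * mean_x)
    return a, b


# Hypothetical points read off a graph; here they lie exactly on y = 2 * exp(0.5 x).
samples = [(x, 2.0 * math.exp(0.5 * x)) for x in (0.0, 1.0, 2.0, 3.0)]
a, b = fit_exponential(samples)
print(a, b)
```

Note that fitting in log space minimises relative rather than absolute error, which may or may not be what you want for points read off a noisy scan.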

小兔几 2024-08-16 19:50:32

I use im2graph to convert graph images to data, that is, numbers. im2graph is free and available for Linux and Windows. Very smooth and requires very little effort on your part to generate results.
See http://www.im2graph.co.il

旧竹 2024-08-16 19:50:32

It is very difficult to read off the values by eye. But you can use a graph digitizer, which lets you sample off-grid points. There are many such tools on the internet. Someone has already mentioned Digitizeit; however, it is not free.

Here are my preferred tools that I often use to extract data points from graphs and scanned documents.

  1. PlotDigitizer.com: It has a free online version and a paid offline version, and it supports many graph types. It also supports logarithmic scales, like the one in your graph.
  2. WebPlotDigitizer: It is also a very popular tool and completely free. But sometimes I find it buggy and glitchy.
  3. Digitizeit: It is a paid tool and has no online version.
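For reference, the calibration such digitizers rely on is simple enough to sketch by hand. The following is not the code of any of the tools listed, just an illustration under stated assumptions: the axes are aligned with the pixel grid, two reference points with known values are clicked per axis, and a log axis is handled by interpolating in log10 space. `make_axis` and the pixel positions are hypothetical.

```python
import math


def make_axis(pixel_a, value_a, pixel_b, value_b, log=False):
    """Return a function mapping a pixel coordinate to a data value along one axis."""
    if log:
        # Interpolate linearly in log10 space for a logarithmic axis.
        value_a, value_b = math.log10(value_a), math.log10(value_b)
    scale = (value_b - value_a) / (pixel_b - pixel_a)

    def to_value(pixel):
        v = value_a + (pixel - pixel_a) * scale
        return 10 ** v if log else v

    return to_value


# Hypothetical calibration: x is linear, y is logarithmic (pixel rows grow downward).
x_axis = make_axis(100, 0.0, 500, 10.0)
y_axis = make_axis(400, 1.0, 100, 1000.0, log=True)

print(x_axis(300), y_axis(250))
```

Any pixel sampled from the curve can then be converted with `(x_axis(px), y_axis(py))`. A tilted or warped scan would need straightening first.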