Scientific data processing (graph comparison and interpretation)

Posted 2024-09-03 14:32:25

I'm trying to write a program to automate one of my more boring and repetitive work tasks. I have some programming experience but none with processing or interpreting large volumes of data so I am seeking your advice (both suggestions of techniques to try and also things to read to learn more about doing this stuff).

I have a piece of equipment that monitors an experiment by taking repeated samples and displays the readings on its screen as a graph. The input to the experiment can be altered, and one of these changes should produce a change in a section of the graph. I currently identify that change by eye, and it is what I'm looking for in the experiment. I want to automate this so that a computer looks at a set of results and spots the experiment input that causes the change.

I can already extract the results from the machine. Currently the results for a run are in the form of an integer array, with the index being the sample number and the corresponding value being the measurement.

The overall shape of the graph will be similar for each experiment run. The change I'm looking for will be roughly the same and will occur in approximately the same place every time for the correct experiment input. Unfortunately there are a few gotchas that make this problem more difficult.

  1. There is some noise in the measuring process, which means there is some random variation in the measured values between different runs, although the overall shape of the graph remains the same.

  2. The time the experiment takes varies slightly each run, causing two effects. First, the whole graph may be shifted slightly on the x axis relative to another run's graph. Second, individual features may appear slightly wider or narrower in different runs.

In both of these cases the variation isn't particularly large, and you can assume that the only non-random variation is caused by the correct input being found.

Comments (2)

脸赞 2024-09-10 14:32:25

I think you're looking for information on digital signal processing (DSP). It can range from very simple to very hard to understand.

If, say, your pre-event signal was 0 and every sample after the event was 1, you could just look for the first 1, figure out the time at which it occurred, and you'd be done. That's basically the limiting case of simplicity, and it might be a good place to start. Implement that, and you've got the beginnings of a sense of how to answer your question.

Now add noise. Say the pre-event values range from -10 to 10 and the post-event values range from 90 to 110. Still simple: watch for the first value greater than 10. But of course it's never that simple. You might have to average a window of readings, or look for some threshold of change from the previous measurement, and so on. In advanced cases, you could find yourself using transformations into other spaces, applying filters, pattern matching, and the like.

But from your description, it sounds like reasonably simple methods should do the job for you. Don't get intimidated by concepts like the FFT; you probably don't need them yet. For now, at least, assume that it can be solved simply. Start with a trivially simple (but insufficient) solution, and work your way towards one that works.
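As a rough illustration of the "average a window of readings, then threshold" idea, here is a minimal Java sketch. The class and method names, the window size of 3, the threshold of 50 and the sample data are all made up for the example; they are assumptions, not values from the question.

```java
// Sketch only: smooth a run with a moving average, then report the first
// smoothed value above a threshold. All names and numbers are illustrative.
final class ThresholdDetector {

    /** Mean of every run of `window` consecutive samples (output length = data.length - window + 1). */
    static double[] movingAverage(int[] data, int window) {
        if (window < 1 || window > data.length) {
            throw new IllegalArgumentException("window must be between 1 and data.length");
        }
        double[] out = new double[data.length - window + 1];
        long sum = 0;
        for (int i = 0; i < data.length; i++) {
            sum += data[i];                    // sample entering the window
            if (i >= window) {
                sum -= data[i - window];       // sample leaving the window
            }
            if (i >= window - 1) {
                out[i - window + 1] = (double) sum / window;
            }
        }
        return out;
    }

    /** Index of the first value strictly above threshold, or -1 if there is none. */
    static int firstAbove(double[] values, double threshold) {
        for (int i = 0; i < values.length; i++) {
            if (values[i] > threshold) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Hypothetical run: noise around 0, then a step to roughly 100 at sample 6.
        int[] run = {3, -7, 9, -2, 5, -9, 95, 108, 91, 104, 99};
        double[] smooth = movingAverage(run, 3);
        // The result is the start index of the first window whose mean exceeds 50.
        System.out.println(firstAbove(smooth, 50.0)); // prints 5
    }
}
```

Note that the returned index refers to the start of the averaging window, so it can sit slightly before the sample where the raw data actually jumps. A wider window suppresses more noise but blurs the position of the change further.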

べ繥欢鉨o。 2024-09-10 14:32:25

One technique worth looking at, if the sort of filter-and-threshold approach Carl suggests won't suffice, is cross-correlation. The essence of this is pretty simple: if two data sets are reasonably similar, their dot product will be maximised when they align (because the highest values will be multiplied together). So you can get a good estimate of how to line them up by calculating this product at each offset and choosing the offset that gives the highest result.

In a case like yours, the idea would be to have an "ideal" version of the curve shape you're looking for -- either generated from theory/simulation or by averaging the results of a number of good experimental curves identified and aligned by eye -- and compare it against the experimental data.
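One way the averaging route to an "ideal" curve might look in Java is sketched below. It assumes you have already noted, by eye, the sample index at which the feature starts in each good run; the class name, method name and the `offsets` and `length` parameters are hypothetical.

```java
// Sketch only: build an "ideal" curve by averaging several runs that have
// been aligned by hand. offsets[r] is the index in runs[r] where the
// feature of interest starts (an assumption supplied by the user).
final class TemplateBuilder {

    static double[] averageAligned(int[][] runs, int[] offsets, int length) {
        if (runs.length == 0 || runs.length != offsets.length) {
            throw new IllegalArgumentException("need one offset per run and at least one run");
        }
        double[] template = new double[length];
        for (int r = 0; r < runs.length; r++) {
            if (offsets[r] < 0 || offsets[r] + length > runs[r].length) {
                throw new IllegalArgumentException("window falls outside run " + r);
            }
            // Add this run's window, shifted so that all features line up.
            for (int i = 0; i < length; i++) {
                template[i] += runs[r][offsets[r] + i];
            }
        }
        // Turn the sums into means.
        for (int i = 0; i < length; i++) {
            template[i] /= runs.length;
        }
        return template;
    }

    public static void main(String[] args) {
        // Two made-up runs with the same bump at different positions.
        int[][] runs = {{0, 0, 10, 20, 10, 0}, {0, 12, 18, 12, 0, 0}};
        int[] offsets = {2, 1};
        System.out.println(java.util.Arrays.toString(averageAligned(runs, offsets, 3)));
    }
}
```

Averaging more runs reduces the random noise in the template, but it does not correct for features that are wider or narrower from run to run; those differences are simply blurred together.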

For simplicity, let's assume that the data set is longer than the ideal and has enough empty space at either end that we can ignore any boundary issues. Since you are looking for one specific event, it should be trivial to cut down your ideal to comply with this assumption. Crudely coded in Java, then, the process might go something like this:

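// Returns the offset t at which ideal best lines up with data,
// judged by the largest dot product between ideal and data[t .. t + ideal.length - 1].
int offset ( double[] data, double[] ideal )
{
    double cMax = -Double.MAX_VALUE;  // best score seen so far
    int tMax = 0;                     // offset that produced cMax

    // Slide ideal along data. The bound is <= so that the last position
    // (ideal flush with the end of data) is tested as well.
    for ( int t = 0; t <= data.length - ideal.length; ++t )
    {
        // Dot product of ideal with the window of data starting at t.
        double c = 0;
        for ( int i = 0; i < ideal.length; ++i )
        {
            c += data[t + i] * ideal[i];
        }

        if ( c > cMax )
        {
            cMax = c;
            tMax = t;
        }
    }

    return tMax;
}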

Obviously, there are plenty of situations in which this approach can fail, particularly if there is a significant amount of non-independent noise or if there are periodicities in the signal that give rise to aliasing, that is, several offsets that match almost equally well. Also, this example throws away a lot of information to focus just on an absolute maximum, which may be error-prone if there isn't a large, narrow peak in the cross-correlation. But from your description it seems like your problem could be fairly amenable to something along these lines.
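One common variation that may help with some of these failure modes is normalized cross-correlation: subtract the mean from each window and divide by the magnitudes, so that each offset gets a score between -1 and 1 (the Pearson correlation). This is a different technique from the raw dot product in the code above, and the class and method names below are made up for illustration. Because the score has a fixed scale, you can also inspect it and reject a match that is too weak, rather than trusting the maximum blindly.

```java
// Sketch only: normalized cross-correlation (Pearson correlation at each
// offset). A variant of the raw dot product, not the original answer's code.
final class NormalizedCorrelation {

    /** Correlation in [-1, 1] between ideal and the window of data starting at t; 0 if either is constant. */
    static double scoreAt(double[] data, double[] ideal, int t) {
        int n = ideal.length;
        double meanD = 0, meanI = 0;
        for (int i = 0; i < n; i++) {
            meanD += data[t + i];
            meanI += ideal[i];
        }
        meanD /= n;
        meanI /= n;

        double num = 0, varD = 0, varI = 0;
        for (int i = 0; i < n; i++) {
            double d = data[t + i] - meanD;
            double m = ideal[i] - meanI;
            num += d * m;
            varD += d * d;
            varI += m * m;
        }
        double denom = Math.sqrt(varD * varI);
        return denom == 0 ? 0 : num / denom;
    }

    /** Offset with the highest normalized score. */
    static int bestOffset(double[] data, double[] ideal) {
        if (ideal.length == 0 || ideal.length > data.length) {
            throw new IllegalArgumentException("ideal must be non-empty and no longer than data");
        }
        int best = 0;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int t = 0; t + ideal.length <= data.length; t++) {
            double s = scoreAt(data, ideal, t);
            if (s > bestScore) {
                bestScore = s;
                best = t;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Made-up data: a small bump (1, 3, 1) sitting in a dip between high plateaus.
        double[] data = {9, 9, 9, 9, 1, 3, 1, 9, 9};
        double[] ideal = {0, 2, 0};
        System.out.println(bestOffset(data, ideal)); // prints 4
    }
}
```

In this made-up example the raw dot product would be largest over the flat plateau of 9s, simply because the values there are big, whereas the normalized score picks out the window whose shape matches the ideal. Neither version handles features that are stretched or compressed between runs.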
