How do you correlate two time series with gaps and different time bases?

Posted 2024-10-19 09:45:11


I have two time series of 3D accelerometer data that have different time bases (clocks started at different times, with some very slight creep during the sampling time), as well as containing many gaps of different size (due to delays associated with writing to separate flash devices).

The accelerometers I'm using are the inexpensive GCDC X250-2. I'm running the accelerometers at their highest gain, so the data has a significant noise floor.

The time series each have about 2 million data points (over an hour at 512 samples/sec), and contain about 500 events of interest, where a typical event spans 100-150 samples (200-300 ms each). Many of these events are affected by data outages during flash writes.

So, the data isn't pristine, and isn't even very pretty. But my eyeball inspection shows it clearly contains the information I'm interested in. (I can post plots, if needed.)

The accelerometers are in similar environments but are only moderately coupled, meaning that I can tell by eye which events match from each accelerometer, but I have been unsuccessful so far doing so in software. Due to physical limitations, the devices are also mounted in different orientations, where the axes don't match, but they are as close to orthogonal as I could make them. So, for example, for 3-axis accelerometers A & B, +Ax maps to -By (up-down), +Az maps to -Bx (left-right), and +Ay maps to -Bz (front-back).
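For concreteness, that mapping can be written as a signed permutation matrix, so a B sample is approximately R times the corresponding A sample (this is just a restatement of the mapping above):

import numpy

#The mapping (+Ax -> -By, +Ay -> -Bz, +Az -> -Bx) as a signed permutation
#matrix, so that b ≈ R @ a for a sample a = [Ax, Ay, Az]:
R = numpy.array([[ 0,  0, -1],   #Bx = -Az
                 [-1,  0,  0],   #By = -Ax
                 [ 0, -1,  0]])  #Bz = -Ay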

My initial goal is to correlate shock events on the vertical axis, though I would eventually like to a) automatically discover the axis mapping, b) correlate activity on the mapped axes, and c) extract behavior differences between the two accelerometers (such as twisting or flexing).

The nature of the time series data makes Python's numpy.correlate() unusable. I've also looked at R's Zoo package, but have made no headway with it. I've looked to different fields of signal analysis for help, but I've made no progress.

Anyone have any clues for what I can do, or approaches I should research?

Update 28 Feb 2011: Added some plots here showing examples of the data.


Comments (5)

梦明 2024-10-26 09:45:11


My interpretation of your question: Given two very long, noisy time series, find a shift of one that matches large 'bumps' in one signal to large bumps in the other signal.

My suggestion: interpolate the data so it's uniformly spaced, rectify and smooth the data (assuming the phase of the fast oscillations is uninteresting), and do a one-point-at-a-time cross correlation (assuming a small shift will line up the data).

import numpy
from scipy.ndimage import gaussian_filter
"""
sig1 and sig2 are assumed to be large, 1D numpy arrays
sig1 is sampled at times t1, sig2 is sampled at times t2
t_start and t_end are your desired sampling interval
t_len is your desired number of measurements
"""

t = numpy.linspace(t_start, t_end, t_len)
sig1 = numpy.interp(t, t1, sig1)
sig2 = numpy.interp(t, t2, sig2)
#Now sig1 and sig2 are sampled at the same points.

"""
Rectify and smooth, so 'peaks' will stand out.
This makes big assumptions about your data;
these assumptions seem true-ish based on your plots.
"""
sigma = 10 #Tune this parameter to get the right smoothing
sig1, sig2 = abs(sig1), abs(sig2)
sig1, sig2 = gaussian_filter(sig1, sigma), gaussian_filter(sig2, sigma)

"""
Now sig1 and sig2 should look smoothly varying, with humps at each 'event'.
Hopefully we can search a small range of shifts to find the maximum of the 
cross-correlation. This assumes your data are *nearly* lined up already.
"""
max_xc = 0
best_shift = 0
for shift in range(-10, 10): #Tune this search range
    xc = (numpy.roll(sig1, shift) * sig2).sum()
    if xc > max_xc:
        max_xc = xc
        best_shift = shift
print('Best shift:', best_shift)
"""
If best_shift is at the edges of your search range,
you should expand the search range.
"""
假情假意假温柔 2024-10-26 09:45:11


If the data contains gaps of unknown sizes that are different in each time series, then I would give up on trying to correlate entire sequences, and instead try cross correlating pairs of short windows on each time series, say overlapping windows twice the length of a typical event (300 samples long). Find potential high cross correlation matches across all possibilities, and then impose a sequential ordering constraint on the potential matches to get sequences of matched windows.

From there you have smaller problems that are easier to analyze.
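A minimal sketch of that windowed approach, assuming sig_a and sig_b are the two (already resampled) 1D arrays; the window length, step, threshold, and function names here are all placeholders to tune:

import numpy

def window_matches(sig_a, sig_b, win=300, step=150, thresh=0.7):
    #Slice each signal into overlapping windows normalized to unit energy,
    #so a dot product of two windows is their normalized cross-correlation.
    def unit_windows(sig):
        w = numpy.array([sig[s:s+win] for s in range(0, len(sig)-win, step)])
        w = w - w.mean(axis=1, keepdims=True)
        norms = numpy.linalg.norm(w, axis=1)
        norms[norms == 0] = 1.0 #flat (gap) windows: avoid divide-by-zero
        return w / norms[:, None]
    corr = unit_windows(sig_a) @ unit_windows(sig_b).T
    return numpy.argwhere(corr > thresh) #(i, j) pairs of well-matched windows

def ordered_matches(pairs):
    #Impose the sequential ordering constraint: keep the longest chain of
    #matches that is strictly increasing in both signals (an O(n^2) LIS).
    pairs = sorted(map(tuple, pairs))
    if not pairs:
        return []
    best, prev = [1]*len(pairs), [-1]*len(pairs)
    for i, (ai, bi) in enumerate(pairs):
        for j in range(i):
            if pairs[j][0] < ai and pairs[j][1] < bi and best[j]+1 > best[i]:
                best[i], prev[i] = best[j]+1, j
    chain, k = [], int(numpy.argmax(best))
    while k != -1:
        chain.append(pairs[k])
        k = prev[k]
    return chain[::-1]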

黯然 2024-10-26 09:45:11


This isn't a technical answer, but it might help you come up with one:

  • Convert the plot to an image, and stick it into a decent image program like GIMP or Photoshop
  • break the plots into discrete images whenever there's a gap
  • put the first series of plots in a horizontal line
  • put the second series in a horizontal line right underneath it
  • visually identify the first correlated event
  • if the two events are not lined up vertically:
    • select whichever instance is further to the left and everything to the right of it on that row
    • drag those things to the right until they line up

This is pretty much how an audio editor works, so if you convert it into a simple audio format like an uncompressed WAV file, you can manipulate it directly in something like Audacity. (It'll sound horrible, of course, but you'll be able to move the data plots around pretty easily.)
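A minimal sketch of that WAV conversion with SciPy, assuming sig is one 1D float axis of the accelerometer data; the 512 Hz "sample rate" is far below audio rates, so playback will be uselessly slow, but the waveform view is what matters:

import numpy
from scipy.io import wavfile

#Scale one axis of accelerometer data to 16-bit PCM and write it as a WAV
#whose sample rate matches the 512 samples/sec logging rate.
scaled = numpy.int16(sig / numpy.max(numpy.abs(sig)) * 32767)
wavfile.write('accel.wav', 512, scaled)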

Actually, Audacity also has a scripting language called Nyquist, so if you don't need the program to detect the correlations (or you're at least willing to defer that step for the time being) you could probably use some combination of Audacity's markers and Nyquist to automate the alignment and export the clean data in your format of choice once you tag the correlation points.

别再吹冷风 2024-10-26 09:45:11


My guess is, you'll have to manually build an offset table that aligns the "matches" between the series. Below is an example of a way to get those matches. The idea is to shift the data left-right until it lines up and then adjust the scale until it "matches". Give it a try.

library(rpanel)

#Generate the x1 and x2 data
n1 <- rnorm(500)
n2 <- rnorm(200)
x1 <- c(n1, rep(0,100), n2, rep(0,150))
x2 <- c(rep(0,50), 2*n1, rep(0,150), 3*n2, rep(0,50))

#Build the panel function that will draw/update the graph
lvm.draw <- function(panel) {
    plot(x=(1:length(panel$dat3))+panel$off, y=panel$dat3, ylim=panel$dat1,
         xlab="", ylab="y", type="l",
         main=paste("Alignment Graph   Offset = ", panel$off,
                    "   Scale = ", panel$sca, sep=""))
    lines(x=1:length(panel$dat3), y=panel$sca*panel$dat4, col="red")
    grid()
    panel
}

#Build the panel
xlimdat <- c(1, length(x1))
ylimdat <- c(-5, 5)
panel <- rp.control(title = "Eye-Ball-It", dat1=ylimdat, dat2=xlimdat, dat3=x1, dat4=x2, off=100, sca=1.0, size=c(300, 160))
rp.slider(panel, var=off, from=-500, to=500, action=lvm.draw, title="Offset", pos=c(5, 5, 290, 70), showvalue=TRUE)
rp.slider(panel, var=sca, from=0, to=2, action=lvm.draw, title="Scale", pos=c(5, 70, 290, 90), showvalue=TRUE)
ㄟ。诗瑗 2024-10-26 09:45:11


It sounds like you want to minimize the function (Ax'+By) + (Az'+Bx) + (Ay'+Bz) over a pair of values, namely a time offset t0 and a time-scale factor tr, where Ax' = tr*(Ax + t0), etc.

I would look into SciPy's bivariate optimize functions. And I would use a mask or temporarily zero the data (both Ax' and By for example) over the "gaps" (assuming the gaps can be programmatically determined).

To make the process more efficient, start with a coarse sampling of A and B, but set the precision in fmin (or whatever optimizer you've selected) that is commensurate with your sampling. Then proceed with progressively finer-sampled windows of the full dataset until your windows are narrow and are not down-sampled.
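A minimal sketch of that objective for one axis pair, assuming ax and by are raw arrays sampled at times ta and tb, and gap_mask flags the samples of by that fall inside flash-write outages (all of these names are placeholders):

import numpy
from scipy.optimize import fmin

def misalignment(params, ta, ax, tb, by, gap_mask):
    t0, tr = params
    #Warp A's time base: evaluate Ax at tr*(t + t0) on B's sample times.
    ax_warped = numpy.interp(tr * (tb + t0), ta, ax)
    valid = ~gap_mask
    #+Ax maps to -By, so a good alignment makes ax_warped ≈ -by; minimize
    #the summed squared residual over the non-gap samples.
    return numpy.sum((ax_warped[valid] + by[valid]) ** 2)

#xtol chosen commensurate with the sampling, per the coarse-to-fine scheme:
#t0, tr = fmin(misalignment, [0.0, 1.0], args=(ta, ax, tb, by, gap_mask), xtol=1e-4)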

Edit - matching axes

Regarding the issue of trying to identify which axis is co-linear with a given axis, without knowing a thing about the characteristics of your data, I can point towards a similar question. Look into pHash or any of the other methods outlined in this post to help identify similar waveforms.
