How to correlate two time series with gaps and different time bases?
I have two time series of 3D accelerometer data that have different time bases (clocks started at different times, with some very slight creep during the sampling time), as well as containing many gaps of different size (due to delays associated with writing to separate flash devices).
The accelerometers I'm using are the inexpensive GCDC X250-2. I'm running the accelerometers at their highest gain, so the data has a significant noise floor.
The time series each have about 2 million data points (over an hour at 512 samples/sec), and contain about 500 events of interest, where a typical event spans 100-150 samples (about 200-300 ms). Many of these events are affected by data outages during flash writes.
So, the data isn't pristine, and isn't even very pretty. But my eyeball inspection shows it clearly contains the information I'm interested in. (I can post plots, if needed.)
The accelerometers are in similar environments but are only moderately coupled, meaning that I can tell by eye which events match from each accelerometer, but I have been unsuccessful so far doing so in software. Due to physical limitations, the devices are also mounted in different orientations, where the axes don't match, but they are as close to orthogonal as I could make them. So, for example, for 3-axis accelerometers A & B, +Ax maps to -By (up-down), +Az maps to -Bx (left-right), and +Ay maps to -Bz (front-back).
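The axis correspondence above can be written as a signed permutation matrix that re-expresses A's samples in B's axis convention. A minimal NumPy sketch, assuming each device's samples are stored as an (N, 3) array in x, y, z column order; A_TO_B and a_in_b_frame are hypothetical names:

```python
import numpy as np

# Rows are B's axes (x, y, z), columns are A's axes (x, y, z).
# Encodes the mapping from the question: +Ax -> -By, +Az -> -Bx, +Ay -> -Bz.
A_TO_B = np.array([
    [0, 0, -1],   # Bx = -Az (left-right)
    [-1, 0, 0],   # By = -Ax (up-down)
    [0, -1, 0],   # Bz = -Ay (front-back)
])

def a_in_b_frame(a_xyz):
    """Return an (N, 3) array of A samples expressed in B's axis convention."""
    return np.asarray(a_xyz, dtype=float) @ A_TO_B.T

print(a_in_b_frame([[1.0, 2.0, 3.0]]))  # [[-3. -1. -2.]]
```

With this in place, A's vertical channel lines up with B's column 1, so the vertical-axis comparison becomes a one-to-one comparison of columns.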
My initial goal is to correlate shock events on the vertical axis, though I would eventually like to a) automatically discover the axis mapping, b) correlate activity on the mapped axes, and c) extract behavior differences between the two accelerometers (such as twisting or flexing).
The nature of the time series data makes Python's numpy.correlate() unusable. I've also looked at R's Zoo package, but have made no headway with it. I've looked to different fields of signal analysis for help, but I've made no progress.
Anyone have any clues for what I can do, or approaches I should research?
Update 28 Feb 2011: Added some plots here showing examples of the data.
My interpretation of your question: Given two very long, noisy time series, find a shift of one that matches large 'bumps' in one signal to large bumps in the other signal.
My suggestion: interpolate the data so it's uniformly spaced, rectify and smooth the data (assuming the phase of the fast oscillations is uninteresting), and do a one-point-at-a-time cross correlation (assuming a small shift will line up the data).
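A minimal sketch of that pipeline, assuming each record comes with per-sample timestamps in seconds. The helper names (to_uniform, envelope, best_lag) and the 0.1 s smoothing window are illustrative choices, not part of the answer. Linear interpolation simply bridges the gaps, which is crude but keeps the grid regular.

```python
import numpy as np

def to_uniform(t, x, fs):
    """Linearly resample samples x taken at times t (seconds) onto a uniform fs-Hz grid.
    Gaps are bridged by straight lines."""
    grid = np.arange(t[0], t[-1], 1.0 / fs)
    return grid, np.interp(grid, t, x)

def envelope(x, fs, win_s=0.1):
    """Rectify (absolute value of the mean-removed signal), then smooth with a
    moving average win_s seconds long, discarding the phase of fast oscillations."""
    n = max(1, int(round(win_s * fs)))
    rect = np.abs(x - np.mean(x))
    return np.convolve(rect, np.ones(n) / n, mode="same")

def best_lag(a, b, max_lag):
    """Try every shift in [-max_lag, max_lag] one at a time and return
    (lag, score) for the shift with the highest normalized correlation.
    A positive lag means b[i + lag] lines up with a[i]."""
    a = (a - a.mean()) / a.std()
    b = (b - b.mean()) / b.std()
    best, best_score = 0, -np.inf
    for lag in range(-max_lag, max_lag + 1):
        x, y = (a, b[lag:]) if lag >= 0 else (a[-lag:], b)
        n = min(len(x), len(y))
        if n == 0:
            continue
        score = float(np.dot(x[:n], y[:n]) / n)
        if score > best_score:
            best, best_score = lag, score
    return best, best_score
```

Each candidate shift costs one dot product over the whole overlap, so on 2-million-sample envelopes it is worth keeping max_lag modest or decimating the envelopes first.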
If the data contains gaps of unknown sizes that are different in each time series, then I would give up on trying to correlate entire sequences, and instead try cross correlating pairs of short windows on each time series, say overlapping windows twice the length of a typical event (300 samples long). Find potential high cross correlation matches across all possibilities, and then impose a sequential ordering constraint on the potential matches to get sequences of matched windows.
From there you have smaller problems that are easier to analyze.
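A rough sketch of the windowed approach, assuming both series have already been resampled to a common rate. The 300-sample window follows the answer; the 150-sample hop, the 0.8 threshold and all function names are hypothetical. The ordering constraint is imposed here as a longest-increasing-subsequence search over the candidate pairs, which is one way to do it, not necessarily what the answerer had in mind.

```python
import numpy as np

def window_starts(n, length, hop):
    """Start indices of overlapping windows of `length` samples, `hop` apart."""
    return list(range(0, n - length + 1, hop))

def peak_ncc(u, v):
    """Peak normalized cross-correlation of two equal-length windows over all lags;
    equals 1.0 at zero lag when the windows match up to offset and positive gain."""
    su, sv = u.std(), v.std()
    if su == 0 or sv == 0:                 # flat window, e.g. a zero-filled gap
        return 0.0
    u = (u - u.mean()) / (su * len(u))
    v = (v - v.mean()) / sv
    return float(np.max(np.correlate(u, v, mode="full")))

def candidate_matches(a, b, length=300, hop=150, threshold=0.8):
    """Every (start_a, start_b, score) window pair whose peak NCC reaches threshold.
    Brute force: cost grows with (len(a) / hop) * (len(b) / hop)."""
    out = []
    for i in window_starts(len(a), length, hop):
        for j in window_starts(len(b), length, hop):
            s = peak_ncc(a[i:i + length], b[j:j + length])
            if s >= threshold:
                out.append((i, j, s))
    return out

def ordered_chain(matches):
    """Longest subset of matches whose window starts increase in both series
    (O(n^2) longest-increasing-subsequence dynamic programme)."""
    m = sorted(matches)
    if not m:
        return []
    run = [1] * len(m)
    prev = [-1] * len(m)
    for k in range(len(m)):
        for p in range(k):
            if m[p][0] < m[k][0] and m[p][1] < m[k][1] and run[p] + 1 > run[k]:
                run[k], prev[k] = run[p] + 1, p
    k = int(np.argmax(run))
    chain = []
    while k != -1:
        chain.append(m[k])
        k = prev[k]
    return chain[::-1]
```

On the full recordings the all-pairs search is expensive, so in practice it may be worth restricting the windows to those that contain an event above some amplitude.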
This isn't a technical answer, but it might help you come up with one:
This is pretty much how an audio editor works, so if you converted it into a simple audio format like an uncompressed WAV file, you could manipulate it directly in something like Audacity. (It'll sound horrible, of course, but you'll be able to move the data plots around pretty easily.)
Actually, Audacity has a scripting language called Nyquist, too, so if you don't need the program to detect the correlations (or you're at least willing to defer that step for the time being), you could probably use some combination of Audacity's markers and Nyquist to automate the alignment and export the clean data in your format of choice once you tag the correlation points.
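A minimal sketch of the WAV conversion step using Python's standard wave module plus NumPy; write_wav is a hypothetical helper. Each axis becomes one channel, and each channel has its mean removed and is scaled to the 16-bit range. It assumes the audio editor will accept a 512 Hz sample rate in the file header; the Nyquist side is not shown.

```python
import wave
import numpy as np

def write_wav(path, samples, fs=512):
    """Write samples ((N,) or (N, channels)) as a 16-bit PCM WAV at fs Hz.

    Each channel has its mean removed and is scaled so its largest |value|
    maps to full scale; absolute units are lost, only the shape is kept.
    """
    x = np.asarray(samples, dtype=float)
    if x.ndim == 1:
        x = x[:, None]                       # mono: one column
    x = x - x.mean(axis=0)
    peak = np.abs(x).max(axis=0)
    peak[peak == 0] = 1.0                    # avoid dividing a flat channel by zero
    pcm = np.round(x / peak * 32767).astype("<i2")   # little-endian int16
    with wave.open(path, "wb") as w:
        w.setnchannels(x.shape[1])
        w.setsampwidth(2)                    # 2 bytes = 16 bits
        w.setframerate(int(fs))
        w.writeframes(np.ascontiguousarray(pcm).tobytes())  # row-major = interleaved frames
```

Because the scaling is per channel, keep the original peak values if you need to convert the edited data back to acceleration units.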
My guess is, you'll have to manually build an offset table that aligns the "matches" between the series. Below is an example of a way to get those matches. The idea is to shift the data left-right until it lines up and then adjust the scale until it "matches". Give it a try.
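Once a few events have been matched by hand, the offset table can be turned into a clock mapping. A minimal sketch, assuming each sample carries a timestamp and the table holds the times of the same events on each device's clock (a_time_to_b is a hypothetical name). Interpolating linearly between matches gives each segment its own shift and scale, which absorbs both the start offset and the slow drift.

```python
import numpy as np

def a_time_to_b(t_a, match_a, match_b):
    """Map times on A's clock to B's clock using an offset table.

    match_a, match_b: increasing times (s) of the same hand-matched events on
    each clock (at least two). Between matches the mapping is linear; beyond
    the first/last match the nearest segment is extended.
    """
    t_a = np.atleast_1d(np.asarray(t_a, dtype=float))
    ma, mb = np.asarray(match_a, float), np.asarray(match_b, float)
    out = np.interp(t_a, ma, mb)             # np.interp alone would clamp at the ends
    lo, hi = t_a < ma[0], t_a > ma[-1]
    out[lo] = mb[0] + (t_a[lo] - ma[0]) * (mb[1] - mb[0]) / (ma[1] - ma[0])
    out[hi] = mb[-1] + (t_a[hi] - ma[-1]) * (mb[-1] - mb[-2]) / (ma[-1] - ma[-2])
    return out
```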
It sounds like you want to minimize the squared mismatch (Ax'+By)^2 + (Az'+Bx)^2 + (Ay'+Bz)^2, summed over all samples, as a function of a pair of values: namely, the time offset t0 and a time scale factor tr, where Ax'(t) = Ax(tr*t + t0), etc.
I would look into SciPy's bivariate optimize functions. And I would use a mask or temporarily zero the data (both Ax' and By for example) over the "gaps" (assuming the gaps can be programmatically determined).
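A sketch of that least-squares fit for the vertical pair only (Ax against -By), under these assumptions: each device provides per-sample timestamps in seconds; Ax' is read as A's x axis evaluated at the warped time tr*t + t0 on B's clock; and gaps are masked by dropping B samples that land where A's consecutive timestamps are more than 1.5 sample periods apart. To keep both fitted values in seconds, this sketch parameterizes the scale as tr = 1 + drift/duration. align() also runs a coarse-to-fine schedule of the kind the answer suggests, using every 64th, 16th, 4th and then every B sample, with fmin's xtol matched to the sample spacing. Function names are hypothetical.

```python
import numpy as np
from scipy.optimize import fmin

def mismatch(params, t_a, a_x, t_b, b_y, duration, max_dt):
    """Mean squared residual of Ax'(t) + By(t), with Ax'(t) = Ax(tr*t + t0)
    and tr = 1 + drift / duration. B samples that fall in a gap of A
    (neighbouring A samples more than max_dt apart) or outside A are masked."""
    t0, drift = params
    tq = (1.0 + drift / duration) * t_b + t0          # B sample times on A's clock
    idx = np.clip(np.searchsorted(t_a, tq), 1, len(t_a) - 1)
    ok = (t_a[idx] - t_a[idx - 1] <= max_dt) & (tq >= t_a[0]) & (tq <= t_a[-1])
    if ok.sum() < 10:
        return 1e30                                    # almost no overlap: reject
    resid = np.interp(tq[ok], t_a, a_x) + b_y[ok]
    return float(np.mean(resid ** 2))

def align(t_a, a_x, t_b, b_y, t0_guess, fs=512.0, steps=(64, 16, 4, 1)):
    """Coarse-to-fine fit of (t0, drift): optimise on every k-th B sample with a
    tolerance of k sample periods, then refine with smaller k."""
    duration = t_b[-1] - t_b[0]
    x = np.array([t0_guess, 0.0])
    for k in steps:
        tol = k / fs
        simplex = np.array([x, x + [5 * tol, 0.0], x + [0.0, 5 * tol]])
        x = fmin(mismatch, x, args=(t_a, a_x, t_b[::k], b_y[::k], duration, 1.5 / fs),
                 xtol=tol, initial_simplex=simplex, disp=False)
    return x  # (t0, drift) in seconds
```

On raw, noisy accelerometer data this cost is far from convex, so t0_guess probably needs to come from something coarser, such as a cross-correlation of smoothed envelopes; fitting the envelopes rather than the raw signals may also help.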
To make the process more efficient, start with a coarse sampling of A and B, but set the precision in fmin (or whatever optimizer you've selected) to a value commensurate with your sampling. Then proceed with progressively finer-sampled windows of the full dataset until your windows are narrow and are not down-sampled.
Edit - matching axes
Regarding the issue of trying to identify which axis is co-linear with a given axis, and not knowing a thing about the characteristics of your data, I can point towards a similar question. Look into pHash or any of the other methods outlined in this post to help identify similar waveforms.
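pHash itself is not shown here. As a much simpler stand-in (a swapped-in technique, not one from the linked post), once the two recordings are time-aligned and resampled onto a common grid, the 3 x 3 correlation matrix between A's and B's axes can point to the mapping: try all six axis permutations, keep the one with the largest total |correlation|, and read each sign from the correlation. guess_axis_mapping is a hypothetical name.

```python
import itertools
import numpy as np

def guess_axis_mapping(a, b):
    """Guess which axis of B each axis of A corresponds to, and with what sign.

    a, b: (N, 3) arrays, already time-aligned and on the same sample grid.
    Returns (perm, signs, C) where B[:, perm[i]] ~ signs[i] * A[:, i] and
    C[i, j] is the correlation between A axis i and B axis j.
    """
    C = np.corrcoef(a.T, b.T)[:3, 3:]        # A-vs-B block of the 6 x 6 matrix
    perm = max(itertools.permutations(range(3)),
               key=lambda p: sum(abs(C[i, p[i]]) for i in range(3)))
    signs = np.sign([C[i, perm[i]] for i in range(3)]).astype(int)
    return perm, signs, C
```

For the mounting described in the question, the expected result would be perm = (1, 2, 0) with all signs -1.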