Detecting rare incidents from multivariate time series intervals

Posted 2024-09-26 08:19:51

Given a time series of sensor state intervals, how do I implement a classifier which learns from supervised training data to detect an incident based on a sequence of state intervals? To simplify the problem, sensor states are reduced to either true or false.

Update: I've found this paper (PDF) on Mining Sequences of Temporal Intervals which addresses a similar problem. Another paper (Google Docs) on Mining Hierarchical Temporal Patterns in Multivariate Time Series takes a novel approach, but deals with hierarchical data.

Example Training Data

The following data is a training example for an incident, represented as a graph over time, where /¯¯¯\ represents a true state interval and \___/ a false state interval for a sensor.

 Sensor   |  Sensor State over time
          |  0....5....10...15...20...25...  // timestamp
 ---------|--------------------------------
 A        |  ¯¯¯¯¯¯¯¯¯¯¯¯\________/¯¯¯¯¯¯¯¯
 B        |  ¯¯¯¯¯\___________________/¯¯¯¯
 C        |  ______________________________  // no state change
 D        |  /¯\_/¯\_/¯\_/¯\_/¯\_/¯\_/¯\_/¯
 E        |  _________________/¯¯¯¯¯¯¯¯\___

Incident Detection vs Sequence Labeling vs Classification

I initially generalised my problem as a two-category sequence labeling problem, but my categories really represented "normal operation" and a rare "alarm event" so I have rephrased my question as incident detection. Training data is available for "normal operation" and "alarm incident".

To reduce problem complexity, I have discretized sensor events to boolean values, but this need not be the case.

Possible Algorithms

A hidden Markov model seems to be a possible solution, but would it be able to use the state intervals? If a sequence labeler is not the best approach for this problem, alternative suggestions would be appreciated.

Bayesian Probabilistic Approach

Sensor activity will vary significantly by time of day (busy in mornings, quiet at night). My initial approach would have been to measure normal sensor state over a few days and calculate state probability by time of day (hour). The combined probability of sensor states at an unlikely hour surpassing an "unlikelihood threshold" would indicate an incident. But this seemed like it would raise a false alarm if the sensors were noisy. I have not yet implemented this, but I believe that approach has merit.
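
The idea above can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names (`fit_hourly_probabilities`, `log_likelihood`, `is_incident`), the Laplace smoothing, the independence of sensors, and the use of a log-likelihood threshold are all hypothetical choices, not something I have implemented or validated.

```python
import math
from collections import defaultdict

def fit_hourly_probabilities(samples, smoothing=1.0):
    """Estimate P(state is True) per (hour, sensor) from normal-operation data.

    samples: iterable of (hour, sensor_id, state) tuples.
    Laplace smoothing (an assumption) keeps probabilities away from 0 and 1.
    """
    true_counts = defaultdict(float)
    totals = defaultdict(float)
    for hour, sensor_id, state in samples:
        totals[(hour, sensor_id)] += 1
        if state:
            true_counts[(hour, sensor_id)] += 1
    return {
        key: (true_counts[key] + smoothing) / (totals[key] + 2 * smoothing)
        for key in totals
    }

def log_likelihood(p_true, hour, observation):
    """Sum of per-sensor log-probabilities, assuming sensors are independent.

    observation: {sensor_id: state}. Unseen (hour, sensor) pairs fall back to 0.5.
    """
    total = 0.0
    for sensor_id, state in observation.items():
        p = p_true.get((hour, sensor_id), 0.5)
        total += math.log(p if state else 1.0 - p)
    return total

def is_incident(p_true, hour, observation, threshold):
    """Flag an incident when the observation is less likely than the threshold."""
    return log_likelihood(p_true, hour, observation) < threshold
```

Working in log space avoids underflow when many sensors are combined. The noise concern still applies: a single flipped sample can push the score below the threshold, so some smoothing over consecutive time steps would probably be needed.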

Feature Extraction

Vector states could be represented as state interval changes occurring at a specific time and lasting a specific duration.

struct StateInterval
{
    int sensorID;
    bool state;
    DateTime timeStamp;
    TimeSpan duration; 
}

E.g., some state intervals from the table above:

[ {D, true, 0, 3} ]; [ {D, false, 4, 1} ]; ...
[ {A, true, 0, 12} ]; [ {B, true, 0, 6} ]; [ {D, true, 0, 3} ]; etc.
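
As a rough sketch, intervals like these could be produced by run-length encoding a sampled boolean series. The Python `StateInterval` below mirrors the struct above but uses plain integers for the timestamp and duration (an assumption: one sample per time unit); `to_intervals` is a hypothetical helper name.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class StateInterval:
    sensor_id: str
    state: bool
    start: int      # timestamp of the first sample in this interval
    duration: int   # number of consecutive samples with the same state

def to_intervals(sensor_id, samples):
    """Run-length encode a list of booleans into StateInterval records."""
    intervals = []
    start = 0
    for t in range(1, len(samples) + 1):
        # Close the current run at the end of the data or when the state flips.
        if t == len(samples) or samples[t] != samples[start]:
            intervals.append(
                StateInterval(sensor_id, samples[start], start, t - start)
            )
            start = t
    return intervals
```

For example, `to_intervals("D", [True, True, True, False, True])` yields three intervals: a true run of length 3 starting at 0, a false run of length 1 starting at 3, and a true run of length 1 starting at 4.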

A good classifier would take into account state-value intervals and recent state changes to determine if a combination of state changes closely matches training data for a category.

Edit: Some ideas after sleeping on how to extract features from multiple sensors' alarm data and how to compare it to previous data...

Start by calculating the following data for each sensor for each hour of the day:

- Average state interval length (for true and false states)
- Average time between state changes
- Number of state changes over time
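
A minimal sketch of these per-sensor statistics, assuming the intervals for one sensor and one time bucket (for example one hour) are given as `(state, start, duration)` tuples ordered by start time, with consecutive intervals alternating in state. The function name and the dictionary keys are hypothetical.

```python
def interval_features(intervals):
    """Summarise one sensor's intervals within one time bucket.

    intervals: list of (state, start, duration) tuples, ordered by start.
    Every interval after the first is assumed to begin at a state change.
    """
    def mean(values):
        return sum(values) / len(values) if values else 0.0

    true_lengths = [d for state, _, d in intervals if state]
    false_lengths = [d for state, _, d in intervals if not state]
    change_times = [start for _, start, _ in intervals[1:]]
    gaps = [b - a for a, b in zip(change_times, change_times[1:])]
    return {
        "mean_true_length": mean(true_lengths),
        "mean_false_length": mean(false_lengths),
        "mean_time_between_changes": mean(gaps),
        "num_changes": len(change_times),
    }
```

Computing this once per sensor per hour of the day would give the baseline table against which alarm data could be compared.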

Each sensor could then be compared to every other sensor in a matrix with data like the following:

- Average time taken for sensor B to change to a true state after sensor A did. If an average value is 60 seconds, then a 1-second wait would be more interesting than a 120-second wait.
- Average number of state changes sensor B underwent while sensor A was in one state
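
The first pairwise feature could be sketched like this, assuming each sensor is a list of booleans sampled once per time unit. `rising_edges` and `mean_follow_delay` are hypothetical names, and only false->true transitions are considered here; a B edge at the same time step as the A edge counts as a delay of 0.

```python
from bisect import bisect_left

def rising_edges(samples):
    """Timestamps at which a sensor changes from false to true."""
    return [t for t in range(1, len(samples)) if samples[t] and not samples[t - 1]]

def mean_follow_delay(a_edges, b_edges):
    """Average delay from each rising edge of A to the next rising edge of B.

    Both lists must be sorted. A edges with no later B edge are ignored.
    Returns None when no A edge is followed by a B edge.
    """
    delays = []
    for t in a_edges:
        i = bisect_left(b_edges, t)   # first B edge at or after time t
        if i < len(b_edges):
            delays.append(b_edges[i] - t)
    return sum(delays) / len(delays) if delays else None
```

Running this for every ordered pair of sensors fills the comparison matrix; an observed delay can then be compared with the stored average to judge how unusual it is.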

Given two sets of training data, the classifier should be able to determine from these feature sets which is the most likely category for classification.

Is this a sensible approach and what would be a good algorithm to compare these features?

Edit: the direction of a state change (false->true vs true->false) is significant, so any features should take that into account.

橘和柠 2024-10-03 08:19:51

A simple solution would be to collapse the time aspect of your data and take each timestamp as one instance. In this case, the values of the sensors are considered your feature vector, where each time step is labeled with a class value of category A or B (at least for the labeled training data):

   sensors      | class
A  B  C  D  E   |
------------------------- 
1  1  1  0  0   |  catA
1  0  0  0  0   |  catB
1  1  0  1  0   |  catB
1  1  0  0  0   |  catA
..

This input data is fed to the usual classification algorithms (ANN, SVM, ...), and the goal is to predict the class of unlabeled time series:

   sensors      | class
A  B  C  D  E   |
------------------------- 
0  1  1  1  1   |   ?
1  1  0  0  0   |   ?
..
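
To make the per-timestamp setup concrete, here is a small sketch that uses a Bernoulli naive Bayes classifier written in plain Python. This is a stand-in chosen for brevity, not the ANN or SVM mentioned above; the function names and the Laplace smoothing are assumptions for illustration.

```python
import math

def train_bernoulli_nb(rows, labels, smoothing=1.0):
    """Fit class priors and per-feature P(feature = 1 | class)."""
    model = {}
    n_features = len(rows[0])
    for label in set(labels):
        members = [r for r, l in zip(rows, labels) if l == label]
        prior = len(members) / len(rows)
        p_on = [
            (sum(r[j] for r in members) + smoothing) / (len(members) + 2 * smoothing)
            for j in range(n_features)
        ]
        model[label] = (prior, p_on)
    return model

def posteriors(model, row):
    """Return {label: posterior probability} for one sensor-state vector."""
    log_scores = {}
    for label, (prior, p_on) in model.items():
        score = math.log(prior)
        for x, p in zip(row, p_on):
            score += math.log(p if x else 1.0 - p)
        log_scores[label] = score
    top = max(log_scores.values())          # subtract the max for numerical stability
    weights = {label: math.exp(s - top) for label, s in log_scores.items()}
    total = sum(weights.values())
    return {label: w / total for label, w in weights.items()}

# The four labeled rows from the table above (columns A, B, C, D, E).
rows = [[1, 1, 1, 0, 0], [1, 0, 0, 0, 0], [1, 1, 0, 1, 0], [1, 1, 0, 0, 0]]
labels = ["catA", "catB", "catB", "catA"]
model = train_bernoulli_nb(rows, labels)
```

Calling `posteriors(model, row)` on an unlabeled row gives a probability per category; the category with the highest value is the prediction for that timestamp.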

An intermediary step of dimensionality reduction / feature extraction could improve the results.

Obviously this may not be as good as modeling the time dynamics of the sequences, especially since techniques such as Hidden Markov Models (HMM) take into account the transitions between the various states.

EDIT

Based on your comment below, it seems that the best way to get less transitory predictions of the target class is to apply a post-processing rule at the end of the prediction phase, treating the classification output as a sequence of consecutive predictions.

The way this works is that you compute the class posterior probabilities (i.e., the probability that an instance belongs to each class label, which for a binary SVM can be derived from the decision function, for example via Platt scaling). Then, given a specified threshold, you check whether the probability of the predicted class is above that threshold: if it is, that class is used as the prediction for the current timestamp; if not, the previous prediction is kept. The same rule is applied to each subsequent instance. This has the effect of adding a certain inertia to the current prediction.
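
A minimal sketch of that post-processing rule, assuming the classifier yields one `{label: probability}` dictionary per timestamp. `smooth_predictions` is a hypothetical name, and accepting the top label when there is no previous prediction yet is an assumption.

```python
def smooth_predictions(posterior_sequence, threshold, initial=None):
    """Keep the previous label unless the new top label is confident enough.

    posterior_sequence: iterable of {label: probability} dicts, one per timestamp.
    threshold: probability the predicted class must exceed to replace the
               current prediction.
    initial: prediction in force before the first timestamp (None if unknown).
    """
    smoothed = []
    current = initial
    for post in posterior_sequence:
        label = max(post, key=post.get)
        # Switch only on a confident prediction, or when nothing is set yet.
        if post[label] > threshold or current is None:
            current = label
        smoothed.append(current)
    return smoothed
```

With a threshold of 0.75, a sequence whose top probabilities are A at 0.9, B at 0.55, B at 0.8, A at 0.6 is smoothed to A, A, B, B: the two low-confidence steps inherit the previous label.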
