Detecting rare incidents from multivariate time series of intervals
Given a time series of sensor state intervals, how do I implement a classifier which learns from supervised training data to detect an incident based on a sequence of state intervals? To simplify the problem, sensor states are reduced to either true or false.
Update: I've found this paper (PDF) on Mining Sequences of Temporal Intervals which addresses a similar problem. Another paper (Google Docs) on Mining Hierarchical Temporal Patterns in Multivariate Time Series takes a novel approach, but deals with hierarchical data.
Example Training Data
The following data is a training example for an incident, represented as a graph over time, where /¯¯¯\ represents a true state interval and \___/ a false state interval for a sensor.
Sensor | Sensor State over time
| 0....5....10...15...20...25... // timestamp
---------|--------------------------------
A | ¯¯¯¯¯¯¯¯¯¯¯¯\________/¯¯¯¯¯¯¯¯
B | ¯¯¯¯¯\___________________/¯¯¯¯
C | ______________________________ // no state change
D | /¯\_/¯\_/¯\_/¯\_/¯\_/¯\_/¯\_/¯
E | _________________/¯¯¯¯¯¯¯¯\___
Incident Detection vs Sequence Labeling vs Classification
I initially generalised my problem as a two-category sequence labeling problem, but my categories really represented "normal operation" and a rare "alarm event", so I have rephrased my question as incident detection. Training data is available for both "normal operation" and "alarm incident".
To reduce problem complexity, I have discretized sensor events to boolean values, but this need not be the case.
Possible Algorithms
A hidden Markov model seems to be a possible solution, but would it be able to use the state intervals? If a sequence labeler is not the best approach for this problem, alternative suggestions would be appreciated.
Bayesian Probabilistic Approach
Sensor activity will vary significantly by time of day (busy in mornings, quiet at night). My initial approach would have been to measure normal sensor state over a few days and calculate state probability by time of day (hour). The combined probability of sensor states at an unlikely hour surpassing an "unlikelihood threshold" would indicate an incident. But this seemed like it would raise a false alarm if the sensors were noisy. I have not yet implemented this, but I believe that approach has merit.
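A minimal sketch of this per-hour probability idea, assuming raw readings arrive as (sensorID, timestamp, state) samples over several days of normal operation; the Reading record, the HourlyStateModel class, the 0.5 fallback for unseen sensor/hour pairs and the log-likelihood threshold are all illustrative choices rather than a fixed design:

using System;
using System.Collections.Generic;
using System.Linq;

// One raw boolean reading from a sensor at a point in time (illustrative).
record Reading(int SensorId, DateTime Time, bool State);

class HourlyStateModel
{
    // P(state == true) per (sensorId, hourOfDay), estimated from "normal" data.
    private readonly Dictionary<(int, int), double> _pTrue = new();

    public void Train(IEnumerable<Reading> normalReadings)
    {
        foreach (var g in normalReadings.GroupBy(r => (r.SensorId, r.Time.Hour)))
            _pTrue[g.Key] = g.Average(r => r.State ? 1.0 : 0.0);
    }

    // Sums log-likelihoods of a window of readings under the hourly model;
    // a total below logThreshold marks the window as a possible incident.
    public bool IsSuspicious(IEnumerable<Reading> window, double logThreshold)
    {
        double logLik = 0.0;
        foreach (var r in window)
        {
            // Fall back to 0.5 for unseen (sensor, hour) pairs; clamp to avoid log(0).
            double p = _pTrue.TryGetValue((r.SensorId, r.Time.Hour), out var v) ? v : 0.5;
            double pObserved = r.State ? p : 1.0 - p;
            logLik += Math.Log(Math.Max(pObserved, 1e-6));
        }
        return logLik < logThreshold;
    }
}

Smoothing the hourly estimates, or scoring short windows of readings rather than single samples, would be one way to keep noisy sensors from tripping the threshold on their own.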
Feature Extraction
Vector states could be represented as state interval changes occurring at a specific time and lasting a specific duration.
struct StateInterval
{
    public int sensorID;        // which sensor produced this interval
    public bool state;          // the boolean state held during the interval
    public DateTime timeStamp;  // when the interval started
    public TimeSpan duration;   // how long the state lasted
}
e.g. some state intervals from the process table:
[ {D, true, 0, 3} ]; [ {D, false, 4, 1} ]; ...
[ {A, true, 0, 12} ]; [ {B, true, 0, 6} ]; [ {D, true, 0, 3} ]; etc.
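As a hedged sketch of how StateInterval records like those above might be produced in the first place, assuming a time-ordered stream of boolean readings per sensor; the IntervalExtractor name and the convention of closing the last interval at the final reading are mine:

using System;
using System.Collections.Generic;

static class IntervalExtractor
{
    // Collapses a time-ordered stream of boolean readings for one sensor into
    // StateInterval records (reusing the struct above); an interval ends when
    // the observed state flips.
    public static IEnumerable<StateInterval> ToIntervals(
        int sensorId, IReadOnlyList<(DateTime Time, bool State)> readings)
    {
        if (readings.Count == 0) yield break;

        DateTime start = readings[0].Time;
        bool current = readings[0].State;

        for (int i = 1; i < readings.Count; i++)
        {
            if (readings[i].State == current) continue;

            yield return new StateInterval
            {
                sensorID = sensorId,
                state = current,
                timeStamp = start,
                duration = readings[i].Time - start
            };
            start = readings[i].Time;
            current = readings[i].State;
        }

        // Close the final interval at the last reading.
        yield return new StateInterval
        {
            sensorID = sensorId,
            state = current,
            timeStamp = start,
            duration = readings[readings.Count - 1].Time - start
        };
    }
}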
A good classifier would take into account state-value intervals and recent state changes to determine if a combination of state changes closely matches training data for a category.
Edit: some ideas, after sleeping on it, about how to extract features from multiple sensors' alarm data and how to compare them to previous data...
Start by calculating the following data for each sensor for each hour of the day (a sketch of this aggregation follows the list):
- Average state interval length (for true and false states)
- Average time between state changes
- Number of state changes over time
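A possible sketch of this per-sensor, per-hour aggregation over StateInterval records; the HourlySensorFeatures and SensorFeatureBuilder names are illustrative, and treating each interval boundary as one state change is an assumption on my part:

using System;
using System.Collections.Generic;
using System.Linq;

// Per-sensor, per-hour summary features (names are illustrative).
class HourlySensorFeatures
{
    public double AvgTrueIntervalSeconds;
    public double AvgFalseIntervalSeconds;
    public double AvgSecondsBetweenChanges;
    public int StateChangeCount;
}

static class SensorFeatureBuilder
{
    // Aggregates one sensor's StateInterval records into features keyed by hour of day (0-23).
    public static Dictionary<int, HourlySensorFeatures> Build(IEnumerable<StateInterval> intervals)
    {
        return intervals
            .GroupBy(iv => iv.timeStamp.Hour)
            .ToDictionary(g => g.Key, g => new HourlySensorFeatures
            {
                AvgTrueIntervalSeconds = g.Where(iv => iv.state)
                                          .Select(iv => iv.duration.TotalSeconds)
                                          .DefaultIfEmpty(0.0).Average(),
                AvgFalseIntervalSeconds = g.Where(iv => !iv.state)
                                           .Select(iv => iv.duration.TotalSeconds)
                                           .DefaultIfEmpty(0.0).Average(),
                // Each interval boundary is a state change, so the mean interval
                // length doubles as the mean time between changes here.
                AvgSecondsBetweenChanges = g.Select(iv => iv.duration.TotalSeconds).Average(),
                StateChangeCount = g.Count()
            });
    }
}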
Each sensor could then be compared to every other sensor in a matrix with data like the following (see the sketch after this list):
- Average time taken for sensor B to change to a true state after sensor A did. If an average value is 60 seconds, then a 1-second wait would be more interesting than a 120-second wait.
- Average number of state changes sensor B underwent while sensor A was in one state
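A sketch of the two pairwise features above, assuming each sensor's history is available as a time-ordered list of StateInterval records; the class and method names are illustrative only:

using System;
using System.Collections.Generic;
using System.Linq;

static class SensorPairFeatures
{
    // Average delay (in seconds) between sensor A entering the true state and
    // sensor B next entering the true state; returns NaN if B never follows.
    public static double AverageResponseDelaySeconds(
        IEnumerable<StateInterval> aIntervals,
        IEnumerable<StateInterval> bIntervals)
    {
        var bTrueStarts = bIntervals.Where(iv => iv.state)
                                    .Select(iv => iv.timeStamp)
                                    .OrderBy(t => t)
                                    .ToList();
        var delays = new List<double>();
        foreach (var a in aIntervals.Where(iv => iv.state))
        {
            // First true-start of B at or after A's true-start, if any.
            DateTime? follower = bTrueStarts.Where(t => t >= a.timeStamp)
                                            .Select(t => (DateTime?)t)
                                            .FirstOrDefault();
            if (follower.HasValue)
                delays.Add((follower.Value - a.timeStamp).TotalSeconds);
        }
        return delays.Count > 0 ? delays.Average() : double.NaN;
    }

    // Number of state changes sensor B underwent while sensor A stayed in one interval.
    public static int ChangesDuring(StateInterval a, IEnumerable<StateInterval> bIntervals)
    {
        DateTime end = a.timeStamp + a.duration;
        // Each B interval that starts strictly inside A's window marks one change of B.
        return bIntervals.Count(b => b.timeStamp > a.timeStamp && b.timeStamp < end);
    }
}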
Given two sets of training data, the classifier should be able to determine from these feature sets which is the most likely category for classification.
Is this a sensible approach, and what would be a good algorithm for comparing these features?
Edit: the direction of a state change (false->true vs true->false) is significant, so any features should take that into account.
3 Answers
A simple solution would be to collapse the time aspect of your data and take each timestamp as one instance. In this case, the values of the sensors are considered your feature vector, where each time step is labeled with a class value of category A or B (at least for the labeled training data).
This input data is fed to the usual classification algorithms (ANN, SVM, ...), and the goal is to predict the class of unlabeled time series.
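A minimal sketch of that flattening step, assuming each timestamp already carries one boolean value per sensor plus a class label (category A or B); the Instance record and TimeSeriesFlattener names are illustrative, and the resulting instances would be handed to whatever classifier you choose:

using System;
using System.Collections.Generic;
using System.Linq;

// One training instance per timestamp: sensor values as features plus a label.
record Instance(double[] Features, string Label);

static class TimeSeriesFlattener
{
    // 'samples' maps each timestamp to the boolean value of every sensor at that
    // time (same sensor order for every timestamp); 'labels' gives the class
    // ("A"/"normal" or "B"/"alarm") for each timestamp.
    public static List<Instance> Flatten(
        IReadOnlyDictionary<DateTime, bool[]> samples,
        IReadOnlyDictionary<DateTime, string> labels)
    {
        return samples
            .OrderBy(kv => kv.Key)
            .Select(kv => new Instance(
                kv.Value.Select(s => s ? 1.0 : 0.0).ToArray(),
                labels[kv.Key]))
            .ToList();
    }
}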
An intermediate step of dimensionality reduction / feature extraction could improve the results.
Obviously this may not be as good as modeling the time dynamics of the sequences, especially since techniques such as Hidden Markov Models (HMM) take into account the transitions between the various states.
EDIT
Based on your comment below, it seems that the best way to get less transitory predictions of the target class is to apply a post-processing rule at the end of the prediction phase, treating the classification output as a sequence of consecutive predictions.
The way this works is that you compute the class posterior probabilities (i.e. the probability distribution over class labels for an instance, which in the case of a binary SVM is easily derived from the decision function). Then, given a specified threshold, you check whether the probability of the predicted class is above that threshold: if it is, use that class to predict the current timestamp; if not, keep the previous prediction, and the same goes for future instances. This has the effect of adding a certain inertia to the current prediction.
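A sketch of that post-processing rule, assuming the per-timestamp class posteriors have already been produced by the classifier; the PredictionSmoother name, the initialLabel parameter and the use of label strings are my own framing of the rule described above:

using System;
using System.Collections.Generic;

static class PredictionSmoother
{
    // 'posteriors[t]' maps each class label to its posterior probability at step t.
    // The predicted label only switches when the new label's posterior clears
    // 'threshold'; otherwise the previous prediction is kept (prediction inertia).
    public static List<string> Smooth(
        IReadOnlyList<IReadOnlyDictionary<string, double>> posteriors,
        double threshold,
        string initialLabel)
    {
        var output = new List<string>();
        string current = initialLabel;

        foreach (var dist in posteriors)
        {
            string best = current;
            double bestP = double.NegativeInfinity;
            foreach (var kv in dist)
            {
                if (kv.Value > bestP) { best = kv.Key; bestP = kv.Value; }
            }

            if (best != current && bestP >= threshold)
                current = best;   // confident enough to switch class

            output.Add(current);
        }
        return output;
    }
}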
This doesn't sound like a classification problem. Classifiers aren't really meant to take into account "a combination of state changes." It sounds like a sequence labeling problem. Look into using a Hidden Markov Model or a Conditional Random Field. You can find an efficient implementation of the latter at http://leon.bottou.org/projects/sgd.
Edit:
I've read through your question in a little more detail, and I don't think an HMM is the best model given what you want to do with features. It's going to blow up your state space and could make inference intractable. You need a more expressive model. You could look at Dynamic Bayesian Networks. They generalize HMMs by allowing the state space to be represented in factored form. Kevin Murphy's dissertation is the most thorough resource for them I've come across.
I still like CRFs, though. As an easy place to start, define one with the time of day and each of the sensor readings as the features for each observation, and use bigram feature functions. You can see how it performs and increase the complexity of your features from there. I would start simple, though. I think you're underestimating how difficult some of your ideas will be to implement.
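As a rough illustration of the per-observation features suggested here (time of day plus each sensor reading), not tied to any particular CRF toolkit; the name=value string format is an assumption, and label-bigram (transition) feature functions would normally be declared through the toolkit itself:

using System;
using System.Collections.Generic;

static class CrfFeatureBuilder
{
    // Builds observation features for one time step: the hour of day plus one
    // feature per sensor reading. The exact string format depends on the CRF
    // toolkit these are fed to.
    public static List<string> FeaturesAt(DateTime time, IReadOnlyDictionary<string, bool> sensorStates)
    {
        var features = new List<string> { $"hour={time.Hour}" };
        foreach (var kv in sensorStates)
            features.Add($"sensor_{kv.Key}={(kv.Value ? 1 : 0)}");
        return features;
    }
}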
Why reinvent the wheel? Check out TClass.
If that doesn't cut it for you, you can also find a number of pointers there. I hope this helps.