Rare event detection

Posted on 2024-09-05 00:03:07


Comments (1)

养猫人 2024-09-12 00:03:07

It may help to describe your scenario in more detail. Since you are trying to find rare events, I assume you have a working definition of "not rare" (for some problem spaces this is really hard).

For instance, let's say we have a process that is not a random walk, such as CPU utilization for some service. To detect rare events you could take the mean utilization and flag observations that fall several standard deviations away from it. Techniques from Statistical Process Control are useful here.
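As a minimal sketch of the "several standard deviations out" idea, the snippet below estimates the mean and standard deviation from a baseline window and flags new observations outside mean ± k·sigma. The function name `flag_outliers`, the sample numbers, and the choice k = 3 are assumptions made for illustration, not something the answer prescribes.

```python
from statistics import mean, stdev

def flag_outliers(baseline, observations, k=3.0):
    """Return (index, value) pairs from observations that lie more than
    k sample standard deviations away from the baseline mean."""
    mu = mean(baseline)
    sigma = stdev(baseline)  # sample standard deviation of the baseline
    lower, upper = mu - k * sigma, mu + k * sigma
    return [(i, x) for i, x in enumerate(observations) if x < lower or x > upper]

# Hypothetical CPU utilization percentages (made-up numbers).
baseline = [41.0, 39.5, 40.2, 42.1, 38.9, 40.7, 41.3, 39.8, 40.4, 41.0]
observations = [40.5, 41.8, 39.2, 93.0, 40.1]

rare = flag_outliers(baseline, observations)
print(rare)
```

The limits are computed from a separate baseline window so that an extreme value cannot inflate the standard deviation it is being compared against.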

Now suppose we have a random walk process such as stock prices (can of worms opened... please just assume this for the sake of simplicity), where the directional movement from t to t+1 is random. A rare event might be a certain number of consecutive moves in a single direction, or a large move in a single direction at a single time step. See Stochastic Calculus for the underlying concepts.
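Both kinds of event described here can be checked in one pass over the series. The sketch below is only an illustration: the name `rare_moves`, the thresholds `run_length=5` and `jump=10.0`, and the price series are all hypothetical.

```python
def rare_moves(prices, run_length=5, jump=10.0):
    """Flag the step at which a same-direction run first reaches run_length,
    and any single step whose absolute change is at least jump."""
    events = []
    run = 0        # length of the current same-direction run
    prev_sign = 0  # direction of the previous move: -1, 0 or +1
    for t in range(1, len(prices)):
        change = prices[t] - prices[t - 1]
        sign = (change > 0) - (change < 0)
        if sign != 0 and sign == prev_sign:
            run += 1
        else:
            run = 1 if sign != 0 else 0
        prev_sign = sign
        if abs(change) >= jump:
            events.append((t, "large_move"))
        if run == run_length:
            events.append((t, "long_run"))
    return events

# Made-up price series: five up-moves in a row, then a 16-point jump.
prices = [100, 101, 102, 103, 104, 105, 104, 120, 119]
events = rare_moves(prices)
print(events)
```

If up and down moves are assumed to be independent and equally likely, any given window of n steps moves in a single direction with probability 2 × (1/2)^n, about 1 in 16 for n = 5, which gives a rough way to choose `run_length`.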

If the process at step t depends only on step t-1, then we can use Markov Chains to model it.
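One possible way to use a Markov chain for this, sketched under assumptions: estimate first-order transition probabilities from a history of discrete states, then flag transitions whose estimated probability falls below a threshold. The state names, the 0.05 threshold, and both function names are hypothetical.

```python
from collections import Counter, defaultdict

def fit_transitions(states):
    """Estimate P(next | previous) from observed consecutive pairs."""
    counts = defaultdict(Counter)
    for prev, cur in zip(states, states[1:]):
        counts[prev][cur] += 1
    return {
        prev: {cur: n / sum(nexts.values()) for cur, n in nexts.items()}
        for prev, nexts in counts.items()
    }

def rare_transitions(states, probs, threshold=0.05):
    """Return (step, previous, current, probability) for unlikely transitions.
    Transitions never seen in the history get probability 0.0."""
    flagged = []
    for t, (prev, cur) in enumerate(zip(states, states[1:]), start=1):
        p = probs.get(prev, {}).get(cur, 0.0)
        if p < threshold:
            flagged.append((t, prev, cur, p))
    return flagged

# Made-up history of service states used to fit the chain.
history = ["ok", "ok", "ok", "busy", "ok", "ok", "ok", "busy", "ok", "ok"]
probs = fit_transitions(history)

# New sequence to score: "busy" followed by "busy" never occurred in history.
flagged = rare_transitions(["ok", "busy", "busy", "ok"], probs)
print(flagged)
```

Treating unseen transitions as probability zero is a blunt choice; with a short history, some form of smoothing would probably be needed to avoid flagging everything that is merely new.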

That is a short list of the mathematical techniques available to you. Now on to machine learning. Why do you want to use machine learning? (This is always worth asking, to make sure you are not overcomplicating the problem.) Let's assume that you do and that it is the right solution. The actual algorithm you use is not very important at this stage.

What you need to do is define what a rare event is. Conversely, you can define what a normal event is and look for things that are not normal. Note that these are not the same thing.

Say we produce a set of rare events r1...rn. Each of those rare events will have some features associated with it. For instance, if a computer failed, the features might include the last time it was seen on the network, its switch port status, and so on. This step, training set construction, is actually the most important part of machine learning. It usually consists of hand-labeling a set of examples to train the model on. Once you have a better understanding of the feature space, you may be able to train another model to do the labeling for you. Repeat this process until you are satisfied.
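To make the labeling loop concrete, here is a small sketch in which a handful of hand-labeled examples train a deliberately simple model (nearest centroid) that then proposes labels for unlabeled examples for a human to review. The `Example` fields mirror the two features mentioned above, but their names, units, and all values are assumptions, and nearest centroid is just a stand-in since the algorithm is not the important part.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Example:
    hours_since_seen: float      # hypothetical: hours since last seen on the network
    port_up: int                 # hypothetical: 1 if the switch port is up, else 0
    label: Optional[str] = None  # "rare", "normal", or None if not yet labeled

def features(e):
    return (e.hours_since_seen, float(e.port_up))

def centroids(labeled):
    """Average the feature vectors of the hand-labeled examples per label."""
    sums, counts = {}, {}
    for e in labeled:
        f = features(e)
        s = sums.setdefault(e.label, [0.0] * len(f))
        for i, v in enumerate(f):
            s[i] += v
        counts[e.label] = counts.get(e.label, 0) + 1
    return {lab: tuple(v / counts[lab] for v in s) for lab, s in sums.items()}

def propose_label(e, cents):
    """Propose the label whose centroid is closest (squared Euclidean distance)."""
    f = features(e)
    return min(cents, key=lambda lab: sum((a - b) ** 2 for a, b in zip(f, cents[lab])))

# Hand-labeled training set (made-up values).
labeled = [
    Example(0.1, 1, "normal"), Example(0.3, 1, "normal"), Example(0.2, 1, "normal"),
    Example(48.0, 0, "rare"), Example(72.0, 0, "rare"),
]
unlabeled = [Example(0.5, 1), Example(60.0, 0)]

cents = centroids(labeled)
proposals = [propose_label(e, cents) for e in unlabeled]
print(proposals)
```

The features are left unscaled here, so the hours value dominates the distance; in a real feature space that would need attention before trusting the proposed labels.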

Now, if you are able to define your set of rare events, it may be cheaper to simply write heuristics. For detecting rare events, I have always found this to work better.
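A heuristic version might be nothing more than a list of named rules over the same features. The rule names, the 24-hour cutoff, and the host records below are invented for illustration.

```python
# Each rule is a (name, predicate) pair over a host record (a plain dict).
# The fields and thresholds are hypothetical.
RULES = [
    ("port_down", lambda h: not h["port_up"]),
    ("silent_over_24h", lambda h: h["hours_since_seen"] > 24.0),
]

def triggered(host):
    """Return the names of all rules that fire for this host."""
    return [name for name, rule in RULES if rule(host)]

healthy = {"hours_since_seen": 0.2, "port_up": True}
suspect = {"hours_since_seen": 50.0, "port_up": False}

print(triggered(healthy))
print(triggered(suspect))
```

Rules like these are easy to audit and adjust by hand, which is part of why they can be cheaper than building and maintaining a labeled training set.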
