Machine learning algorithm for predicting the order of events?

Posted 2024-08-26 16:23:53

Simple machine learning question. Probably numerous ways to solve this:

There is an infinite stream of 4 possible events:

'event_1', 'event_2', 'event_3', 'event_4'

The events do not arrive in a completely random order. We will assume that there are some complex patterns to the order in which most events arrive, and that the rest of the events are just random. We do not know the patterns ahead of time, though.

After each event is received, I want to predict what the next event will be, based on the order in which events have arrived in the past. So my question is: what machine learning algorithm should I use for this predictor?

The predictor will then be told what the next event actually was:

predictor = new_predictor()

prev_event = None
while True:
    event = get_event()
    if prev_event is not None:
        # Tell the predictor what actually happened last round.
        predictor.last_event_was(prev_event)
    predicted_event = predictor.predict_next_event(event)
    prev_event = event  # remember this round's event for the next pass

The question arises of how long a history the predictor should maintain, since maintaining infinite history is not possible. I'll leave that up to you to answer, but for practicality the answer can't be infinite.

So I believe that the predictions will have to be done with some kind of rolling history. Adding a new event and expiring an old event should therefore be rather efficient, and should not, for example, require rebuilding the entire predictor model.

Specific code, rather than research papers, would add immense value to your responses. Python or C libraries are nice, but anything will do.

Update: What if more than one event can happen simultaneously on each round? Does that change the solution?

Comments (5)

司马昭之心 2024-09-02 16:23:53

This is essentially a sequence prediction problem, so you want recurrent neural networks (RNNs) or hidden Markov models (HMMs).

If you only have a fixed time to look back, time-window approaches might suffice. You take the sequence data and split it into overlapping windows of length n (e.g. you split a sequence ABCDEFG into ABC, BCD, CDE, DEF, EFG). Then you train a function approximator (e.g. a neural network or linear regression) to map the first n-1 parts of that window onto the nth part.
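
For illustration, here is a minimal sketch of that windowing step. The window length, helper name, and example sequence are my own choices, not from the original:

N = 3  # window length (an assumed example value)

def make_windows(sequence, n=N):
    # Split the sequence into overlapping windows of length n, then
    # pair the first n-1 items (input) with the nth item (target).
    pairs = []
    for i in range(len(sequence) - n + 1):
        window = sequence[i:i + n]
        pairs.append((window[:-1], window[-1]))
    return pairs

# 'ABCDEFG' yields the windows ABC, BCD, CDE, DEF, EFG:
for inputs, target in make_windows('ABCDEFG'):
    print(inputs, '->', target)  # 'AB' -> 'C', 'BC' -> 'D', ...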

Your predictor will not be able to look back in time longer than the size of your window. RNNs and HMMs can do so in theory, but are hard to tune or sometimes just don't work.

(State of the art RNN implementations can be found in PyBrain http://pybrain.org)

Update: Here is the PyBrain code for your problem. (I haven't tested it; there might be some typos and such, but the overall structure should work.)

from pybrain.datasets import SequentialDataSet
from pybrain.supervised.trainers import BackpropTrainer
from pybrain.tools.shortcuts import buildNetwork
from pybrain.structure import LSTMLayer, SigmoidLayer  # LSTMLayer was missing

INPUTS = 4
HIDDEN = 10
OUTPUTS = 4

net = buildNetwork(INPUTS, HIDDEN, OUTPUTS, hiddenclass=LSTMLayer, outclass=SigmoidLayer, recurrent=True)

ds = SequentialDataSet(INPUTS, OUTPUTS)

# your_sequences is a list of lists of tuples which each are a bitmask
# indicating the event (so 1.0 at position i if event i happens, 0.0 otherwise)

for sequence in your_sequences:
    ds.newSequence()  # one dataset sequence per input sequence
    for (inpt, target) in zip(sequence, sequence[1:]):
        ds.appendLinked(inpt, target)

net.randomize()

trainer = BackpropTrainer(net, ds, learningrate=0.05, momentum=0.99)
for _ in range(1000):
    print(trainer.train())

This will train the recurrent network for 1000 epochs and print out the error after every epoch. Afterwards you can check for correct predictions like this:

net.reset()
for i in sequence:
    next_item = net.activate(i) > 0.5
    print(next_item)

This will print an array of booleans for every event.

Saygoodbye 2024-09-02 16:23:53

Rather than keeping a full history, one can keep aggregated information about the past (along with a relatively short sliding history, to be used as input to the Predictor logic).

A tentative implementation could go like this:
In a nutshell: manage a set of Markov chains of increasing order, and grade and average their predictions (a code sketch follows at the end of this answer).

  • keep a table of individual event counts; the purpose is to calculate the probability of any of the 4 different events, without regard to any sequence.
  • keep a table of bigram counts, i.e. a cumulative count of the event pairs observed [so far].
    The table starts empty; upon observing the second event we can store the first bigram with a count of 1. Upon the third event, the bigram made of the 2nd and 3rd events is "added" to the table: either incrementing the count of an existing bigram, or adding it as a new (never-seen-so-far) bigram with an initial count of 1. And so on.
    In parallel, keep a total count of bigrams in the table.
    This table and the total tally allow calculating the probability of a given event, based on the one preceding event.
  • In a similar fashion keep a table of trigram counts, and a running tally of total trigrams seen (note that this will equal the number of bigrams minus one, since the first trigram is added one event after the first bigram, and after that one of each is added with each new event). This trigram table allows calculating the probability of a given event based on the two preceding events.
  • likewise, keep tables for N-grams, up to, say, 10-grams (the algorithm will tell us whether we need to increase or decrease this).
  • keep a sliding window of the last 10 events.
  • The above tables provide the basis for prediction; the general idea is to:
    • use a formula which expresses the probability of the next event as a weighted average of the individual probabilities based on the different N-grams.
    • reward the better individual N-gram lengths by increasing the corresponding weights in the formula; punish the worse lengths in the reverse fashion. (Beware: the marginal probability of individual events needs to be taken into account, lest we favor N-grams which happen to predict the most frequent events regardless of their relatively poor predictive value.)
    • once the system has "seen" enough events, look at the current values of the weights associated with the long N-grams; if these are relatively high, consider adding tables to keep aggregate info about bigger N-grams. (This unfortunately hurts the algorithm in both space and time.)

There can be several variations on the general logic described above, in particular in the choice of the metric used to "grade" the prediction quality of the individual N-gram lengths.
Other considerations concern detecting and adapting to possible shifts in the event distribution (the above assumes a generally ergodic event source). One possible approach is to use two sets of tables (combining the probabilities accordingly) and periodically dropping the contents of all tables in one of the sets. Choosing the right period for these resets is a tricky business, essentially balancing the need for statistically significant volumes of history against the need for a period short enough not to miss the shorter modulations...
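
For concreteness, a minimal Python sketch of the scheme above. The class and method names and the multiplicative grading rule are my own choices, not prescribed by this answer, and the shift-detection machinery from the last paragraph is omitted:

from collections import Counter, deque

EVENTS = ['event_1', 'event_2', 'event_3', 'event_4']
MAX_ORDER = 10  # longest N-gram kept (the "10-grams" above)

class NGramPredictor:
    def __init__(self, max_order=MAX_ORDER):
        self.max_order = max_order
        # counts[k] maps a tuple of k previous events to a Counter
        # of the events that followed that context.
        self.counts = [dict() for _ in range(max_order)]
        self.history = deque(maxlen=max_order - 1)  # sliding window
        self.weights = [1.0] * max_order            # per-order credibility

    def _context(self, k):
        h = list(self.history)
        return tuple(h[len(h) - k:])

    def _dist(self, k):
        # Distribution over next events given the last k events,
        # or None if that context has never been seen.
        table = self.counts[k].get(self._context(k))
        if not table:
            return None
        total = sum(table.values())
        return {e: table[e] / total for e in EVENTS}

    def predict(self):
        # Blend the per-order distributions, weighted by credibility.
        blended = Counter()
        for k in range(min(len(self.history), self.max_order - 1) + 1):
            dist = self._dist(k)
            if dist:
                for e, p in dist.items():
                    blended[e] += self.weights[k] * p
        return max(blended, key=blended.get) if blended else EVENTS[0]

    def observe(self, event):
        # Grade each order by how much probability it assigned to
        # what actually happened, update its table, then slide the
        # window; expiring old events is O(1) thanks to the deque.
        for k in range(min(len(self.history), self.max_order - 1) + 1):
            dist = self._dist(k)
            if dist:
                self.weights[k] = 0.99 * self.weights[k] + 0.01 * dist[event]
            self.counts[k].setdefault(self._context(k), Counter())[event] += 1
        self.history.append(event)

p = NGramPredictor()
for ev in ['event_1', 'event_2'] * 5:
    guess = p.predict()   # predict before seeing the event
    p.observe(ev)         # then learn from what actually happened
print(p.predict())  # after an alternating stream, likely 'event_1'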

っ左 2024-09-02 16:23:53

"The question arises of how long a history the predictor should maintain"

The only answer is "it depends".

It depends on how accurate this needs to be. I don't believe this strategy could ever be 100% accurate, even with an infinite history. Try out a history of 10 and you'll get x% accuracy, then try 100 and you'll get y% accuracy, and so on...

Eventually you should find either that the system is as accurate as you want it to be, or that the rise in accuracy is not worth the increase in history length (and the increased memory usage, processing time, etc.). At that point, either the job is done or you need to find a new strategy.

For what it is worth, I think looking into a simple "soft" neural net might be a better plan.
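
A sketch of that sweep in Python; the toy stream, the table-lookup predictor, and the history lengths tried are all stand-ins for whatever you actually have:

import random
from collections import Counter, deque

def toy_stream(length=5000, seed=0):
    # Placeholder event source: a repeating 3-event pattern with
    # 20% random noise, standing in for the real stream.
    rng = random.Random(seed)
    events = ['event_1', 'event_2', 'event_3', 'event_4']
    return [events[i % 3] if rng.random() < 0.8 else rng.choice(events)
            for i in range(length)]

def accuracy(events, history_len):
    # Predict each event from a lookup table keyed on the last
    # history_len events; return the fraction predicted correctly.
    table = {}
    ctx = deque(maxlen=history_len)
    correct = 0
    for event in events:
        seen = table.get(tuple(ctx))
        if seen and seen.most_common(1)[0][0] == event:
            correct += 1
        table.setdefault(tuple(ctx), Counter())[event] += 1
        ctx.append(event)
    return correct / len(events)

for n in (1, 2, 5, 10):
    print(n, accuracy(toy_stream(), n))  # accuracy vs. history length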

宁愿没拥抱 2024-09-02 16:23:53

We just studied branch predictors in computer architecture (because actually evaluating a condition if(EXPRESSION) would take the processor too long, it tries to 'guess' and save some time that way). I am sure more research has been done in this area, but that's all I can think of at the moment.

I haven't seen a unique setup like yours, so I think you might need to do some preliminary experimentation on your own. Try running your solution for X seconds with a history of N slots: what is the correctness ratio? Then compare that, holding X fixed and varying the N history slots, to find the best memory-to-history ratio (graphing them out).

If more than one event can happen simultaneously... that's a little mind-bending, and there have to be some constraints there: what if an infinite number of events happen at a time? That would be computationally impossible for you. I'd try the same approach as with one event at a time, except with the predictor enabled to predict multiple events at a time.

厌倦 2024-09-02 16:23:53

Processors use a few really lightweight tricks to predict whether a branch statement will branch or not. This helps them with efficient pipelining. They may not be as general as Markov models, for instance, but they are interesting because of their simplicity. Here is the Wikipedia article on branch prediction. See the Saturating Counter and the Two-Level Adaptive Predictor.
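
For illustration, a minimal sketch of the two-bit saturating counter described there, recast for a binary did-the-event-happen stream (the class name and starting state are my choices):

class SaturatingCounter:
    # States 0-1 predict False, states 2-3 predict True; each
    # observed outcome nudges the state one step toward itself.
    def __init__(self):
        self.state = 2  # start at "weakly predict True"

    def predict(self):
        return self.state >= 2

    def update(self, outcome):
        if outcome:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)

c = SaturatingCounter()
for outcome in [True, True, False, True, True, True]:
    guess = c.predict()  # predict, then learn the real outcome
    c.update(outcome)
print(c.predict())  # True: the stream is mostly True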
