Applying Hidden Markov Models to Multiple Simultaneous Bit Sequences
This excellent article on implementing a Hidden Markov Model in C# does a fair job of classifying a single bit sequence based on training data.
How could the algorithm be modified, or built out (multiple HMMs?), to support the classification of multiple simultaneous bit sequences?
Example
Instead of classifying just one stream:
double t1 = hmm.Evaluate(new int[] { 0,1 }); // 0.49999423004045024
double t2 = hmm.Evaluate(new int[] { 0,1,1,1 }); // 0.11458685045803882
Rather, classify a dual bit stream:
double t1 = hmm.Evaluate(new int[][] { new[] { 0, 0 }, new[] { 0, 1 } });
double t2 = hmm.Evaluate(new int[][] { new[] { 0, 0 }, new[] { 1, 1 }, new[] { 0, 1 }, new[] { 1, 1 } });
Or even better, three streams:
double t1 = hmm.Evaluate(new int[][] { new[] { 0, 0, 1 }, new[] { 0, 0, 1 } });
double t2 = hmm.Evaluate(new int[][] { new[] { 0, 0, 1 }, new[] { 1, 1, 0 }, new[] { 0, 1, 1 }, new[] { 1, 1, 1 } });
Obviously the training data would also be expanded.
Answers (2)
The trick is to model the set of observations as the n-ary Cartesian product of all possible values of each sequence. In your case, the HMM will have 2^n output symbols, where n is the number of bit sequences.
Example: for three bit sequences, the 8 symbols are:
000 001 010 011 100 101 110 111
It is as if we created a megavariable whose values are all the possible tuples of values of the individual observation sequences (0/1 for each bit sequence).
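The Cartesian-product encoding described above can be sketched as a small helper that packs each tuple of bits into one integer symbol, so that an existing discrete HMM taking an int[] sequence can be reused unchanged. This is only a minimal sketch: the names JointSymbols, Encode and EncodeSequence are hypothetical, and the most-significant-bit-first ordering is an assumption (any fixed ordering works as long as training and evaluation use the same one).

```csharp
using System;
using System.Linq;

public static class JointSymbols
{
    // Packs one observation tuple of bits (most significant bit first, an
    // assumed convention) into a single symbol in the range [0, 2^n).
    public static int Encode(int[] bits)
    {
        if (bits.Length == 0 || bits.Length > 30)
            throw new ArgumentException("Expected between 1 and 30 bits per tuple.");

        int symbol = 0;
        foreach (int bit in bits)
        {
            if (bit != 0 && bit != 1)
                throw new ArgumentException("Each entry must be 0 or 1.");
            symbol = (symbol << 1) | bit;
        }
        return symbol;
    }

    // Converts a sequence of bit tuples into the flat symbol sequence that a
    // discrete HMM with 2^n output symbols could consume.
    public static int[] EncodeSequence(int[][] tuples)
    {
        return tuples.Select(t => Encode(t)).ToArray();
    }
}

public static class EncodingDemo
{
    public static void Main()
    {
        // The three-stream sequence from the question.
        int[][] observation =
        {
            new[] { 0, 0, 1 }, new[] { 1, 1, 0 }, new[] { 0, 1, 1 }, new[] { 1, 1, 1 }
        };

        int[] symbols = JointSymbols.EncodeSequence(observation);
        Console.WriteLine(string.Join(", ", symbols)); // prints "1, 6, 3, 7"
    }
}
```

The resulting array can then be passed to the single-stream Evaluate call shown in the question, provided the model was created and trained with 2^n symbols (8 for three streams). The number of symbols doubles with every added stream, so this approach suits a small number of streams.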
The article mentioned deals with the hidden Markov model implementation in the Accord.NET Framework. When using the complete version of the framework, and not just the subproject available in that article, one can use the generic HiddenMarkovModel model with any suitable emission symbol distribution. If the user would like to express the joint probability between two or three discrete variables, it would be worth using the JointDistribution class.
If, however, there are so many symbol variables that expressing all possible variable combinations is not practical, it would be better to use a continuous representation for the features and use a Multivariate Normal distribution instead.
An example would be:
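Independently of any particular Accord.NET API, the idea behind the continuous representation can be sketched in plain C#. Each time step is treated as a real-valued vector with one component per stream, and each hidden state scores that vector with a normal density. The sketch below assumes a diagonal covariance (independent components), which is a simplification of a full multivariate normal. The class and method names are hypothetical and are not part of the framework.

```csharp
using System;

public static class DiagonalGaussian
{
    // Log-density of x under a normal distribution with independent
    // components (diagonal covariance), given per-component mean and variance.
    public static double LogDensity(double[] x, double[] mean, double[] variance)
    {
        if (x.Length != mean.Length || x.Length != variance.Length)
            throw new ArgumentException("All vectors must have the same length.");

        double logDensity = 0.0;
        for (int i = 0; i < x.Length; i++)
        {
            if (variance[i] <= 0.0)
                throw new ArgumentException("Variances must be positive.");

            double diff = x[i] - mean[i];
            logDensity += -0.5 * (Math.Log(2.0 * Math.PI * variance[i])
                                  + diff * diff / variance[i]);
        }
        return logDensity;
    }
}

public static class GaussianDemo
{
    public static void Main()
    {
        // Hypothetical emission parameters for one hidden state over three
        // bit streams: this state mostly emits the pattern (0, 1, 1).
        double[] mean = { 0.1, 0.9, 0.8 };
        double[] variance = { 0.09, 0.09, 0.16 };

        double likely = DiagonalGaussian.LogDensity(new[] { 0.0, 1.0, 1.0 }, mean, variance);
        double unlikely = DiagonalGaussian.LogDensity(new[] { 1.0, 0.0, 0.0 }, mean, variance);

        Console.WriteLine("log p(0,1,1 | state) = " + likely);
        Console.WriteLine("log p(1,0,0 | state) = " + unlikely);
    }
}
```

The trade-off is in the number of parameters per state: a full joint table over n bit streams needs 2^n - 1 free probabilities, a full-covariance normal needs n + n(n+1)/2 values, and the diagonal version above needs only 2n. With binary data, the estimated variances should be kept away from zero (for example by adding a small constant), or the density degenerates.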