Probabilistic transition matrix
I'm working on Markov Chains and I would like to know of efficient algorithms for constructing probabilistic transition matrices (of order n), given a text file as input.
I am not after a single algorithm; I'd rather build a list of such algorithms. Papers on such algorithms are also more than welcome, as are any tips on terminology, etc. Note that this topic bears a strong resemblance to n-gram identification algorithms.
Any help would be much appreciated.
Comments (2)
It sounds like there are two possible questions here; you should clarify which one you mean:
The 'text file' contains probability values and "n", and you build the matrix directly, so the question is only how to code it. That case is trivial, so let's disregard it.
The 'text file' contains something like signal data, and you want to model it as a Markov chain.
'Markov chain' generally refers to a first-order stochastic process, so I'm not sure what you mean by "order"; perhaps the size of the matrix, but that is not typical terminology. Anyway, for a first-order, discrete-time random process with an n x n matrix, you should look at the Viterbi algorithm: http://en.wikipedia.org/wiki/Viterbi_algorithm
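For the second case, where the file holds raw observations, one simple way to estimate the transition probabilities is to count n-grams and normalise the counts. The sketch below is only an illustration, not something proposed in this thread: it assumes "order" means the number of preceding symbols the next symbol is conditioned on, it assumes whitespace-separated tokens, and the function name `transition_probabilities` is made up for this example.

```python
from collections import Counter, defaultdict


def transition_probabilities(tokens, order=1):
    """Estimate transition probabilities of a chain of the given order
    by counting which symbol follows each run of `order` symbols."""
    counts = defaultdict(Counter)
    for i in range(len(tokens) - order):
        state = tuple(tokens[i:i + order])  # the previous `order` symbols
        following = tokens[i + order]       # the symbol observed next
        counts[state][following] += 1

    # Normalise each row of counts so that it sums to 1.
    probs = {}
    for state, followers in counts.items():
        total = sum(followers.values())
        probs[state] = {sym: c / total for sym, c in followers.items()}
    return probs


# Hypothetical input; in practice the tokens would be read from the text file.
text = "the cat sat on the mat the cat ran"
model = transition_probabilities(text.split(), order=1)
print(model[("the",)])
```

The result is kept as a nested dictionary rather than a dense matrix, because with a vocabulary of V symbols an order-n chain has up to V^n possible states, and most of them never occur in a real text.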
Whenever I deal with Markov models, I tend to end up looking at the crm114 Discriminator. For one, its author goes into great detail about the different models that actually exist (Markov isn't always the best, depending on the application) and provides general links and lots of background information on how probabilistic models work. While crm114 is generally used as a spam identification tool, it is actually a more generic probability engine, and I have used it in other applications.