Explain the Markov chain algorithm in layman's terms

Posted 2024-09-30 12:03:19

I don't quite understand this Markov program. Does it take two words as a prefix, save up a list of the suffixes that follow them, and then produce random words?

/* Copyright (C) 1999 Lucent Technologies */
/* Excerpted from 'The Practice of Programming' */
/* by Brian W. Kernighan and Rob Pike */

#include <cstdlib>   // srand, rand
#include <ctime>     // time
#include <iostream>
#include <string>
#include <deque>
#include <map>
#include <vector>

using namespace std;

const int  NPREF = 2;
const char NONWORD[] = "\n";    // cannot appear as a real word: >> skips whitespace
const int  MAXGEN = 10000; // maximum words generated

typedef deque<string> Prefix;

map<Prefix, vector<string> > statetab; // prefix -> suffixes

void        build(Prefix&, istream&);
void        generate(int nwords);
void        add(Prefix&, const string&);

// markov main: markov-chain random text generation
int main(void)
{
    int nwords = MAXGEN;
    Prefix prefix;  // current input prefix

    srand(time(NULL));
    for (int i = 0; i < NPREF; i++)
        add(prefix, NONWORD);
    build(prefix, cin);
    add(prefix, NONWORD);
    generate(nwords);
    return 0;
}

// build: read input words, build state table
void build(Prefix& prefix, istream& in)
{
    string buf;

    while (in >> buf)
        add(prefix, buf);
}

// add: add word to the current prefix's suffix list, then slide the prefix
void add(Prefix& prefix, const string& s)
{
    if (prefix.size() == NPREF) {
        statetab[prefix].push_back(s);
        prefix.pop_front();
    }
    prefix.push_back(s);
}

// generate: produce output, one word per line
void generate(int nwords)
{
    Prefix prefix;
    int i;

    for (i = 0; i < NPREF; i++)
        add(prefix, NONWORD);
    for (i = 0; i < nwords; i++) {
        vector<string>& suf = statetab[prefix];
        const string& w = suf[rand() % suf.size()];
        if (w == NONWORD)
            break;
        cout << w << "\n";
        prefix.pop_front(); // advance
        prefix.push_back(w);
    }
}

沫雨熙 2024-10-07 12:03:19

According to Wikipedia, a Markov Chain is a random process where the next state is dependent on the previous state. This is a little difficult to understand, so I'll try to explain it better:

What you're looking at seems to be a program that generates a text-based Markov chain. Essentially, the algorithm is as follows:

1. Split a body of text into tokens (words, punctuation).
2. Build a frequency table. This is a data structure with one entry (key) for every word in your body of text. Each key maps to another data structure that is basically a list of all the words that follow that word, along with how often each one follows it.
3. Generate the Markov chain. To do this, you select a starting point (a key from your frequency table) and then randomly select another state to go to (the next word). The choice of next word depends on its frequency, so some words are more probable than others. After that, you use the new word as the key and start over.
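
The three steps above can be sketched in C++ roughly as follows. This is a minimal sketch, not the program from the question: it keys on a single word (the question's program keys on the last two words, NPREF = 2), it splits on whitespace only, and the names Table, tokenize, buildTable, and generate are made up for illustration. Rather than storing a count per follower, it stores every follower occurrence, so a uniform random pick is already frequency-weighted. The question's statetab relies on the same trick.

```cpp
#include <cstddef>
#include <map>
#include <random>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical type: word -> every word seen directly after it (repeats kept).
using Table = std::map<std::string, std::vector<std::string>>;

// Step 1: split text into tokens. Simplification: whitespace only, so
// punctuation stays attached to the neighbouring word.
std::vector<std::string> tokenize(const std::string& text) {
    std::istringstream in(text);
    std::vector<std::string> tokens;
    std::string word;
    while (in >> word) tokens.push_back(word);
    return tokens;
}

// Step 2: record, for each word, the word that follows it.
// A follower seen three times appears three times in the list.
Table buildTable(const std::vector<std::string>& tokens) {
    Table table;
    for (std::size_t i = 0; i + 1 < tokens.size(); ++i)
        table[tokens[i]].push_back(tokens[i + 1]);
    return table;
}

// Step 3: walk the chain from `start`, emitting at most maxWords words.
std::vector<std::string> generate(const Table& table, const std::string& start,
                                  std::size_t maxWords, std::mt19937& rng) {
    std::vector<std::string> out;
    std::string current = start;
    while (out.size() < maxWords) {
        out.push_back(current);
        auto it = table.find(current);
        if (it == table.end()) break;  // no known follower: stop here
        const std::vector<std::string>& followers = it->second;
        // Uniform pick over a list with repeats = frequency-weighted pick.
        std::uniform_int_distribution<std::size_t> pick(0, followers.size() - 1);
        current = followers[pick(rng)];
    }
    return out;
}
```

With a seeded generator such as `std::mt19937 rng(42);`, a call like `generate(buildTable(tokenize(text)), "According", 20, rng)` would return up to 20 words starting from "According".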

For example, if you look at the opening lines of this answer, you can come up with a frequency table along these lines:

According: to(100%)
to:        Wikipedia(100%)
Wikipedia: ,(100%)
a:         Markov(50%), random(50%)
Markov:    Chain(100%)
Chain:     is(100%)
is:        a(33%), dependent(33%), ...(33%)
random:    process(100%)
process:   where(100%)
.
.
.
better:    :(100%)
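
To make the percentages in a table like this concrete, here is a small sketch that computes them for a single key. The helper name followerShares is hypothetical, and it assumes the text has already been split into tokens (punctuation as separate tokens, as in the table).

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// followerShares (hypothetical helper): for one key word, return the share,
// in percent, of each word that directly follows it in `tokens`.
std::map<std::string, double> followerShares(const std::vector<std::string>& tokens,
                                             const std::string& key) {
    std::map<std::string, int> counts;
    int total = 0;
    for (std::size_t i = 0; i + 1 < tokens.size(); ++i) {
        if (tokens[i] == key) {
            ++counts[tokens[i + 1]];  // one more occurrence of this follower
            ++total;
        }
    }
    std::map<std::string, double> shares;
    for (const auto& entry : counts)  // loop is skipped when the key never occurs
        shares[entry.first] = 100.0 * entry.second / total;
    return shares;
}
```

If a key is followed by two different words once each, both come out at 50%. A key that never appears, or that appears only as the last token, yields an empty map.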

Essentially, the transition from one state to another is probability based. In a text-based Markov chain, the transition probability is based on how often each word follows the selected word. So the selected word represents the previous state, and the words in its frequency-table entry represent the possible successive states. You can only find the successive state if you know the previous state (that is what picks out the right row of the frequency table), so this fits the definition in which the successive state depends on the previous state.
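
Written as a formula, the idea in the paragraph above looks roughly like this. The notation is an assumption introduced here, not taken from the question or the answer: X_n is the word at position n, and c(v, w) is the number of times word w directly follows word v in the source text.

```latex
% Transition probability: the share of times w follows v among all followers of v.
P(X_{n+1} = w \mid X_n = v) = \frac{c(v, w)}{\sum_{u} c(v, u)}

% Markov property: given the previous state, the earlier history adds nothing.
P(X_{n+1} = w \mid X_n = v, X_{n-1}, \dots, X_0) = P(X_{n+1} = w \mid X_n = v)
```

For instance, with c(a, Markov) = 1 and c(a, random) = 1, each follower of "a" gets probability 1/2, which is the 50% shown in the table. For the program in the question, the state would be the last two words instead of a single word.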

Shameless plug: some time ago I wrote a program in Perl that does just this. You can read about it here.
