Using Markov chains (or something similar) to generate an IRC bot
I tried Google and found little that I could understand.
I understand Markov chains at a very basic level: it's a mathematical model that depends only on previous input to change states, so it's sort of an FSM with weighted random chances instead of fixed transition criteria?
I've heard that you can use them to generate semi-intelligent nonsense, given sentences of existing words to use as a dictionary of sorts.
I can't think of search terms to find this, so can anyone link me or explain how I could produce something that gives a semi-intelligent answer? (If you asked it about pie, it would not start going on about the Vietnam War it had heard about.)
I plan on:
- Having this bot idle in IRC channels for a bit
- Stripping any usernames out of the strings and storing them as sentences or whatever (see the sketch after this list)
- Over time, using this as the basis for the above.
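A minimal sketch of the username-stripping step in Python, assuming lines arrive in a common "<nick> message" log format; the format and names here are my own assumptions, not part of the question:

```python
import re

# "<nick> " at the start of a logged line (assumed log format)
NICK_PREFIX = re.compile(r"^<[^>]+>\s*")
# "somenick: " or "somenick, " addressing another user inside the message
ADDRESSEE = re.compile(r"^\S+[:,]\s+")

def strip_usernames(line):
    """Drop the speaker's nick and any leading addressee from an IRC line."""
    line = NICK_PREFIX.sub("", line)
    line = ADDRESSEE.sub("", line)
    return line.strip()

print(strip_usernames("<alice> bob: have you tried pie?"))
# -> "have you tried pie?"
```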
Yes, a Markov chain is a finite-state machine with probabilistic state transitions. To generate random text with a simple, first-order Markov chain:
- Collect bigram (adjacent word pair) statistics from a body of text.
- Treat each word as a state; given the current word, pick the next one by a weighted random choice among the words observed to follow it.
- Repeat until you reach a desired length or a word with no recorded successor.
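A minimal sketch of that procedure in Python (my own illustration, not code from the answer). Storing every observed follower in a list means random.choice() is automatically weighted by frequency:

```python
import random
from collections import defaultdict

def train(corpus):
    """Collect first-order statistics: for each word, the list of
    words observed to follow it (repeats act as weights)."""
    chain = defaultdict(list)
    for sentence in corpus:
        words = sentence.split()
        for current, following in zip(words, words[1:]):
            chain[current].append(following)
    return chain

def generate(chain, start, max_words=20):
    """Random-walk the chain from a start word until we hit the
    length limit or a word with no recorded successor."""
    word = start
    output = [word]
    for _ in range(max_words - 1):
        followers = chain.get(word)
        if not followers:  # dead end
            break
        word = random.choice(followers)  # weighted by repetition
        output.append(word)
    return " ".join(output)

chain = train(["i like pie", "i like cake", "pie is great"])
print(generate(chain, "i"))  # e.g. "i like pie is great"
```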
If you want to get something semi-intelligent out of this, then your best shot is to train it on lots of carefully collected text. The "lots" part makes it produce proper sentences (or plausible IRC speak) with high probability; the "carefully collected" part means you control what it talks about. Introducing higher-order Markov chains also helps in both areas, but requires more storage for the necessary statistics. You may also want to look into things like statistical smoothing.
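For illustration, a second-order variant of the train() sketch above; the state is now the previous two words, which makes the output more coherent but multiplies the number of states to store:

```python
from collections import defaultdict

def train_second_order(corpus):
    """Key transitions on (previous word, current word) pairs;
    generation then starts from a two-word seed instead of one."""
    chain = defaultdict(list)
    for sentence in corpus:
        words = sentence.split()
        for w1, w2, w3 in zip(words, words[1:], words[2:]):
            chain[(w1, w2)].append(w3)
    return chain
```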
However, having your IRC bot actually respond to what is said to it takes a lot more than Markov chains. One approach is to run text categorization (a.k.a. topic spotting) on what is said, then pick a domain-specific Markov chain for text generation. Naïve Bayes is a popular model for topic spotting.
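A sketch of that pipeline using scikit-learn's MultinomialNB; the topic labels and training lines are made up for illustration. The idea: classify the incoming message, then hand generation to a Markov chain trained only on text about that topic.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Toy training data; a real bot would use far more text per topic.
lines  = ["apple pie and custard", "pumpkin pie recipe",
          "the vietnam war ended in 1975", "war and military history"]
topics = ["food", "food", "history", "history"]

vectorizer = CountVectorizer()
classifier = MultinomialNB().fit(vectorizer.fit_transform(lines), topics)

def pick_chain(message, chains):
    """Route a message to the Markov chain for its predicted topic,
    e.g. chains = {"food": food_chain, "history": history_chain}."""
    topic = classifier.predict(vectorizer.transform([message]))[0]
    return chains[topic]
```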
Kernighan and Pike explore various implementation strategies for Markov chain algorithms in The Practice of Programming. These topics, and natural language generation in general, are covered in great depth by Jurafsky and Martin in Speech and Language Processing.
You want to look for Ian Barber's text-generation articles ( phpir.com ). Unfortunately the site is down or offline. I have a copy of his text and can send it to you.
It seems to me you are trying multiple things at the same time:
- learning a language model from what other IRC users say, and
- generating replies that stay on the topic of a given keyword.
Those are basically very different tasks. Markov models are often used for machine learning. I don't see much learning in your tasks though.
larsmans' answer shows how you generate sentences from word-based Markov models. You can also train the weights to favor the word pairs that other IRC users used. Nonetheless, this will not generate keyword-related sentences, because building/refining a Markov model is not the same as "driving" it.
You might try hidden Markov models (HMMs), where the visible output is the keywords and the hidden states are made from those word pairs. You could then dynamically favor sentences more appropriate to specific keywords; a cheap approximation is sketched below.
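A full HMM is a bigger project; as a cruder stand-in for "favoring" keyword-appropriate sentences (my own simplification, not the HMM this answer describes), one can oversample the plain chain and keep the candidate that overlaps the keywords most, reusing generate() from the first sketch:

```python
def keyword_biased(chain, start, keywords, tries=50):
    """Generate several candidates with generate() from the first
    sketch and keep the one sharing the most words with `keywords`.
    Crude, but it nudges output toward a topic without an HMM."""
    def score(sentence):
        return len(set(sentence.split()) & set(keywords))
    return max((generate(chain, start) for _ in range(tries)), key=score)
```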