How does a Markov chain chatbot work?

Posted 2024-10-21 23:39:22 · 193 words · 1 view · 0 comments

I was thinking of creating a chatbot using something like Markov chains, but I'm not entirely sure how to get it to work. From what I understand, you build a table from your data that maps each word to the words that follow it. Is it possible to attach some sort of probability or counter while training the bot? Is that even a good idea?

The second part of the problem is keywords. Assuming I can already identify keywords in user input, how do I generate a sentence that uses a given keyword? I don't always want to start the sentence with the keyword, so how do I seed the Markov chain?
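
On the counter question, one common way to attach counts is to keep, for each word, a tally of how often each follower was seen, and then sample in proportion to those tallies. A minimal sketch, assuming a first-order (single-word) chain; the names `train` and `next_word` are made up for illustration:

```python
import random
from collections import Counter, defaultdict

def train(words):
    """Count how often each word is followed by each other word."""
    table = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        table[current][following] += 1
    return table

def next_word(table, word, rng=random):
    """Sample a follower of `word` with probability proportional to its count."""
    followers = table.get(word)
    if not followers:
        return None  # dead end: the word was never seen with a follower
    choices = list(followers)
    weights = [followers[w] for w in choices]
    return rng.choices(choices, weights=weights, k=1)[0]

table = train("the cat sat on the mat and the cat slept".split())
print(table["the"])  # tallies: 'cat' seen twice after 'the', 'mat' once
```

Storing every follower in a plain list, duplicates included, and picking with `random.choice` gives the same probabilities; explicit counters mainly save memory on a large corpus.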

Comments (3)

柒夜笙歌凉 2024-10-28 23:39:22

I made a Markov chain chatbot for IRC in Python a few years back and can shed some light on how I did it. The generated text does not necessarily make any sense, but it can be really fun to read. Let's break it down into steps. Assume you have a fixed input, a text file (you can use chat logs or lyrics, or just use your imagination).

Loop through the text and build a dictionary, meaning a key-value container. Use every pair of consecutive words as a key and the word that follows the pair as a value.
For example: if you have the text "a b c a b k", you start with "a b" as the key and "c" as the value, then "b c" with "a" as the value, and so on. The value should be a list (or any collection holding 0..many items), because a given pair of words can have more than one follower. In the example above, "a b" occurs twice, followed first by "c" and then, at the end, by "k". So in the end you will have a dictionary/hash looking like this: {'a b': ['c','k'], 'b c': ['a'], 'c a': ['b']}
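
The table-building step could be sketched in Python roughly as follows. This is only an illustration of the description above, not the code from the IRC bot; `build_table` is a hypothetical name, and keys are joined with a single space to match the example dictionary:

```python
from collections import defaultdict

def build_table(text):
    """Map each pair of consecutive words to the list of words that follow it."""
    words = text.split()
    table = defaultdict(list)
    # Slide a three-word window over the text: (first, second) -> following.
    for first, second, following in zip(words, words[1:], words[2:]):
        table[first + " " + second].append(following)
    return dict(table)

print(build_table("a b c a b k"))
# {'a b': ['c', 'k'], 'b c': ['a'], 'c a': ['b']}
```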

Now you have the structure you need for building your funky text. You can start from a random key or from a fixed place. Given the structure above, we can start by saving "a b" and then randomly taking a following word from its values, "c" or "k". So the first save in the loop is "a b k" (if "k" was the random value chosen). Then you continue by moving one step to the right, which in our case gives "b k", and save a random value for that pair if it has one. Here it has none, so you break out of the loop (or you can decide to do something else, such as starting over). When the loop is done, you print your saved text string.
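
A rough sketch of that generation loop, assuming the table uses space-joined word pairs as keys. The `max_words` cap is an added assumption, so that a table containing cycles cannot loop forever; `generate` is a hypothetical name:

```python
import random

def generate(table, start, max_words=50, rng=random):
    """Walk the table from the key `start` until a dead end or `max_words`."""
    output = start.split()
    while len(output) < max_words:
        # The next key is always the last two words saved so far.
        key = output[-2] + " " + output[-1]
        followers = table.get(key)
        if not followers:
            break  # no known follower for this pair: stop
        output.append(rng.choice(followers))
    return " ".join(output)

# Table for the text "a b c a b k", written out by hand.
table = {"a b": ["c", "k"], "b c": ["a"], "c a": ["b"]}
print(generate(table, "a b", max_words=8))
```

Each run can print a different string, because the follower of "a b" is chosen at random every time that pair comes up.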

The bigger the input, the more values you will have for your keys (pairs of words), and the "smarter" the bot gets, so you can "train" your bot by adding more text (perhaps chat input?). If you have a book as input, you can construct some nice random sentences. Note that the value does not have to be a single word following the pair: you can take the next 2 or 10 words. The difference is that your text will appear more accurate if you use "longer" building blocks. Start with a pair as the key and the following word as the value.

So you basically have two steps. First build the structure. Then randomly choose a key to start with, take a random value of that key, and continue until you reach a key with no value or hit some other stop condition. If you want, you can "seed" the chain with a pair of words from the chat input that also appears in your key-value structure. It's up to your imagination how to start your chain.
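
For the keyword part of the question, one possible approach (an idea added here, not something the IRC bot is said to have done) is to keep a second table that maps each pair to the word that came before it. The chain can then be seeded with a pair containing the keyword and grown in both directions, so the keyword need not come first. All names and the `max_extra` limit are assumptions for illustration:

```python
import random

def build_tables(words):
    """Return (forward, backward) tables keyed by word pairs (as tuples)."""
    forward, backward = {}, {}
    for a, b, c in zip(words, words[1:], words[2:]):
        forward.setdefault((a, b), []).append(c)   # pair -> next word
        backward.setdefault((b, c), []).append(a)  # pair -> previous word
    return forward, backward

def sentence_with(keyword, forward, backward, max_extra=10, rng=random):
    """Build a sentence around a pair containing `keyword`, or return None."""
    pairs = set(forward) | set(backward)
    seeds = sorted(p for p in pairs if keyword in p)
    if not seeds:
        return None  # the keyword never appeared in the training text
    out = list(rng.choice(seeds))
    for _ in range(max_extra):  # grow to the right
        followers = forward.get((out[-2], out[-1]))
        if not followers:
            break
        out.append(rng.choice(followers))
    for _ in range(max_extra):  # grow to the left
        previous = backward.get((out[0], out[1]))
        if not previous:
            break
        out.insert(0, rng.choice(previous))
    return " ".join(out)

words = "hi my name is Al and i live in a box".split()
forward, backward = build_tables(words)
print(sentence_with("Al", forward, backward))
```

With this tiny corpus no pair repeats, so the call simply reproduces the input sentence with the keyword in the middle; a larger corpus would give varied sentences on both sides of the seed.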

Example with real words:

"hi my name is Al and i live in a box that i like very much and i can live in there as long as i want"

"hi my" -> ["name"]

"my name" -> ["is"]

"name is" -> ["Al"]

"is Al" -> ["and"]

........

"and i" -> ["live", "can"]

........

"i can" -> ["live"]

......

Now construct a loop:

Pick a random key, say "hi my", and randomly choose a value; there is only one here, so it's "name"
(SAVING "hi my name").
Now move one step to the right, taking "my name" as the next key, and pick a random value... "is"
(SAVING "hi my name is").
Now move and take "name is" ... "Al"
(SAVING "hi my name is Al").
Now take "is Al" ... "and"
(SAVING "hi my name is Al and").

...

When you come to "and i" you randomly choose a value, let's say "can", which makes the next key "i can", and so on. When you reach your stop condition or a key with no values, print the constructed string, in our case:

"hi my name is Al and i can live in there as long as i want"

If a key has more values, you can jump to more keys. The more values you have, the more combinations there are and the more random and fun the text will be.

听,心雨的声音 2024-10-28 23:39:22

The bot chooses a random word from your input and generates a response by choosing another random word that has been seen as a successor to the word it holds. It then repeats the process, finding a successor to that word in turn and carrying on iteratively until it thinks it has said enough. It reaches that conclusion by stopping at a word that came right before a punctuation mark in the training text. It then returns to input mode to let you respond, and so on.

It isn't very realistic, but I hereby challenge anyone to do better in 71 lines of code! This is a great challenge for any budding Pythonists, and I just wish I could open the challenge to a wider audience than the small number of visitors I get to this blog. Coding a bot that is always guaranteed to be grammatical would surely take closer to several hundred lines. I simplified hugely by just trying to think of the simplest rule that gives the computer a mere stab at having something to say.

Its responses are rather impressionistic, to say the least! Also you have to put what you say in single quotes.

I used War and Peace for my "corpus", which took a couple of hours for the training run; use a shorter file if you are impatient…

Here is the trainer:

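# lukebot-trainer.py
# Builds a table mapping each word to the words seen right after it.
import pickle

# Read the corpus into one flat list of whitespace-separated words.
text = []
with open('war&peace.txt') as b:
    for line in b:
        for word in line.split():
            text.append(word)

follow = {}
# For every distinct word, rescan the whole text for its occurrences.
# (This vocabulary-times-text scan is why a long corpus trains slowly.)
for check in set(text):
    working = []
    for w in range(len(text) - 1):
        # Words ending in punctuation get no successors, so replies stop there.
        if check == text[w] and text[w][-1] not in '(),.?!':
            working.append(text[w + 1])
    follow[check] = working

# Save the table with pickle protocol 2 for lukebot.py to load.
with open('lexicon-luke', 'wb') as a:
    pickle.dump(follow, a, 2)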
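
The trainer rescans the whole text once per distinct word. The same table can be built in a single pass, which should cut training time considerably on a long corpus. A sketch, assuming the word list is already in memory; `build_follow` is a name introduced here, not part of the original script:

```python
def build_follow(text):
    """Build the word -> successors table in one pass over `text`."""
    # Every distinct word gets an entry, even if it never gains a successor.
    follow = {word: [] for word in text}
    for current, following in zip(text, text[1:]):
        # Same rule as the trainer: words ending in punctuation get no successors.
        if current[-1] not in '(),.?!':
            follow[current].append(following)
    return follow

sample = "the cat sat. the dog sat down.".split()
print(build_follow(sample))
```

The result can be pickled exactly as in the trainer, so the bot script does not need to change.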

Here is the bot:

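# lukebot.py
import pickle
import random

# Load the successor table written by lukebot-trainer.py.
with open('lexicon-luke', 'rb') as a:
    successorlist = pickle.load(a)

def nextword(a):
    # Pick a random successor; fall back to 'the' for unknown words
    # and for words whose successor list is empty.
    if successorlist.get(a):
        return random.choice(successorlist[a])
    return 'the'

while True:
    speech = input('>')  # read one line of chat input (Python 3)
    if speech == 'quit':
        break
    if not speech.split():
        continue  # nothing to seed the reply with
    s = random.choice(speech.split())
    response = ''
    while True:
        neword = nextword(s)
        response += ' ' + neword
        s = neword
        # Stop once the reply reaches a word ending in punctuation.
        if neword[-1] in ',?!.':
            break
    print(response)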

You tend to get an uncanny feeling when it says something that seems partially to make sense.

欲拥i 2024-10-28 23:39:22

You could do it like this:
Make an order-1 Markov chain generator that works on words rather than letters.
Every time someone posts something, add what they posted to the bot's database.
The bot would also record when each person joined the chat and how long they took to make their first post (in multiples of 10 seconds), and then how long that same person waited before posting again (in multiples of 10 seconds).
This second part is used to decide when the bot itself posts. After it joins the chat, it waits an amount of time drawn from a table of "how many 10-second intervals people waited after joining before their first post". It then keeps posting using the same kind of table, this time asking "how long did people take to write the post that followed a post they had spent X seconds thinking about and writing?"
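
A very rough sketch of that idea, under simplifying assumptions: the chain is order 1 over words, every message is folded into the table as it arrives, and delays are stored as a flat list of 10-second buckets. It ignores the conditioning on the previous delay that the description hints at. The class and method names are hypothetical:

```python
import random
from collections import defaultdict

class OnlineChain:
    """Order-1 word chain that grows with every message it is shown."""

    def __init__(self):
        self.followers = defaultdict(list)  # word -> words seen right after it
        self.delays = []                    # observed gaps, in 10-second steps

    def learn(self, message):
        """Add one posted message to the chain."""
        words = message.split()
        for current, following in zip(words, words[1:]):
            self.followers[current].append(following)

    def record_delay(self, seconds):
        """Store a gap between posts, rounded down to a multiple of 10 seconds."""
        self.delays.append(int(seconds // 10) * 10)

    def pick_delay(self, rng=random):
        """Reuse a previously observed gap; 0 if nothing was recorded yet."""
        return rng.choice(self.delays) if self.delays else 0

    def reply(self, start, max_words=20, rng=random):
        """Generate words from `start` until a dead end or `max_words`."""
        out = [start]
        while len(out) < max_words and self.followers.get(out[-1]):
            out.append(rng.choice(self.followers[out[-1]]))
        return " ".join(out)

bot = OnlineChain()
bot.learn("hello there friend")
bot.record_delay(27)
print(bot.reply("hello"), bot.pick_delay())
```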
