A question about Hidden Markov Models and Conditional Random Fields
I have been looking at Hidden Markov Models and Conditional Random Fields for the task of Named Entity Recognition, and I seem to be stuck on a fundamental concept: is the goal of the learning process to compute an argmax from the training data and then apply that argmax sequence to every instance of the test data?
Consider this Hidden Markov Model example: I have two states {1, 0}, where 1 marks an entity and 0 marks any other word. For simplicity's sake, I'm not concerning myself with entity categorization just yet, only entity detection.
My training data is as follows:
Obama lives in Washington
1 0 0 1
The iPad is great
0 1 0 0
Steve Jobs is sick
1 1 0 0
Now, following the argmax rule, I compute:
P(State 1 to State 1) = 1/9
P(State 1 to State 0) = 1 - 1/9
P(State 0 to State 0) = 3/9
P(State 0 to State 1) = 1 - 3/9
And after working out the V and U matrices, I find that:
The best label sequence extracted from the training data = 1 1 0 0
Now consider the test sentence:
The iPhone is great
Do I just apply 1 1 0 0 to the test sentence? That would actually work here. But if I have another test sentence like "A spokesperson for Sony was fired", you can see that the sequence 1 1 0 0 would be completely useless for that sentence.
To summarize: is the purpose of training to extract ONE best label sequence and apply it to all test sentences? That seems unlikely! What am I missing?
1 Answer
I strongly recommend that you read this lecture on HMMs. In short, an HMM is defined by two sets of parameters: transition probabilities q and emission probabilities e.
You seem to be missing the emission parameters e entirely, and you are not calculating the transition parameters q correctly: each q should be a count of a tag sequence divided by the count of the context it is conditioned on, not by the total number of transitions in the corpus.
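To make the counting concrete, here is a minimal sketch (my own code, not the lecture's) of the maximum-likelihood estimates of q and e from your three training sentences, using the lecture's trigram parameterization:

```python
from collections import defaultdict

training = [
    (["Obama", "lives", "in", "Washington"], ["1", "0", "0", "1"]),
    (["The", "iPad", "is", "great"], ["0", "1", "0", "0"]),
    (["Steve", "Jobs", "is", "sick"], ["1", "1", "0", "0"]),
]

emit_counts = defaultdict(int)  # (tag, word) -> count
tag_counts = defaultdict(int)   # tag -> count
tri_counts = defaultdict(int)   # (u, v, s) -> count of tag trigram u, v, s
bi_counts = defaultdict(int)    # (u, v) -> count of tag bigram u, v

for words, tags in training:
    padded = ["*", "*"] + tags + ["STOP"]  # lecture-style padding
    for word, tag in zip(words, tags):
        emit_counts[(tag, word)] += 1
        tag_counts[tag] += 1
    for i in range(2, len(padded)):
        tri_counts[(padded[i - 2], padded[i - 1], padded[i])] += 1
        bi_counts[(padded[i - 2], padded[i - 1])] += 1

def e(word, tag):
    """Emission probability e(word | tag)."""
    return emit_counts[(tag, word)] / tag_counts[tag]

def q(s, u, v):
    """Transition probability q(s | u, v): tag s after the context (u, v)."""
    return tri_counts[(u, v, s)] / bi_counts[(u, v)] if bi_counts[(u, v)] else 0.0

print(q("1", "*", "*"))  # P(sentence starts with tag 1) = 2/3
print(e("iPad", "1"))    # P(word is "iPad" | tag 1) = 1/5
```

Note that each transition estimate is divided by the count of its two-tag context, which is why your 1/9 and 3/9 figures (divided by all nine transitions) come out wrong.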
The best sequence of tags is the most probable one, considering the product of the parameters above.
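In symbols (restating the trigram HMM formula from the lecture, with y_0 = y_{-1} = * and y_{n+1} = STOP):

p(x_1 ... x_n, y_1 ... y_{n+1}) = Π_{i=1..n+1} q(y_i | y_{i-2}, y_{i-1}) × Π_{i=1..n} e(x_i | y_i)

Decoding means finding the arg max of this product over tag sequences, separately for every input sentence; training only gives you the q and e parameters, never a single fixed tag sequence.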
For your example "A spokesperson for Sony was fired", the candidate tag sequences are all 2^6 = 64 assignments of {0, 1} to the six words: 0 0 0 0 0 0, 0 0 0 0 0 1, ..., 1 1 1 1 1 1. For each of them you calculate e(A|0), e(spokesperson|0), q(0|*,*), q(0|*,0), etc., multiply them accordingly, and take the sequence with the highest probability.
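Continuing the sketch above (and glossing over the fact that every word in your test sentence is unseen in training, which a real system would handle with smoothing; here I just floor the emissions so the demo runs), the brute-force search looks like:

```python
from itertools import product

def e_floor(word, tag, floor=1e-3):
    # all six test words are unseen, so give emissions a small floor;
    # with a constant floor the argmax is decided by the q terms alone
    return max(e(word, tag), floor)

sentence = ["A", "spokesperson", "for", "Sony", "was", "fired"]

best_tags, best_p = None, 0.0
for tags in product("01", repeat=len(sentence)):
    padded = ["*", "*"] + list(tags) + ["STOP"]
    p = 1.0
    for i in range(2, len(padded)):        # q(y_i | y_{i-2}, y_{i-1}) factors
        p *= q(padded[i], padded[i - 2], padded[i - 1])
    for word, tag in zip(sentence, tags):  # e(x_i | y_i) factors
        p *= e_floor(word, tag)
    if p > best_p:
        best_tags, best_p = tags, p

print(best_tags, best_p)  # the most probable of the 64 sequences
```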
Since this is time-consuming and the number of candidate sequences grows exponentially with sentence length, the Viterbi algorithm is used instead (it is also described in the lecture).
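Here is a sketch of that trigram Viterbi recursion, again reusing q and e_floor from the snippets above. pi[(k, u, v)] is the best probability of any length-k prefix whose last two tags are u, v, so the work per position is constant instead of doubling:

```python
def viterbi(words, tagset=("0", "1")):
    n = len(words)

    def tags_at(k):
        # positions 0 and -1 carry the padding tag "*"
        return ("*",) if k <= 0 else tagset

    pi = {(0, "*", "*"): 1.0}  # pi[(k, u, v)]: best prob of a prefix ending in u, v
    bp = {}                    # backpointer: the tag w that achieved pi[(k, u, v)]
    for k in range(1, n + 1):
        for u in tags_at(k - 1):
            for v in tags_at(k):
                best_p, best_w = max(
                    (pi.get((k - 1, w, u), 0.0) * q(v, w, u) * e_floor(words[k - 1], v), w)
                    for w in tags_at(k - 2)
                )
                pi[(k, u, v)], bp[(k, u, v)] = best_p, best_w
    # fold in the STOP transition, then trace the backpointers
    score, u, v = max(
        (pi.get((n, uu, vv), 0.0) * q("STOP", uu, vv), uu, vv)
        for uu in tags_at(n - 1) for vv in tags_at(n)
    )
    best = {n - 1: u, n: v}
    for k in range(n - 2, 0, -1):
        best[k] = bp[(k + 2, best[k + 1], best[k + 2])]
    return [best[k] for k in range(1, n + 1)], score

print(viterbi(["A", "spokesperson", "for", "Sony", "was", "fired"]))
```

This returns the same argmax as the brute force, which is the whole point: training learns parameters, and decoding re-runs this search on each new sentence.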