A question about Hidden Markov Models and Conditional Random Fields

Posted on 2024-10-20 00:57:12


I have been looking at Hidden Markov Models and Conditional Random Fields for the task of Named Entity Recognition, and I seem to be stuck on a fundamental concept, namely: is the goal of the learning process to compute an argmax from the training data and then apply that argmax sequence to all instances of the test data?

Consider this Hidden Markov Model example: I have two states {1,0}, where 1 is an entity and 0 is any other word. For simplicity's sake, I'm not concerning myself with entity categorization just yet, only entity detection.

My training data is as follows:

Obama lives in Washington
1 0 0 1

The iPad is great
0 1 0 0

Steve Jobs is sick
1 1 0 0

Now, following the argmax rule, with:

P(State 1 to State 1) = 1/9

P(State 1 to State 0) = 1 - 1/9

P(State 0 to State 0) = 3/9

P(State 0 to State 1) = 1 - 3/9

And after working out the V and U matrices, I find that:

The best label sequence extracted from the training data = 1 1 0 0

Now consider the test sentence:

The iPhone is great

Do I just apply 1 1 0 0 to the test sentence? That would actually work here, but if I have another test sentence like "A spokesperson for Sony was fired", you can see that the sequence 1 1 0 0 would be completely useless for that sentence.

To summarize: is the purpose of training to extract ONE best label sequence and apply that to all test sentences? It would seem unlikely! What am I missing??

Comments (1)

凉城已无爱 2024-10-27 00:57:12


I strongly recommend that you read this lecture on HMMs. Here's an excerpt from the HMM definition:

A parameter q(s|u,v) for any trigram (u,v,s) such that s ∈ K ∪ {STOP},
and u,v ∈ K ∪ {*}. The value for q(s|u,v) can be interpreted as the
probability of seeing the tag s immediately after the bigram of tags
(u,v).

A parameter e(x|s) for any x ∈ V, s ∈ K. The value for e(x|s) can be
interpreted as the probability of seeing observation x paired with
state s.

You seem to be missing the e parameters, and you are not calculating the q parameters correctly:

q(1|0,0) = count<0,0,1> / count<0,0>
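For instance, on your three training sentences (padding each tag sequence with *, * at the front and STOP at the end), the tag bigram <0,0> occurs three times and the trigram <0,0,1> occurs once, so q(1|0,0) = 1/3.

As a rough illustration (my own Python sketch under the lecture's trigram formulation, not code from the lecture itself), here is how those counts, and the resulting q and e estimates, could be collected from your training data:

    from collections import defaultdict

    # Toy training data from the question: 1 = entity, 0 = any other word.
    training_data = [
        (["Obama", "lives", "in", "Washington"], [1, 0, 0, 1]),
        (["The", "iPad", "is", "great"],         [0, 1, 0, 0]),
        (["Steve", "Jobs", "is", "sick"],        [1, 1, 0, 0]),
    ]

    trigram_counts  = defaultdict(int)   # count<u,v,s>
    bigram_counts   = defaultdict(int)   # count<u,v>
    emission_counts = defaultdict(int)   # times word x was tagged s
    tag_counts      = defaultdict(int)   # times tag s occurred

    for words, tags in training_data:
        padded = ["*", "*"] + tags + ["STOP"]   # pad as in the lecture
        for i in range(2, len(padded)):
            trigram_counts[(padded[i - 2], padded[i - 1], padded[i])] += 1
            bigram_counts[(padded[i - 2], padded[i - 1])] += 1
        for word, tag in zip(words, tags):
            emission_counts[(tag, word)] += 1
            tag_counts[tag] += 1

    def q(s, u, v):
        # q(s|u,v) = count<u,v,s> / count<u,v>   (no smoothing here)
        return trigram_counts[(u, v, s)] / bigram_counts[(u, v)]

    def e(x, s):
        # e(x|s) = count(s -> x) / count(s)      (no smoothing here)
        return emission_counts[(s, x)] / tag_counts[s]

    print(q(1, 0, 0))     # 1/3 on this data
    print(e("iPad", 1))   # 1/5 on this data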

The best sequence of tags is the most probable one, considering the products of the above parameters (sorry for not posting the formula).

For your example, "A spokesperson for Sony was fired", all the possible tag sequences are:

* * 0 0 0 0 0 0 STOP
* * 0 0 0 0 0 1 STOP
...
* * 1 1 1 1 1 1 STOP

and you should calculate e(A|0), e(spokesperson|0), q(0|*,*), q(0|*,0), etc., then multiply them accordingly and take the sequence with the highest probability.

Since this is a time-consuming task whose cost grows exponentially with sentence length, the Viterbi algorithm is used instead (it is also described in the lecture).
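Below is a rough sketch of that Viterbi recursion for this trigram HMM, reusing the q and e helpers from the counting sketch above (again my own illustrative Python, not code from the lecture; a real tagger would smooth the estimates for unseen words and unseen tag histories and would work in log space):

    def viterbi(words, tag_set=(0, 1)):
        # pi[(k, u, v)] = probability of the best tag sequence for words
        # x_1..x_k ending in the tag bigram (u, v); bp holds backpointers.
        n = len(words)

        def S(k):
            # Possible tags at position k; "*" padding before the sentence.
            return ("*",) if k <= 0 else tag_set

        pi = {(0, "*", "*"): 1.0}
        bp = {}
        for k in range(1, n + 1):
            for u in S(k - 1):
                for v in S(k):
                    best_w, best_score = None, -1.0
                    for w in S(k - 2):
                        score = pi[(k - 1, w, u)] * q(v, w, u) * e(words[k - 1], v)
                        if score > best_score:
                            best_w, best_score = w, score
                    pi[(k, u, v)] = best_score
                    bp[(k, u, v)] = best_w

        # Best final tag bigram, including the transition into STOP.
        best_u, best_v, best_final = None, None, -1.0
        for u in S(n - 1):
            for v in S(n):
                score = pi[(n, u, v)] * q("STOP", u, v)
                if score > best_final:
                    best_u, best_v, best_final = u, v, score

        # Follow the backpointers to recover the whole tag sequence.
        tags = [None] * (n + 1)                  # 1-indexed; slot 0 unused
        tags[n - 1], tags[n] = best_u, best_v
        for k in range(n - 2, 0, -1):
            tags[k] = bp[(k + 2, tags[k + 1], tags[k + 2])]
        return tags[1:]

    print(viterbi(["The", "iPad", "is", "great"]))   # -> [0, 1, 0, 0] on the toy counts

Note that with the unsmoothed counts above, a sentence containing an unseen word such as "iPhone" gets probability 0 for every tag sequence, which is exactly why real taggers smooth the emission estimates (or use features, as CRFs do).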
