Conditional random fields: how do they work?

Posted 2024-09-18 06:12:44

I've read the papers linked to in this question. I half get it. Can someone help me figure out how to implement it?

I assume that features are generated by some heuristic. Take a POS tagger as an example: maybe looking at the training data shows that 'bird' is tagged NOUN in all cases, so the feature f1(z_(n-1), z_n, X, n) is generated as

(if x_n = 'bird' and z_n = NOUN then 1 else 0)

where X is the input vector and Z is the output vector. During weight training, we find that this f1 is never violated, so the corresponding weight λ_1 would end up positive and relatively large after training. Both guessing the features and training the weights seem challenging to implement, but otherwise straightforward.
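
To make that concrete, here is a minimal Python sketch of what I have in mind; f1, score, and the 'START' padding symbol are my own names, not from any library:

    def f1(z_prev, z_n, X, n):
        """Fires when the current word is 'bird' and the current tag is NOUN."""
        return 1 if X[n] == 'bird' and z_n == 'NOUN' else 0

    def score(Z, X, features, weights):
        """Unnormalized score: the sum over positions of all weighted features."""
        total = 0.0
        for n in range(len(X)):
            z_prev = Z[n - 1] if n > 0 else 'START'
            for f, w in zip(features, weights):
                total += w * f(z_prev, Z[n], X, n)
        return total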

I'm lost on how one applies the model to untagged data. Do you initialize the output vector with some arbitrary labels, and then change labels wherever doing so increases the sum over all the λ_i * f_i?

Any help on this would be greatly appreciated.

Comments (2)

貪欢 2024-09-25 06:12:44

I am not completely sure if I understand you correctly, but yes: on the output side each vector is augmented with a start and an end symbol.

You are also right about feature functions being generated by some heuristic. Usually the heuristic is to take all possible combinations. In your example there would be a feature function for each (word, tag) pair, resulting in a large number of features. A common way to formulate such features is through the use of a feature template.
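
As an illustration, a minimal Python sketch of such a template, assuming the training data is a list of (words, tags) sentence pairs; all names are mine, not from a particular library:

    def make_emission_feature(word, tag):
        """Template: one indicator feature per observed (word, tag) pair."""
        def f(z_prev, z_n, X, n):
            return 1 if X[n] == word and z_n == tag else 0
        return f

    def features_from_template(training_data):
        """Instantiate the template for every (word, tag) pair seen in training."""
        pairs = {(w, t) for words, tags in training_data
                        for w, t in zip(words, tags)}
        return [make_emission_feature(w, t) for w, t in sorted(pairs)]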

When evaluating the model you don't care about normalization, so you are looking for the sequence that gives you the largest numerator term. Usually the Viterbi algorithm is used to do so, except for very large label sets (or, in your example, a large number of possible tags), in which case approximations are used.
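
For concreteness, in the notation of your question (writing the partition function as N(X) so it does not clash with your output vector Z), the linear-chain CRF defines

    p(Z|X) = exp( sum_n sum_i λ_i * f_i(z_(n-1), z_n, X, n) ) / N(X)

and N(X) is the same for every candidate labelling of a fixed input, so maximizing p(Z|X) amounts to maximizing the sum in the exponent.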

Viterbi on CRFs works much like it does for HMMs. You start at the beginning of your sequence and compute the maximum score ending with the word at hand, i.e. the maximum for each tag over all predecessors, or, since there is only one predecessor here, over the START symbol. In the next step you iterate over all labels that are possible for the second element of your prediction, i.e. z_2. The maximum of the unnormalized probability can be computed from the values at the predecessor nodes, i.e. the values you computed in the first step, together with your model: you combine the potential of the predecessor, the transition to the node in question, and the node itself, and take the maximum over all predecessors. And yes, since the feature functions do not limit the dependence on the source side, you may take any information from it.

When you arrive at the end, you walk back to determine how the maximum was reached.
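
Here is a minimal Python sketch of that procedure for a linear-chain CRF, reusing the features/weights shape from your post; tags is the full label set, and all names are illustrative:

    def viterbi(X, tags, features, weights):
        """Find the label sequence with the maximum unnormalized score."""
        def local_score(z_prev, z_n, n):
            return sum(w * f(z_prev, z_n, X, n)
                       for f, w in zip(features, weights))

        # delta[n][t]: best score of any tag sequence for x_0..x_n ending in t.
        # psi[n][t]: the predecessor tag achieving that best score.
        delta = [{t: local_score('START', t, 0) for t in tags}]
        psi = [{}]
        for n in range(1, len(X)):
            delta.append({})
            psi.append({})
            for t in tags:
                best = max(tags, key=lambda p: delta[n - 1][p] + local_score(p, t, n))
                delta[n][t] = delta[n - 1][best] + local_score(best, t, n)
                psi[n][t] = best

        # Walk back from the best final tag to recover the full sequence.
        last = max(tags, key=lambda t: delta[-1][t])
        Z = [last]
        for n in range(len(X) - 1, 0, -1):
            Z.append(psi[n][Z[-1]])
        return list(reversed(Z))

For example, viterbi(['the', 'bird'], ['DET', 'NOUN', 'VERB'], features, weights) would return something like ['DET', 'NOUN'] once the weights are trained.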

For further reading, I recommend the report by Rahul Gupta.

零度° 2024-09-25 06:12:44

For linear CRFs representing simple sequences of words, you can use the Viterbi algorithm (as muckl has already mentioned).

For other topologies, you'll need other algorithms.

Exact inference is intractable for many topologies, and an approximate inference algorithm, such as tree-based reparameterization, can be used.
