Where can I download a POS tag dictionary and transformation rules?
I am learning part-of-speech (POS) tagging with transformation rules. The first step is to assign each word in a text its possible POS tags using a dictionary such as:
communicative JJ
communicator NN
communicators NNS
communion NN
communique NN
communiques NNS
communism NN
The second step is to apply transformation rules to change those tags. I only have a very small dictionary containing the word/tag pairs above. Where can I find a large one? And since transformation-based tagging is said to need a lot of rules, where can I find those rules?
Thank you in advance.
Comments (1)
You can obtain each word's possible tags from a tagged corpus, such as those available in NLTK. A corpus also gives you frequencies from which to estimate probabilities, if you want to do machine-learned tagging (Brill-style).
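As a rough illustration of deriving the dictionary from a corpus, here is a minimal sketch that counts word/tag pairs and keeps each word's tags ordered by frequency. The `build_lexicon` helper and the toy corpus are made up for this example; with NLTK installed and the corpus data downloaded, something like `nltk.corpus.treebank.tagged_words()` should be able to supply the `(word, tag)` pairs instead.

```python
from collections import Counter, defaultdict


def build_lexicon(tagged_words):
    """Map each lowercased word to its tags, most frequent tag first."""
    counts = defaultdict(Counter)
    for word, tag in tagged_words:
        counts[word.lower()][tag] += 1
    return {
        word: [tag for tag, _ in tag_counts.most_common()]
        for word, tag_counts in counts.items()
    }


# Toy stand-in for a real tagged corpus (hypothetical data).
toy_corpus = [
    ("the", "DT"), ("communicator", "NN"), ("can", "MD"), ("talk", "VB"),
    ("the", "DT"), ("can", "NN"), ("of", "IN"), ("communiques", "NNS"),
    ("they", "PRP"), ("can", "MD"), ("talk", "VB"),
]

lexicon = build_lexicon(toy_corpus)
print(lexicon["can"])  # tags for "can", most frequent first
```

The first tag in each list is a reasonable initial guess for step one; the raw counts are what you would use if you later want probabilities.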
In Brill's approach the rule templates are handcrafted; the machine learner then works out which concrete rules to apply and in what order. See, e.g., Brill's PhD thesis for English rules.
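To make the second step concrete, here is a small sketch of applying one transformation rule of the form "change tag A to B when the previous tag is Z". The `Rule` type, the helper names, and the tiny lexicon are assumptions for illustration only, not Brill's actual implementation; a real tagger would learn an ordered list of many such rules from a corpus.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    from_tag: str  # tag to be replaced
    to_tag: str    # replacement tag
    prev_tag: str  # context: tag of the preceding word


def initial_tags(words, lexicon, default="NN"):
    """Step 1: give each word its most frequent tag (default for unknowns)."""
    return [lexicon.get(w.lower(), [default])[0] for w in words]


def apply_rules(tags, rules):
    """Step 2: apply each rule in order, scanning left to right."""
    tags = list(tags)
    for rule in rules:
        for i in range(1, len(tags)):
            if tags[i] == rule.from_tag and tags[i - 1] == rule.prev_tag:
                tags[i] = rule.to_tag  # change takes effect immediately
    return tags


# Hypothetical lexicon: most frequent tag listed first.
lexicon = {"expected": ["VBN"], "to": ["TO"], "race": ["NN", "VB"]}
rules = [Rule(from_tag="NN", to_tag="VB", prev_tag="TO")]

words = ["expected", "to", "race"]
start = initial_tags(words, lexicon)
final = apply_rules(start, rules)
print(list(zip(words, start)))
print(list(zip(words, final)))
```

Here "race" starts out as NN because that is its most frequent tag in the lexicon, and the rule retags it as VB because it follows a word tagged TO.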