How to get better results with NLTK pos_tag

Posted on 2024-12-15 15:54:51


I am just starting to learn NLTK with Python. I tried pos_tag on various sentences, but the results are not accurate. How can I improve them?

broke = NN
flimsy = NN
crap = NN

Also, a lot of extra words are being categorized as NN. How can I filter these out to get better results?


风柔一江水 2024-12-22 15:54:51


Please give the context in which you obtained these results. For example, I get different results with pos_tag when the words appear in the phrase "They broke flimsy crap":

import nltk
text=nltk.word_tokenize("They broke flimsy crap")
nltk.pos_tag(text)

[('They', 'PRP'), ('broke', 'VBP'), ('flimsy', 'JJ'), ('crap', 'NN')]

Anyway, if you find that a lot of words are wrongly categorized as 'NN', you can apply another technique specifically to the words tagged 'NN'.
For instance, you can train a trigram tagger on an appropriate tagged corpus and use its tags for those words
(the authors do the same thing with bigrams at http://nltk.googlecode.com/svn/trunk/doc/book/ch05.html).
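The bigram approach in chapter 5 of the NLTK book chains taggers with backoff, so each tagger falls back to a simpler one when it has not seen a context. Below is a minimal sketch extending that chain to trigrams, assuming NLTK is installed. The three training sentences are made up for illustration only; in practice you would train on a real corpus in the same (Penn Treebank) tagset that pos_tag uses, for example nltk.corpus.treebank.tagged_sents(), which has to be downloaded first.

```python
from nltk.tag import DefaultTagger, UnigramTagger, BigramTagger, TrigramTagger

# Tiny made-up training set in the Penn Treebank tagset (illustration only;
# replace with a real tagged corpus such as nltk.corpus.treebank.tagged_sents()).
train_sents = [
    [("They", "PRP"), ("broke", "VBD"), ("the", "DT"), ("chair", "NN")],
    [("The", "DT"), ("chair", "NN"), ("is", "VBZ"), ("flimsy", "JJ")],
    [("They", "PRP"), ("sell", "VBP"), ("flimsy", "JJ"), ("crap", "NN")],
]

# Each tagger backs off to the next simpler one for contexts it has not seen;
# words unknown to all n-gram taggers end up with the default tag 'NN'.
t0 = DefaultTagger("NN")
t1 = UnigramTagger(train_sents, backoff=t0)
t2 = BigramTagger(train_sents, backoff=t1)
t3 = TrigramTagger(train_sents, backoff=t2)

print(t3.tag(["They", "broke", "flimsy", "crap"]))
# [('They', 'PRP'), ('broke', 'VBD'), ('flimsy', 'JJ'), ('crap', 'NN')]
```

Because unseen words fall through to DefaultTagger("NN"), how well the training corpus covers your vocabulary directly affects how many words come out as 'NN'.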

Something like this:

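import nltk

# your_text: list of tokens; tagged_corpora: list of tagged sentences (e.g. nltk.corpus.treebank.tagged_sents())
pos_tag_results = nltk.pos_tag(your_text)            # tagged with pos_tag
trigram_tagger = nltk.TrigramTagger(tagged_corpora)  # train a trigram tagger on your tagged corpus
trigram_tag_results = trigram_tagger.tag(your_text)  # tagged with the trigram tagger
for i, (word, tag) in enumerate(pos_tag_results):
    new_tag = trigram_tag_results[i][1]
    # for 'NN' take the trigram tag instead; tuples are immutable, so replace the whole pair,
    # and keep 'NN' when the trigram tagger has no answer (None)
    if tag == 'NN' and new_tag is not None:
        pos_tag_results[i] = (word, new_tag)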
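The same override step can also be written as a small reusable function. This is a minimal pure-Python sketch; `override_tags` and its `suspect` parameter are hypothetical names for illustration, not part of NLTK. It returns a new list instead of assigning into tuples (which are immutable), only overrides tags listed in `suspect`, and keeps the original tag when the second tagger returns None, as an NLTK n-gram tagger without a backoff does for contexts it has not seen.

```python
from typing import FrozenSet, List, Optional, Tuple

Tagged = List[Tuple[str, Optional[str]]]


def override_tags(primary: Tagged, secondary: Tagged,
                  suspect: FrozenSet[str] = frozenset({"NN"})) -> Tagged:
    """Hypothetical helper: copy `primary`, taking the tag from `secondary`
    wherever the primary tag is in `suspect` and `secondary` has a tag."""
    if len(primary) != len(secondary):
        raise ValueError("both taggers must tag the same token sequence")
    merged: Tagged = []
    for (word, tag), (other_word, alt) in zip(primary, secondary):
        if word != other_word:
            raise ValueError(f"token mismatch: {word!r} vs {other_word!r}")
        if tag in suspect and alt is not None:
            merged.append((word, alt))  # trust the second tagger here
        else:
            merged.append((word, tag))  # keep the original tag
    return merged


# Made-up tag lists for illustration only.
primary = [("They", "PRP"), ("broke", "NN"), ("flimsy", "NN"), ("crap", "NN")]
secondary = [("They", "PRP"), ("broke", "VBD"), ("flimsy", "JJ"), ("crap", None)]
print(override_tags(primary, secondary))
# [('They', 'PRP'), ('broke', 'VBD'), ('flimsy', 'JJ'), ('crap', 'NN')]
```

Checking that the tokens line up guards against the two taggers having been run on differently tokenized text, which would otherwise silently mix up tags.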

Let me know if it improves your results.
