How do I tag a text file with hunpos in nltk?

Posted on 2024-10-18 22:38:14 · 1002 characters · 13 views · 0 comments

Can someone help me with the syntax for hunpos tagging a corpus in nltk?

What do I import for the hunpos.HunPosTagger module?
How do I HunPosTag the corpus? See the code below.

import nltk 
from nltk.corpus import PlaintextCorpusReader  
from nltk.corpus.util import LazyCorpusLoader  

corpus_root = './'  
reader = PlaintextCorpusReader (corpus_root, '.*')  

ntuen = LazyCorpusLoader ('ntumultien', PlaintextCorpusReader, reader)  
ntuen.fileids()  
isinstance (ntuen, PlaintextCorpusReader)  


# So how do I hunpos tag `ntuen`? I can't get the following code to work.
# please help me to correct my python syntax errors, I'm new to python 
# but i really need this to work. sorry
##from nltk.tag import hunpos.HunPosTagger
ht = HunPosTagger('english.model')
for sentence in ntu.sent() ##looping through the no. of sentence
     ht.tag(ntusent()[i])

太傻旳人生 2024-10-25 22:38:14

from nltk.tag.hunpos import HunposTagger
from nltk.tokenize import word_tokenize

corpus = "so how do i hunpos tag my ntuen ? i can't get the following code to work."

# Note the class name: HunposTagger (not HunPosTagger), imported from nltk.tag.hunpos.
# The argument is the path to a trained hunpos model file.
ht = HunposTagger('en_wsj.model')

# tag() expects a list of tokens, so tokenize the raw string first.
# A parenthesized print works on both Python 2 and Python 3.
print(ht.tag(word_tokenize(corpus)))

I think the main problem is that you're not tokenizing the words, but there are other reasons the code may not work: the class is HunposTagger, not HunPosTagger, and it is imported from nltk.tag.hunpos. I made this simplified example from your question. If you have any more questions, please post a comment.

I got everything from here: http://code.google.com/p/hunpos/

python hunpos.py
[('so', 'RB'), ('how', 'WRB'), ('do', 'VBP'), ('i', 'FW'), ('hunpos', 'NN'), ('tag', 'NN'), ('my', 'PRP$'), ('ntuen', 'NN'), ('?', '.'), ('i', 'FW'), ('ca', 'MD'), ("n't", 'RB'), ('get', 'VB'), ('the', 'DT'), ('following', 'JJ'), ('code', 'NN'), ('to', 'TO'), ('work', 'VB'), ('.', '.')]
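
To apply the same idea to the whole corpus from the question rather than a single string, something like the following might work. It is only a sketch: `tag_sentences` and `tag_corpus` are hypothetical helper names, not part of NLTK, and the `.*\.txt` file pattern is an assumption so that the reader does not pick up the model file or scripts sitting in the same directory. `PlaintextCorpusReader.sents()` already yields each sentence as a list of tokens (it needs NLTK's sentence tokenizer data installed), so no extra `word_tokenize` call is needed there.

```python
def tag_sentences(sentences, tagger):
    """Tag every pre-tokenized sentence with any object that has a .tag(tokens) method."""
    return [tagger.tag(list(tokens)) for tokens in sentences]


def tag_corpus(corpus_root, model_path, fileids=r'.*\.txt'):
    """Read plain-text files under corpus_root and hunpos-tag them sentence by sentence.

    The NLTK imports are local so that tag_sentences() can be used on its own.
    Assumes the hunpos-tag binary is installed where NLTK can find it.
    """
    from nltk.corpus import PlaintextCorpusReader
    from nltk.tag.hunpos import HunposTagger

    reader = PlaintextCorpusReader(corpus_root, fileids)
    tagger = HunposTagger(model_path)
    try:
        # reader.sents() yields one list of tokens per sentence.
        return tag_sentences(reader.sents(), tagger)
    finally:
        # Shut down the hunpos subprocess that the tagger started.
        tagger.close()
```

With the setup from the question, `tag_corpus('./', 'en_wsj.model')` would then return one list of (token, tag) pairs per sentence.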
