Problem training a unigram tagger on the nps_chat corpus

Posted on 2024-12-03 02:38:54


At first I tried training the tagger on tagged sentences, but unlike Brown,
the nps_chat corpus doesn't seem to have a tagged_sents() method.
So then I tried training on tagged words, and Python returned this error
message:

> Traceback (most recent call last):
>   File "<pyshell#55>", line 1, in <module>
>     unigram_tagger = nltk.UnigramTagger(training_set)
>   File "C:\Python26\lib\site-packages\nltk\tag\sequential.py", line 287, in __init__
>     backoff, cutoff, verbose)
>   File "C:\Python26\lib\site-packages\nltk\tag\sequential.py", line 270, in __init__
>     self._train(train, cutoff, verbose)
>   File "C:\Python26\lib\site-packages\nltk\tag\sequential.py", line 181, in _train
>     tokens, tags = zip(*sentence)
> ValueError: need more than 1 value to unpack
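The failing line is `tokens, tags = zip(*sentence)`: the trainer loops over the training set and treats every item as one tagged sentence. The sketch below is a simplified, hypothetical stand-in for that loop (`train_like_sequential_tagger` is not an NLTK name, and this is not NLTK's actual code). It shows that with a flat list of `(word, tag)` pairs, each pair is treated as a "sentence", so `zip(*...)` pairs up *characters*. Multi-character words silently produce junk, and the first one-character word (here `I`) raises the `ValueError`. Python 2.6 phrases it as "need more than 1 value to unpack"; Python 3 says "not enough values to unpack".

```python
def train_like_sequential_tagger(training_set):
    """Hypothetical, simplified stand-in for the tagger's training loop:
    every item of training_set is assumed to be one tagged sentence."""
    pairs = []
    for sentence in training_set:
        tokens, tags = zip(*sentence)  # the line that fails in the traceback
        pairs.extend(zip(tokens, tags))
    return pairs

# Expected shape: a list of sentences, each a list of (word, tag) pairs.
tagged_sents = [[("hello", "UH"), ("there", "RB")], [("I", "PRP"), ("agree", "VBP")]]
# Flat shape: the same pairs with the sentence grouping removed.
tagged_words = [pair for sent in tagged_sents for pair in sent]

print(train_like_sequential_tagger(tagged_sents))

# Here each "sentence" is a single (word, tag) tuple, so zip(*...) zips the
# characters of the word and the tag. ("hello", "UH") yields junk without an
# error; ("I", "PRP") yields only one pair, so unpacking into two names fails.
try:
    train_like_sequential_tagger(tagged_words)
except ValueError as err:
    print("ValueError:", err)  # on Python 3: not enough values to unpack (expected 2, got 1)
```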

I suspect the issue has something to do with the fact that I'm training
the tagger on tagged words rather than sentences, but what's the solution
if nps_chat doesn't have a tagged_sents() method? And why doesn't it
provide that method? Please advise.
