TypeError: 'LazyCorpusLoader' object is not iterable in TfidfVectorizer

I have already seen a separate thread on this issue, but my error arises from a different step, so this is not a duplicate post.
I'm using Python 3.x. I need to do text clustering on text that contains compound nouns. I used the Splitter from charsplit to split the compound nouns, with the following code:

import pandas as pd
from charsplit import Splitter

splitter = Splitter()
splitted = []
unSplitted = []

for i in range(len(texts)):
    try:
        z = splitter.split_compound(texts[i])
        split = pd.DataFrame(z)
        # keep only the best-scoring split: column 0 is the score,
        # columns 1 and 2 are the two parts of the compound
        best = split[split[0] == split[0].max()]
        best_split = best[1].to_string() + " " + best[2].to_string()
        splitted.append(best_split)
    except Exception:
        unSplitted.append(texts[i])
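
For context, splitter.split_compound(...) returns a list of (score, first_part, second_part) tuples, which is why the DataFrame above has the three columns 0, 1 and 2. A minimal, self-contained illustration (the compound word and the scores are only assumed examples, not my real data):

from charsplit import Splitter

splitter = Splitter()
# each candidate split is a (score, first_part, second_part) tuple
candidates = splitter.split_compound("Wasserfluss")
print(candidates)        # e.g. [(0.7, 'Wasser', 'Fluss'), ...] -- scores here are only illustrative
best = max(candidates, key=lambda c: c[0])
print(best[1], best[2])  # the two parts of the highest-scoring split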

A sample of my splitted list looks like this:

['0    waterflow Plus-/Minus- on right channel 0    interface',
 '0    flow  , Automatic valve Start/Stop    left',
 '0    flow 0    , Automatic valve Start/Stop ...',

I want to do text clustering using this splitted text, so I'm using TfidfVectorizer and wrote the following code:

vectorizer = TfidfVectorizer(
    stop_words=stopwords,
)

X = vectorizer.fit_transform(splitted_german) 

But it throws the following error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/tmp/ipykernel_411/935444557.py in <module>
      4 )
      5 
----> 6 X = vectorizer.fit_transform(splitted_german)

/opt/conda/lib/python3.9/site-packages/sklearn/feature_extraction/text.py in fit_transform(self, raw_documents, y)
   2075         """
   2076         self._check_params()
-> 2077         X = super().fit_transform(raw_documents)
   2078         self._tfidf.fit(X)
   2079         # X is already a transformed view of raw_documents so

/opt/conda/lib/python3.9/site-packages/sklearn/feature_extraction/text.py in fit_transform(self, raw_documents, y)
   1328                     break
   1329 
-> 1330         vocabulary, X = self._count_vocab(raw_documents, self.fixed_vocabulary_)
   1331 
   1332         if self.binary:

/opt/conda/lib/python3.9/site-packages/sklearn/feature_extraction/text.py in _count_vocab(self, raw_documents, fixed_vocab)
   1191             vocabulary.default_factory = vocabulary.__len__
   1192 
-> 1193         analyze = self.build_analyzer()
   1194         j_indices = []
   1195         indptr = []

/opt/conda/lib/python3.9/site-packages/sklearn/feature_extraction/text.py in build_analyzer(self)
    444 
    445         elif self.analyzer == "word":
--> 446             stop_words = self.get_stop_words()
    447             tokenize = self.build_tokenizer()
    448             self._check_stop_words_consistency(stop_words, preprocess, tokenize)

/opt/conda/lib/python3.9/site-packages/sklearn/feature_extraction/text.py in get_stop_words(self)
    366                 A list of stop words.
    367         """
--> 368         return _check_stop_list(self.stop_words)
    369 
    370     def _check_stop_words_consistency(self, stop_words, preprocess, tokenize):

/opt/conda/lib/python3.9/site-packages/sklearn/feature_extraction/text.py in _check_stop_list(stop)
    190         return None
    191     else:  # assume it's a collection
--> 192         return frozenset(stop)
    193 
    194 

TypeError: 'LazyCorpusLoader' object is not iterable
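
The LazyCorpusLoader in the last line is an NLTK class, which suggests that my stopwords variable is the NLTK corpus loader object itself rather than a plain list of words. For comparison, here is a minimal sketch of the same vectorizer called with an explicit word list (the German stop-word set and the placeholder documents are assumptions for illustration, not my real data):

import nltk
from nltk.corpus import stopwords as nltk_stopwords
from sklearn.feature_extraction.text import TfidfVectorizer

nltk.download("stopwords")  # make sure the stop-word corpus files are available

# stop_words expects a list (or frozenset) of strings, not the corpus loader object
german_stop_words = nltk_stopwords.words("german")

docs = ["waterflow interface right channel", "flow Automatic valve Start Stop left"]  # placeholder documents
vectorizer = TfidfVectorizer(stop_words=german_stop_words)
X = vectorizer.fit_transform(docs)
print(X.shape)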

Can you guide me to resolve the issue?
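
For completeness, this is roughly the clustering step I plan to run on X once the vectorizer works (KMeans and the cluster count are only assumptions for illustration; I have not settled on the algorithm yet):

from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["waterflow interface right channel",
        "flow Automatic valve Start Stop left",
        "flow Automatic valve Start Stop"]  # placeholder documents
X = TfidfVectorizer().fit_transform(docs)

# cluster the TF-IDF vectors; n_clusters=2 is purely illustrative
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)
print(labels)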
