TypeError: 'LazyCorpusLoader' object is not iterable in TfidfVectorizer

I have already seen a separate thread on this issue, but my error arises from a different step, so this is not a duplicate post.
I'm using Python 3.x. I need to do text clustering on text that contains compound nouns. I used the Splitter from charsplit to split the compound nouns, with the following code:

import pandas as pd
from charsplit import Splitter

splitter = Splitter()
splitted = []
unSplitted = []

for i in range(len(texts)):
    try:
        z = splitter.split_compound(texts[i])
        split = pd.DataFrame(z)
        # keep only the best-scoring split: column 0 is the score,
        # columns 1 and 2 are the two parts of the compound
        best = split[split[0] == split[0].max()]
        best_split = best[1].to_string() + " " + best[2].to_string()
        splitted.append(best_split)
    except Exception:
        unSplitted.append(texts[i])
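
For context, splitter.split_compound(...) returns a list of (score, first_part, second_part) tuples, which is why the DataFrame above has the three columns 0, 1 and 2. A minimal, self-contained illustration (the compound word and the scores are only assumed examples, not my real data):

from charsplit import Splitter

splitter = Splitter()
# each candidate split is a (score, first_part, second_part) tuple
candidates = splitter.split_compound("Wasserfluss")
print(candidates)        # e.g. [(0.7, 'Wasser', 'Fluss'), ...] -- scores here are only illustrative
best = max(candidates, key=lambda c: c[0])
print(best[1], best[2])  # the two parts of the highest-scoring split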

A sample of my splitted list looks like this:

['0    waterflow Plus-/Minus- on right channel 0    interface',
 '0    flow  , Automatic valve Start/Stop    left',
 '0    flow 0    , Automatic valve Start/Stop ...',

I want to do text clustering using this splitted text, so I'm using TfidfVectorizer and wrote the following code:

vectorizer = TfidfVectorizer(
    stop_words=stopwords,
)

X = vectorizer.fit_transform(splitted_german) 

But it throws the following error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/tmp/ipykernel_411/935444557.py in <module>
      4 )
      5 
----> 6 X = vectorizer.fit_transform(splitted_german)

/opt/conda/lib/python3.9/site-packages/sklearn/feature_extraction/text.py in fit_transform(self, raw_documents, y)
   2075         """
   2076         self._check_params()
-> 2077         X = super().fit_transform(raw_documents)
   2078         self._tfidf.fit(X)
   2079         # X is already a transformed view of raw_documents so

/opt/conda/lib/python3.9/site-packages/sklearn/feature_extraction/text.py in fit_transform(self, raw_documents, y)
   1328                     break
   1329 
-> 1330         vocabulary, X = self._count_vocab(raw_documents, self.fixed_vocabulary_)
   1331 
   1332         if self.binary:

/opt/conda/lib/python3.9/site-packages/sklearn/feature_extraction/text.py in _count_vocab(self, raw_documents, fixed_vocab)
   1191             vocabulary.default_factory = vocabulary.__len__
   1192 
-> 1193         analyze = self.build_analyzer()
   1194         j_indices = []
   1195         indptr = []

/opt/conda/lib/python3.9/site-packages/sklearn/feature_extraction/text.py in build_analyzer(self)
    444 
    445         elif self.analyzer == "word":
--> 446             stop_words = self.get_stop_words()
    447             tokenize = self.build_tokenizer()
    448             self._check_stop_words_consistency(stop_words, preprocess, tokenize)

/opt/conda/lib/python3.9/site-packages/sklearn/feature_extraction/text.py in get_stop_words(self)
    366                 A list of stop words.
    367         """
--> 368         return _check_stop_list(self.stop_words)
    369 
    370     def _check_stop_words_consistency(self, stop_words, preprocess, tokenize):

/opt/conda/lib/python3.9/site-packages/sklearn/feature_extraction/text.py in _check_stop_list(stop)
    190         return None
    191     else:  # assume it's a collection
--> 192         return frozenset(stop)
    193 
    194 

TypeError: 'LazyCorpusLoader' object is not iterable
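
The LazyCorpusLoader in the last line is an NLTK class, which suggests that my stopwords variable is the NLTK corpus loader object itself rather than a plain list of words. For comparison, here is a minimal sketch of the same vectorizer called with an explicit word list (the German stop-word set and the placeholder documents are assumptions for illustration, not my real data):

import nltk
from nltk.corpus import stopwords as nltk_stopwords
from sklearn.feature_extraction.text import TfidfVectorizer

nltk.download("stopwords")  # make sure the stop-word corpus files are available

# stop_words expects a list (or frozenset) of strings, not the corpus loader object
german_stop_words = nltk_stopwords.words("german")

docs = ["waterflow interface right channel", "flow Automatic valve Start Stop left"]  # placeholder documents
vectorizer = TfidfVectorizer(stop_words=german_stop_words)
X = vectorizer.fit_transform(docs)
print(X.shape)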

Can you guide me to resolve the issue?
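
For completeness, this is roughly the clustering step I plan to run on X once the vectorizer works (KMeans and the cluster count are only assumptions for illustration; I have not settled on the algorithm yet):

from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["waterflow interface right channel",
        "flow Automatic valve Start Stop left",
        "flow Automatic valve Start Stop"]  # placeholder documents
X = TfidfVectorizer().fit_transform(docs)

# cluster the TF-IDF vectors; n_clusters=2 is purely illustrative
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)
print(labels)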
