TypeError: 'LazyCorpusLoader' object is not iterable in TfidfVectorizer
I have seen a separate thread on this issue, but my error comes from a different step, so this is not a duplicate post.
I'm using Python 3.x. I need to do text clustering on texts that contain compound nouns. I used the Splitter from charsplit to split the compound nouns apart. I'm using the following code to do this:
import pandas as pd
from charsplit import Splitter

splitter = Splitter()
splitted = []
unSplitted = []
for i in range(len(texts)):
    try:
        # split_compound returns candidate splits with scores
        z = splitter.split_compound(texts[i])
        split = pd.DataFrame(z)
        # keep the best-scoring split and join its two parts
        best = split[split[0] == split[0].max()]
        splitted.append(best[1].to_string() + " " + best[2].to_string())
    except Exception:
        unSplitted.append(texts[i])
A sample of my splitted list looks like this:
['0 waterflow Plus-/Minus- on right channel 0 interface',
 '0 flow , Automatic valve Start/Stop left',
 '0 flow 0 , Automatic valve Start/Stop ...',
 ...]
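For context, split_compound returns a list of candidate splits as (score, first part, second part) tuples, which is why the DataFrame above is indexed with columns 0, 1 and 2. A minimal sketch of that shape (the example word and score are illustrative, not real output):

from charsplit import Splitter

splitter = Splitter()
candidates = splitter.split_compound("Autobahnraststätte")
# roughly: [(0.79, 'Autobahn', 'Raststätte'), ...]
# column 0 = score, column 1 = first part, column 2 = second part
print(candidates[0])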
I want to do text clustering using this splitted text, so I'm using TfidfVectorizer and wrote the following code:
from sklearn.feature_extraction.text import TfidfVectorizer

vectorizer = TfidfVectorizer(
    stop_words=stopwords,  # `stopwords` is defined earlier (import not shown)
)
X = vectorizer.fit_transform(splitted_german)
But it throws the following error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
/tmp/ipykernel_411/935444557.py in <module>
4 )
5
----> 6 X = vectorizer.fit_transform(splitted_german)
/opt/conda/lib/python3.9/site-packages/sklearn/feature_extraction/text.py in fit_transform(self, raw_documents, y)
2075 """
2076 self._check_params()
-> 2077 X = super().fit_transform(raw_documents)
2078 self._tfidf.fit(X)
2079 # X is already a transformed view of raw_documents so
/opt/conda/lib/python3.9/site-packages/sklearn/feature_extraction/text.py in fit_transform(self, raw_documents, y)
1328 break
1329
-> 1330 vocabulary, X = self._count_vocab(raw_documents, self.fixed_vocabulary_)
1331
1332 if self.binary:
/opt/conda/lib/python3.9/site-packages/sklearn/feature_extraction/text.py in _count_vocab(self, raw_documents, fixed_vocab)
1191 vocabulary.default_factory = vocabulary.__len__
1192
-> 1193 analyze = self.build_analyzer()
1194 j_indices = []
1195 indptr = []
/opt/conda/lib/python3.9/site-packages/sklearn/feature_extraction/text.py in build_analyzer(self)
444
445 elif self.analyzer == "word":
--> 446 stop_words = self.get_stop_words()
447 tokenize = self.build_tokenizer()
448 self._check_stop_words_consistency(stop_words, preprocess, tokenize)
/opt/conda/lib/python3.9/site-packages/sklearn/feature_extraction/text.py in get_stop_words(self)
366 A list of stop words.
367 """
--> 368 return _check_stop_list(self.stop_words)
369
370 def _check_stop_words_consistency(self, stop_words, preprocess, tokenize):
/opt/conda/lib/python3.9/site-packages/sklearn/feature_extraction/text.py in _check_stop_list(stop)
190 return None
191 else: # assume it's a collection
--> 192 return frozenset(stop)
193
194
TypeError: 'LazyCorpusLoader' object is not iterable
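Reading the traceback, _check_stop_list calls frozenset(stop) on whatever was passed as stop_words, and the LazyCorpusLoader in the message suggests that my stopwords object is NLTK's corpus loader rather than a list of words. A minimal sketch of what I suspect the difference is, assuming stopwords was imported from nltk.corpus (that import is not shown above, so this is an assumption):

from nltk.corpus import stopwords  # a LazyCorpusLoader, not a list
from sklearn.feature_extraction.text import TfidfVectorizer

# Passing the loader object itself reproduces the TypeError:
# TfidfVectorizer(stop_words=stopwords).fit_transform(docs)

# Calling .words('german') materializes an actual list of strings:
vectorizer = TfidfVectorizer(stop_words=stopwords.words('german'))
X = vectorizer.fit_transform(["waterflow Plus", "Automatic valve Start"])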
Can you guide me to resolve the issue?