保持使用TFIDF制作的模型,用于使用Scikit进行Python预测新内容
这是用TF-IDF制作的情绪分析模型用于特征提取 我想知道如何保存此模型并重用它。 我尝试以这种方式保存它,但是当我加载它时,请在测试文本上进行相同的预处理,并在其上进行fit_transform,这给了一个错误,即模型预期x的功能数量,但得到了我
保存的方式
filename = "model.joblib"
joblib.dump(model, filename)
,这就是我的TF-IDF模型
import pandas as pd
import re
import nltk
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import BernoulliNB
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
nltk.download('stopwords')
from nltk.corpus import stopwords
processed_text = ['List of pre-processed text']
y = ['List of labels']
tfidfconverter = TfidfVectorizer(max_features=10000, min_df=5, max_df=0.7, stop_words=stopwords.words('english'))
X = tfidfconverter.fit_transform(processed_text).toarray()
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
text_classifier = BernoulliNB()
text_classifier.fit(X_train, y_train)
predictions = text_classifier.predict(X_test)
print(confusion_matrix(y_test, predictions))
print(classification_report(y_test, predictions))
print(accuracy_score(y_test, predictions))
编辑的代码: 只是要确切地放在哪里 之后:然后
tfidfconverter = TfidfVectorizer(max_features=10000, min_df=5, max_df=0.7, stop_words=stopwords.words('english'))
,
tfidf_obj = tfidfconverter.fit(processed_text)//this is what will be used again
joblib.dump(tfidf_obj, 'tf-idf.joblib')
您也要在培训后也要保存分类器的其余步骤,以便:
text_classifier.fit(X_train, y_train)
放置 joblib.dump(型号,“ classifier.joblib”) 现在,当您想预测任何文本时
tf_idf_converter = joblib.load("tf-idf.joblib")
classifier = joblib.load("classifier.joblib")
,您都有句子列表以预测
sent = []
classifier.predict(tf_idf_converter.transform(sent))
现在打印该句子的列表
this is a sentiment analysis model made with tf-idf for feature extraction
i want to know how can i save this model and reuse it.
i tried saving it this way but when i load it , do same pre-processing on the test text and fit_transform on it it gave an error that the model expected X numbers of features but got Y
this is how i saved it
filename = "model.joblib"
joblib.dump(model, filename)
and this is the code for my tf-idf model
import pandas as pd
import re
import nltk
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import BernoulliNB
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
nltk.download('stopwords')
from nltk.corpus import stopwords
processed_text = ['List of pre-processed text']
y = ['List of labels']
tfidfconverter = TfidfVectorizer(max_features=10000, min_df=5, max_df=0.7, stop_words=stopwords.words('english'))
X = tfidfconverter.fit_transform(processed_text).toarray()
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
text_classifier = BernoulliNB()
text_classifier.fit(X_train, y_train)
predictions = text_classifier.predict(X_test)
print(confusion_matrix(y_test, predictions))
print(classification_report(y_test, predictions))
print(accuracy_score(y_test, predictions))
edit:
just to exact where to put every line
so after:
tfidfconverter = TfidfVectorizer(max_features=10000, min_df=5, max_df=0.7, stop_words=stopwords.words('english'))
then
tfidf_obj = tfidfconverter.fit(processed_text)//this is what will be used again
joblib.dump(tfidf_obj, 'tf-idf.joblib')
then you do the rest of the steps you will save the classifier after training as well so after:
text_classifier.fit(X_train, y_train)
put
joblib.dump(model, "classifier.joblib")
now when you want to predict any text
tf_idf_converter = joblib.load("tf-idf.joblib")
classifier = joblib.load("classifier.joblib")
now u have List of sentences to predict
sent = []
classifier.predict(tf_idf_converter.transform(sent))
now print that for a list of sentiments for each sentece
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
您可以首先使用以下方式将
TFIDF
拟合到培训集中:然后找到一种存储
tfidf_obj
的方法,例如使用pickle
或joblib < /code> eg:
然后加载保存的
tfidf_obj
,然后应用transform
仅在测试集上You can first fit
tfidf
to your training set using:Then find a way to store the
tfidf_obj
for instance usingpickle
orjoblib
e.g:Then load the saved
tfidf_obj
and applytransform
only on your test set