Persisting a model built with TF-IDF to predict new content with scikit-learn in Python

Posted on 2025-02-02 02:57:04

This is a sentiment analysis model that uses TF-IDF for feature extraction. I want to know how to save this model and reuse it. I tried saving it as shown below, but when I load it, apply the same pre-processing to the test text, and call fit_transform on it, I get an error saying the model expected X features but got Y.

This is how I saved it:

filename = "model.joblib"
joblib.dump(model, filename)

And this is the code for my TF-IDF model:

import pandas as pd
import re
import nltk
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import BernoulliNB
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
nltk.download('stopwords')
from nltk.corpus import stopwords

processed_text = ['List of pre-processed text'] 
y = ['List of labels']
tfidfconverter = TfidfVectorizer(max_features=10000, min_df=5, max_df=0.7, stop_words=stopwords.words('english'))
X = tfidfconverter.fit_transform(processed_text).toarray()

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

text_classifier = BernoulliNB()
text_classifier.fit(X_train, y_train)

predictions = text_classifier.predict(X_test)
print(confusion_matrix(y_test, predictions))
print(classification_report(y_test, predictions))
print(accuracy_score(y_test, predictions))

Edit: to be exact about where each line goes. After:

tfidfconverter = TfidfVectorizer(max_features=10000, min_df=5, max_df=0.7, stop_words=stopwords.words('english'))

add:

import joblib

tfidf_obj = tfidfconverter.fit(processed_text)  # this fitted vectorizer is what will be reused
joblib.dump(tfidf_obj, 'tf-idf.joblib')

Then do the rest of the steps as before. Save the classifier after training as well, so after:

text_classifier.fit(X_train, y_train)

put:

joblib.dump(text_classifier, "classifier.joblib")

Now, when you want to predict any text, load both objects:

tf_idf_converter = joblib.load("tf-idf.joblib")
classifier = joblib.load("classifier.joblib")

Now, given a list of sentences to predict:

sent = []  # fill with sentences pre-processed the same way as the training text
classifier.predict(tf_idf_converter.transform(sent))

Printing the result gives a list with one predicted sentiment per sentence.
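The steps in the edit above can be put together in one small, runnable sketch. The toy sentences, labels, and temporary output directory are made up for illustration, and the vectorizer uses default settings because the question's max_features / min_df / max_df values would prune a corpus this small. Treat it as a rough outline under those assumptions, not the exact code from the question.

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import BernoulliNB

# Hypothetical stand-ins for the pre-processed text and its labels.
train_texts = [
    "good movie great acting",
    "great film good story",
    "bad movie terrible acting",
    "terrible film bad story",
]
train_labels = ["pos", "pos", "neg", "neg"]

# Fit the vectorizer once, on the training text only.
vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(train_texts)

# Train the classifier on the TF-IDF features (sparse input is accepted).
classifier = BernoulliNB()
classifier.fit(X_train, train_labels)

# Persist BOTH fitted objects; the classifier alone is not enough.
out_dir = tempfile.mkdtemp()  # assumption: any writable directory works
vec_path = os.path.join(out_dir, "tf-idf.joblib")
clf_path = os.path.join(out_dir, "classifier.joblib")
joblib.dump(vectorizer, vec_path)
joblib.dump(classifier, clf_path)

# Later, possibly in another process: load both and only transform.
loaded_vectorizer = joblib.load(vec_path)
loaded_classifier = joblib.load(clf_path)

new_sentences = ["great story", "terrible acting"]
X_new = loaded_vectorizer.transform(new_sentences)  # transform, not fit_transform
predictions = loaded_classifier.predict(X_new)
print(predictions)
```

Because the loaded vectorizer carries the vocabulary learned during training, X_new has the same number of columns as X_train, which is what the classifier expects.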


Comments (1)

凑诗 2025-02-09 02:57:04

You can first fit the TF-IDF vectorizer on your training set:

tfidfconverter = TfidfVectorizer(max_features=10000, min_df=5, max_df=0.7, stop_words=stopwords.words('english'))
tfidf_obj = tfidfconverter.fit(processed_text)

Then store tfidf_obj, for instance with pickle or joblib, e.g.:

joblib.dump(tfidf_obj, filename)

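Then load the saved tfidf_obj and call only transform (not fit_transform) on your test set. Calling fit_transform again would learn a new vocabulary from the test text, so the number of features would no longer match what the classifier was trained on:

loaded_tfidf = joblib.load(filename)
test_new = loaded_tfidf.transform(X_test)  # X_test here must be pre-processed text, not already-vectorized features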
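To see why fit_transform on new text triggers the "expected X features but got Y" error, here is a minimal sketch with made-up sentences (the texts and variable names are illustrative, not from the question). It compares the column count produced by transform on the fitted vectorizer with the one produced by refitting on the new text.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical training and new text.
train_texts = ["good movie great acting", "bad movie terrible acting"]
new_texts = ["great story"]

vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(train_texts)

# transform() reuses the vocabulary learned from the training text,
# so the column count matches; unseen words ("story") are ignored.
X_new_ok = vectorizer.transform(new_texts)

# fit_transform() on the new text learns a fresh vocabulary from it,
# so the column count depends only on the new text.
refit = TfidfVectorizer()
X_new_bad = refit.fit_transform(new_texts)

print(X_train.shape[1], X_new_ok.shape[1], X_new_bad.shape[1])  # 6 6 2
```

A classifier trained on X_train expects 6 columns here, so it can score X_new_ok but not X_new_bad.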