Saving a model built with TF-IDF to predict new content using scikit-learn in Python

Posted on 2025-02-02 02:57:04

This is a sentiment analysis model that uses TF-IDF for feature extraction.
I want to know how I can save this model and reuse it.
I tried saving it this way, but when I load it, apply the same pre-processing to the test text, and call fit_transform on it, I get an error saying the model expected X features but got Y.
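To make the error concrete, here is a minimal sketch of what is probably going on. It assumes a tiny invented corpus and a TfidfVectorizer with default settings (not the max_features, min_df, and max_df values from the question, which would prune every term on four documents). Refitting on new text builds a new vocabulary, so the number of columns no longer matches what the classifier was trained on.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import BernoulliNB

# Invented toy data, for illustration only.
train_texts = ["good movie", "great film", "bad movie", "terrible film"]
train_labels = ["pos", "pos", "neg", "neg"]

vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(train_texts)
clf = BernoulliNB().fit(X_train, train_labels)

new_texts = ["good film"]

# Refitting on the new text learns a different, smaller vocabulary.
refit = TfidfVectorizer().fit_transform(new_texts)
# Reusing the fitted vectorizer keeps the training vocabulary.
reused = vectorizer.transform(new_texts)

n_train_features = X_train.shape[1]
refit_features = refit.shape[1]
reused_features = reused.shape[1]
print(n_train_features, refit_features, reused_features)

# Predicting on the refit matrix fails because the column counts differ.
try:
    clf.predict(refit)
    mismatch_error = None
except ValueError as exc:
    mismatch_error = str(exc)
print(mismatch_error)

# Predicting on the reused matrix works.
prediction = clf.predict(reused)
```

So the fix is to call transform on the already fitted vectorizer rather than fit_transform on the new text.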

This is how I saved it:

filename = "model.joblib"
joblib.dump(model, filename)

And this is the code for my TF-IDF model:

import pandas as pd
import re
import nltk
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import BernoulliNB
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
nltk.download('stopwords')
from nltk.corpus import stopwords

processed_text = ['List of pre-processed text'] 
y = ['List of labels']
tfidfconverter = TfidfVectorizer(max_features=10000, min_df=5, max_df=0.7, stop_words=stopwords.words('english'))
X = tfidfconverter.fit_transform(processed_text).toarray()

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

text_classifier = BernoulliNB()
text_classifier.fit(X_train, y_train)

predictions = text_classifier.predict(X_test)
print(confusion_matrix(y_test, predictions))
print(classification_report(y_test, predictions))
print(accuracy_score(y_test, predictions))

Edit:
Just to be exact about where each line goes.
After:

tfidfconverter = TfidfVectorizer(max_features=10000, min_df=5, max_df=0.7, stop_words=stopwords.words('english'))

add

import joblib

tfidf_obj = tfidfconverter.fit(processed_text)  # this fitted object is what will be reused
joblib.dump(tfidf_obj, 'tf-idf.joblib')

Then do the rest of the steps. You should save the classifier after training as well, so after:

text_classifier.fit(X_train, y_train)

put

joblib.dump(text_classifier, "classifier.joblib")

Now, when you want to predict on any text, load both objects:

tf_idf_converter = joblib.load("tf-idf.joblib")
classifier = joblib.load("classifier.joblib")

Now, given a list of sentences to predict:

sent = []
classifier.predict(tf_idf_converter.transform(sent))

Printing the result gives a list of sentiments, one for each sentence.
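Putting the steps from the edit together, a minimal end-to-end sketch might look like this. The toy texts and labels are invented, the vectorizer uses default settings instead of the question's parameters, and the files are written to a temporary directory. All of that is for illustration only.

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import BernoulliNB

# Invented toy data; replace with your own pre-processed texts and labels.
texts = ["good movie", "great film", "bad movie", "terrible film"]
labels = ["pos", "pos", "neg", "neg"]

# Fit the vectorizer once, then train the classifier on its output.
tfidfconverter = TfidfVectorizer()
X = tfidfconverter.fit_transform(texts)
text_classifier = BernoulliNB().fit(X, labels)

with tempfile.TemporaryDirectory() as tmp_dir:
    vec_path = os.path.join(tmp_dir, "tf-idf.joblib")
    clf_path = os.path.join(tmp_dir, "classifier.joblib")

    # Save BOTH fitted objects: the vectorizer and the classifier.
    joblib.dump(tfidfconverter, vec_path)
    joblib.dump(text_classifier, clf_path)

    # Later (for example in another script), load both back.
    tf_idf_converter = joblib.load(vec_path)
    classifier = joblib.load(clf_path)

# Only transform (never fit_transform) the new sentences.
sent = ["what a great film", "what a bad movie"]
sentiments = classifier.predict(tf_idf_converter.transform(sent))
print(list(sentiments))
```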


凑诗 2025-02-09 02:57:04

You can first fit tfidf to your training set using:

tfidfconverter = TfidfVectorizer(max_features=10000, min_df=5, max_df=0.7, stop_words=stopwords.words('english'))
tfidf_obj = tfidfconverter.fit(processed_text)

Then store the fitted tfidf_obj, for instance using pickle or joblib, e.g.:

joblib.dump(tfidf_obj, filename)

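Then load the saved tfidf_obj and call only transform (not fit_transform) on your test set. Note that X_test here has to be the raw test texts, not a matrix that has already been vectorized: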

loaded_tfidf = joblib.load(filename)
test_new = loaded_tfidf.transform(X_test)
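As a possible alternative to saving two separate objects (this is not what the answer above proposes), scikit-learn's Pipeline can bundle the vectorizer and the classifier so that a single joblib file holds both. The sketch below assumes invented toy data and default TfidfVectorizer settings. It also splits the raw texts before vectorizing, so the vectorizer is fitted on the training texts only.

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB
from sklearn.pipeline import make_pipeline

# Invented toy data; replace with your own pre-processed texts and labels.
texts = [
    "good movie", "great film", "loved this movie", "wonderful acting",
    "bad movie", "terrible film", "hated this movie", "awful acting",
]
labels = ["pos", "pos", "pos", "pos", "neg", "neg", "neg", "neg"]

# Split the RAW texts first, so the vectorizer only ever sees training text.
train_texts, test_texts, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, random_state=0
)

# The pipeline fits the vectorizer and the classifier in one call.
model = make_pipeline(TfidfVectorizer(), BernoulliNB())
model.fit(train_texts, y_train)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model.joblib")
    joblib.dump(model, path)          # one file holds both steps
    loaded_model = joblib.load(path)

# The loaded pipeline accepts raw text and applies transform internally.
test_predictions = loaded_model.predict(test_texts)
print(list(test_predictions))
```

With this layout there is no separate vectorizer file to keep in sync with the classifier, which removes one way of ending up with mismatched feature counts.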
