How to implement TF-IDF feature weighting with Naive Bayes
I'm trying to implement a naive Bayes classifier for sentiment analysis, and I plan to use TF-IDF weighting. I'm a little stuck: naive Bayes normally uses word (feature) frequencies to compute its maximum-likelihood estimates, so how do I incorporate TF-IDF weights into naive Bayes?
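For reference, here is a sketch of the frequency-based estimate the question describes. It assumes the multinomial event model with additive smoothing, which is the variant scikit-learn's MultinomialNB implements; the multinomial coefficient is dropped because it is the same for every class:

$$\hat{\theta}_{c,w} = \frac{N_{c,w} + \alpha}{\sum_{w' \in V} N_{c,w'} + \alpha |V|}, \qquad \hat{c} = \arg\max_{c} \Big[ \log \hat{P}(c) + \sum_{w \in V} x_w \log \hat{\theta}_{c,w} \Big]$$

Here $N_{c,w}$ is the total frequency of word $w$ in the training documents of class $c$, $x_w$ is the frequency of $w$ in the document being classified, $V$ is the vocabulary, and $\alpha$ is the smoothing parameter ($\alpha = 1$ gives Laplace smoothing). The frequencies $N_{c,w}$ and $x_w$ are where a different per-word weight would enter the model.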
Comments (1)
You can use the TF-IDF weights as the features (predictors) in your statistical model. I suggest using either scikit-learn [1] or gensim [2] to compute the weights, which you then pass to your Naive Bayes fitting procedure.
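A minimal sketch of this approach with scikit-learn follows. The six training sentences and their labels are a hypothetical toy corpus made up for illustration. TfidfVectorizer combines CountVectorizer and TfidfTransformer in one step, and its fractional TF-IDF matrix goes straight into MultinomialNB. scikit-learn's documentation notes that the multinomial model formally expects integer counts but that fractional counts such as TF-IDF may also work in practice. The final loop checks that the per-class word totals the classifier learns (`feature_count_`) are simply column sums of the TF-IDF matrix, so the TF-IDF weights take the place of raw frequencies in the likelihood estimates.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy corpus, for illustration only.
train_texts = [
    "I love this movie, it is wonderful",
    "what a great and fun film",
    "absolutely wonderful acting",
    "I hate this movie, it is terrible",
    "what a boring and awful film",
    "absolutely terrible acting",
]
train_labels = ["pos", "pos", "pos", "neg", "neg", "neg"]

# TF-IDF features feed directly into multinomial naive Bayes
# (alpha=1.0 is Laplace smoothing).
model = make_pipeline(TfidfVectorizer(), MultinomialNB(alpha=1.0))
model.fit(train_texts, train_labels)

predictions = model.predict(["wonderful fun movie", "awful boring acting"])
print(predictions)  # prints ['pos' 'neg']

# Where the weights enter: for each class, feature_count_ holds the sum of
# each word's TF-IDF values over that class's training documents, i.e. the
# TF-IDF weights are used where raw word counts would otherwise be summed.
vectorizer = model.named_steps["tfidfvectorizer"]
nb = model.named_steps["multinomialnb"]
X = vectorizer.transform(train_texts).toarray()
y = np.array(train_labels)
for i, label in enumerate(nb.classes_):
    print(label, np.allclose(nb.feature_count_[i], X[y == label].sum(axis=0)))
```

Swapping `TfidfVectorizer()` for `CountVectorizer()` gives the plain frequency-based classifier, which makes it easy to compare the two weightings on held-out data.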
The scikit-learn 'working with text' tutorial [3] might also be of interest.
[1] http://scikit-learn.org/dev/modules/generated/sklearn.feature_extraction.text.TfidfTransformer.html
[2] http://radimrehurek.com/gensim/models/tfidfmodel.html
[3] http://scikit-learn.github.io/scikit-learn-tutorial/working_with_text_data.html