
How to incrementally train an nltk classifier

Posted on 2024-10-16 03:27:31

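I am working on a project that classifies snippets of text using the python nltk module and the naivebayes classifier. I am able to train on corpus data and classify another set of data, but I would like to feed additional training information into the classifier after the initial training.

If I'm not mistaken, there doesn't appear to be a way to do this, since the NaiveBayesClassifier.train method takes a complete set of training data. Is there a way to add to the training data without feeding in the original featureset again?

I'm open to suggestions, including other classifiers that can accept new training data over time.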


小瓶盖 2024-10-23 03:27:31

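There are two options that I know of:

1) Periodically retrain the classifier on the new data. You'd accumulate new training data in a corpus (one that already contains the original training data), then every few hours retrain and reload the classifier. This is probably the simplest solution.

2) Externalize the internal model, then update it manually. A NaiveBayesClassifier can be created directly by giving it a label_probdist and a feature_probdist. You could create these separately, pass them in to a NaiveBayesClassifier, then update them whenever new data comes in. The classifier would use this new data immediately. You'd have to look at the train method for details on how to update the probability distributions.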
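Option 2 could be sketched roughly as below. This is a minimal sketch, not NLTK's own API: IncrementalNaiveBayes and features are hypothetical names made up for illustration. It keeps the raw counts that NaiveBayesClassifier.train builds internally, folds each new batch into them, and then rebuilds the classifier from a fresh label_probdist and feature_probdist. It rebuilds instead of mutating the distributions in place on the assumption that NLTK's estimators may cache totals when they are constructed, which is worth checking against your NLTK version.

```python
from collections import defaultdict

from nltk.classify import NaiveBayesClassifier
from nltk.probability import ELEProbDist, FreqDist


class IncrementalNaiveBayes:
    """Hypothetical helper (not part of NLTK): keeps raw counts between updates."""

    def __init__(self, estimator=ELEProbDist):
        self._estimator = estimator
        self._label_freqdist = FreqDist()               # label -> sample count
        self._feature_freqdist = defaultdict(FreqDist)  # (label, fname) -> value counts
        self._feature_values = defaultdict(set)         # fname -> values seen so far
        self.classifier = None

    def update(self, labeled_featuresets):
        """Fold new (featureset, label) pairs into the counts, then rebuild."""
        for featureset, label in labeled_featuresets:
            self._label_freqdist[label] += 1
            for fname, fval in featureset.items():
                self._feature_freqdist[label, fname][fval] += 1
                self._feature_values[fname].add(fval)
        self.classifier = self._build()
        return self.classifier

    def _build(self):
        # Work on copies so the stored counts stay free of None padding.
        values = {fname: set(vals) for fname, vals in self._feature_values.items()}
        padded = {}
        for label, num_samples in self._label_freqdist.items():
            for fname in values:
                freqdist = FreqDist(self._feature_freqdist.get((label, fname)))
                missing = num_samples - freqdist.N()
                if missing > 0:
                    # Samples of this label that lacked the feature are counted
                    # under the value None, mirroring NaiveBayesClassifier.train.
                    freqdist[None] += missing
                    values[fname].add(None)
                padded[label, fname] = freqdist
        label_probdist = self._estimator(FreqDist(self._label_freqdist))
        feature_probdist = {
            (label, fname): self._estimator(freqdist, bins=len(values[fname]))
            for (label, fname), freqdist in padded.items()
        }
        return NaiveBayesClassifier(label_probdist, feature_probdist)


def features(text):
    """Toy bag-of-words featureset, chosen only for this example."""
    return {word: True for word in text.split()}


model = IncrementalNaiveBayes()
model.update([(features("training test totally tubular"), "t")])
model.update([(features("super speeding special sport"), "s")])  # no original data needed
print(model.classifier.classify(features("super special")))
```

Only the counts are stored, so the original featuresets never have to be fed in again. Option 1, by comparison, amounts to calling NaiveBayesClassifier.train on the full accumulated list each time.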


困倦 2024-10-23 03:27:31

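I'm just learning NLTK, so please correct me if I'm wrong. This is using the Python 3 branch of NLTK, which might be incompatible.

The NaiveBayesClassifier instance (imported here from textblob.classifiers, TextBlob's wrapper around the NLTK classifiers) has an update() method, which appears to add to the training data: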

from textblob.classifiers import NaiveBayesClassifier

train = [
    ('training test totally tubular', 't'),
]

cl = NaiveBayesClassifier(train)
cl.update([('super speeding special sport', 's')])

print('t', cl.classify('tubular test'))
print('s', cl.classify('super special'))

This prints out:

t t
s s
