Getting different predictions from XGBoost when converting a sparse matrix to an array?
I have a model trying to predict the category of a vendor.
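import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from xgboost import XGBClassifier

# X_train (raw text) and y_train (category labels) are defined elsewhere
count_vect = CountVectorizer(ngram_range=(1, 2), min_df=5, binary=True)
X_train_counts = count_vect.fit_transform(X_train)  # scipy sparse matrix
clf = XGBClassifier().fit(X_train_counts, y_train)

my_data = pd.Series(['ebay', 'amazon', 'costco', 'uber'])
sparse_count = count_vect.transform(my_data)
print(clf.predict(sparse_count))            # sparse input
print(clf.predict(sparse_count.toarray()))  # same data as a dense array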
This gives the output:
[19 16 1 18]
[20 20 20 20]
So clearly, passing the data as an array changes the prediction. The sparse predictions look like the correct ones, but I want to understand what causes the difference. It seems to come from training on a sparse matrix and then predicting on a dense array.
If I first convert the sparse matrix to an array and train on that (which takes much longer), I get the same predictions whether I pass sparse or dense input. But when I train on sparse input, I get weird predictions if I predict on an array. Why is that?
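One possible explanation, as far as I understand the XGBoost FAQ: the tree booster treats entries that are not stored in a sparse matrix as missing values, while the zeros in a dense array are real values. A tree trained on sparse input would then send "missing" down a learned default branch, and a dense 0 may end up on the other side of the split. The sketch below is a toy, pure-Python imitation of a single split, not XGBoost code; the threshold, the default direction, and all function names are made up for illustration.

```python
# Toy illustration only: one hand-written split that mimics a tree node
# with a "default direction" for missing values. Not XGBoost internals.
MISSING = None

def stump_route(value, threshold=-1e-6, default_left=True):
    """Return which branch a single feature value takes.

    threshold and default_left are hypothetical values chosen for the demo.
    """
    if value is MISSING:
        # Missing values follow the default direction
        return "left" if default_left else "right"
    return "left" if value < threshold else "right"

def sparse_view(row, n_features):
    """Entries not stored in the sparse row are seen as MISSING."""
    return [row.get(j, MISSING) for j in range(n_features)]

def dense_view(row, n_features):
    """Entries not stored become an explicit 0.0, like .toarray()."""
    return [row.get(j, 0.0) for j in range(n_features)]

row = {2: 1.0}  # binary bag-of-words row: only feature 2 is present

sparse_route = [stump_route(v) for v in sparse_view(row, 4)]
dense_route = [stump_route(v) for v in dense_view(row, 4)]

print(sparse_route)  # ['left', 'left', 'right', 'left']
print(dense_route)   # ['right', 'right', 'right', 'right']
```

In the toy, the dense zeros take the same branch as a feature that is present, so every row looks alike to the split. If something similar happens in the real model, it would be consistent with every dense prediction collapsing to the same class, and predicting with the same input format used for training (sparse in, sparse out) would avoid the mismatch.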