KNN: How do I inverse-transform encoded labels the encoder has not seen?
I am trying to make predictions with KNN. Because the target is a float, I have to encode it before scikit-learn will accept it. The approach below works: I can train and predict, but the output is, of course, encoded:
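import pandas as pd
from sklearn import preprocessing
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.utils.multiclass import type_of_target

# Load the data with the date column as a parsed index
df = pd.read_csv('data.csv', index_col='date', parse_dates=True)
X = df.drop(["predictor_pct_chg"], axis=1).values
y = df["predictor_pct_chg"].values

# Chronological split: the last 20% of the rows form the test set
X_train, X_test, y_train, y_test = train_test_split(
    X,
    y,
    test_size=0.2,
    shuffle=False,
)

# Encode the float targets as integer class labels
lab_enc = preprocessing.LabelEncoder()
training_scores_encoded = lab_enc.fit_transform(y_train)
print(training_scores_encoded)
print(type_of_target(y_train))
print(type_of_target(y_train.astype('int')))
print(type_of_target(training_scores_encoded))

knn = KNeighborsClassifier()
knn.fit(
    X_train,
    training_scores_encoded,
)
y_pred = knn.predict(X_test)  # predictions are encoded class labels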
Training and predicting work fine, but now I want to plot the predictions and compare them with y_test:
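import matplotlib.pyplot as plt

# Map the encoded predictions back to the original values
y_pred = lab_enc.inverse_transform(y_pred)
plt.plot(y_test, color='red', label='Actual')
plt.plot(y_pred, color='blue', label='Prediction')
plt.xlabel('Time')
plt.ylabel('% Change')
plt.legend()
plt.show()  # call the function; a bare plt.show displays nothing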
Now inverse_transform() fails, because the LabelEncoder has never seen these predictions before. How can I reverse the encoding in that case? I could apply the LabelEncoder to y_test as well and compare the result with y_pred, but that doesn't make sense: I need predictions in the actual unit (here: %), otherwise I cannot interpret them.
Error:
ValueError: y contains previously unseen labels:
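For reference, here is a minimal pure-Python sketch of what label encoding does conceptually and where an "unseen labels" error comes from. `TinyLabelEncoder` is a hypothetical stand-in written for illustration only; it is not scikit-learn's implementation, and the sample values are made up.

```python
class TinyLabelEncoder:
    """Hypothetical, simplified stand-in for a label encoder (illustration only)."""

    def fit(self, values):
        # The sorted unique values become the known classes;
        # each value's position in that list is its integer code.
        self.classes_ = sorted(set(values))
        self._code_of = {value: code for code, value in enumerate(self.classes_)}
        return self

    def transform(self, values):
        # Values that were not present at fit time have no code.
        unseen = sorted(set(values) - set(self.classes_))
        if unseen:
            raise ValueError(f"y contains previously unseen labels: {unseen}")
        return [self._code_of[value] for value in values]

    def inverse_transform(self, codes):
        # Only codes handed out at fit time can be mapped back.
        unseen = sorted(set(codes) - set(range(len(self.classes_))))
        if unseen:
            raise ValueError(f"y contains previously unseen labels: {unseen}")
        return [self.classes_[code] for code in codes]


# Made-up percentage changes; 3.1 occurs only in the test part.
y_train = [0.5, -1.2, 0.5, 2.0]
y_test = [0.5, 3.1]

encoder = TinyLabelEncoder().fit(y_train)
codes = encoder.transform(y_train)
restored = encoder.inverse_transform(codes)

try:
    encoder.transform(y_test)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(codes)
print(restored)
print(error_message)
```

In this sketch, every float that occurs in the training targets becomes its own class, so a value that appears only in the test targets has no code to map to or from.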
Comments (1)
My guess at the origin of this ValueError:
Since you haven't stratified the data by y in train_test_split, y_test could contain labels that were not present in the training data. So try setting the train_test_split parameter stratify=y. For a detailed explanation, see the Stratification section of the scikit-learn User Guide.
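Before trying the stratify suggestion, it may be worth checking whether the targets can be stratified at all. As far as I know, stratified splitting needs every class to occur at least twice, and train_test_split rejects stratify when shuffle=False. The sketch below uses only the standard library; the helper name `classes_too_small_to_stratify` and the sample data are hypothetical.

```python
from collections import Counter


def classes_too_small_to_stratify(labels, min_count=2):
    """Return the labels that occur fewer than min_count times."""
    counts = Counter(labels)
    return sorted(label for label, n in counts.items() if n < min_count)


# Made-up continuous targets: every float is unique, so each one
# forms a class with a single member.
pct_changes = [0.5, -1.2, 0.7, -0.3, 2.0, -1.9]

# Made-up discrete targets in which every class occurs at least twice.
discrete_labels = ["a", "b", "a", "b", "a", "b"]

too_small_continuous = classes_too_small_to_stratify(pct_changes)
too_small_discrete = classes_too_small_to_stratify(discrete_labels)

print(too_small_continuous)
print(too_small_discrete)
```

If the first list is non-empty for your real y, stratifying on the raw float values is unlikely to work as written.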