The correct way to use a calibrated classifier with a pipeline
I train a model as follows:
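import numpy as np
import pandas as pd
from imblearn.under_sampling import RandomUnderSampler
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE, VarianceThreshold
from sklearn.impute import SimpleImputer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from xgboost import XGBClassifier
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, test_size=0.1, random_state=random_state_split_data)
X_train, X_validation, y_train, y_validation = train_test_split(X_train, y_train, stratify=y_train, test_size=0.1, random_state=random_state_split_data)
under = RandomUnderSampler(sampling_strategy=0.2)
X_train, y_train = under.fit_resample(X_train, y_train)
# Define the pipeline
selector = RFE(estimator=RandomForestClassifier(), n_features_to_select=100)
numeric_transformer = Pipeline(steps=[('imputer', SimpleImputer(missing_values=np.nan, strategy='constant', fill_value=0))])
preprocessor = ColumnTransformer(transformers=[('num', numeric_transformer, numeric_cols)])
model = XGBClassifier(objective='binary:logistic', n_jobs=29, use_label_encoder=False, random_state=42)
pipe = Pipeline(steps=[('preprocessor', preprocessor), ('var', VarianceThreshold()), ('sel', selector), ('clf', model)])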
I then run a grid search on this pipeline:
gridsearch = GridSearchCV(pipe, param_grid, cv=3, verbose=1,n_jobs=-1)
gridsearch.fit(X_train, y_train)
I take the best estimator from the search as my result:
best_est = gridsearch.best_estimator_
I then carry out calibration:
X_validation_calibrate = pd.DataFrame(best_est[:-1].transform(X_validation),columns=features_cols)
X_test_calibrate = pd.DataFrame(best_est[:-1].transform(X_test),columns=features_cols)
I pass these to the calibrators, for example:
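from sklearn.calibration import CalibratedClassifierCV
# The classifier is already fitted, so cv="prefit" only fits the calibrators
sig_clf = CalibratedClassifierCV(best_est['clf'], method="sigmoid", cv="prefit")
iso_clf = CalibratedClassifierCV(best_est['clf'], method="isotonic", cv="prefit")
sig_clf.fit(X_validation_calibrate, y_validation)
iso_clf.fit(X_validation_calibrate, y_validation)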
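For context, one way the calibrators could be compared is by Brier score on a held-out set that was used neither for training nor for calibration. The block below is only a sketch under assumptions: the data is synthetic, a RandomForestClassifier stands in for the tuned XGBClassifier, and make_prefit_calibrator is a hypothetical helper that uses FrozenEstimator on newer scikit-learn releases and falls back to cv="prefit" on older ones.

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import brier_score_loss
from sklearn.model_selection import train_test_split


def make_prefit_calibrator(fitted_clf, method):
    """Hypothetical helper: wrap an already fitted classifier so that
    only the calibrator is trained when .fit() is called."""
    try:
        from sklearn.frozen import FrozenEstimator  # scikit-learn >= 1.6
        return CalibratedClassifierCV(FrozenEstimator(fitted_clf), method=method)
    except ImportError:
        return CalibratedClassifierCV(fitted_clf, method=method, cv="prefit")


# Synthetic stand-in data (assumption, not the data from the question)
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, test_size=0.25, random_state=0
)
X_train, X_val, y_train, y_val = train_test_split(
    X_train, y_train, stratify=y_train, test_size=0.3, random_state=0
)

# Stand-in for the tuned classifier, fitted on the training split only
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_train, y_train)

# Calibrators are fitted on the validation split only
sig_clf = make_prefit_calibrator(clf, "sigmoid").fit(X_val, y_val)
iso_clf = make_prefit_calibrator(clf, "isotonic").fit(X_val, y_val)

# Brier score on the untouched test split (lower is better)
models = {"uncalibrated": clf, "sigmoid": sig_clf, "isotonic": iso_clf}
scores = {
    name: brier_score_loss(y_test, m.predict_proba(X_test)[:, 1])
    for name, m in models.items()
}
for name, score in scores.items():
    print(f"{name}: {score:.4f}")
```

Scoring on the test split rather than the validation split matters here, because the calibrators have already seen the validation labels.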
My sig_clf had the best calibration, so I would like to use it rather than best_est['clf']. However, sig_clf wraps only the model, not the preprocessing. When I make predictions on other datasets, e.g. 'newdata', does the following make sense?
test1 = best_est[:-1].transform(newdata)
predictions_new = sig_clf.predict_proba(test1)
Above, I use every step of the pipeline except the final classifier to transform an external dataset called 'newdata', then apply the calibrated sigmoid model to the transformed dataset to get the final calibrated predictions. Is this correct?
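To make the question concrete, the workflow described above can be sketched on synthetic data. Everything here is an assumption for illustration: the data comes from make_classification, LogisticRegression stands in for the tuned XGBClassifier, the pipeline is shortened, and make_prefit_calibrator is a hypothetical helper that uses FrozenEstimator on newer scikit-learn releases and cv="prefit" on older ones. The sketch also assembles the fitted preprocessing steps and the calibrated model into one Pipeline and checks that it gives the same probabilities as the two-step version.

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.feature_selection import VarianceThreshold
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline


def make_prefit_calibrator(fitted_clf, method):
    """Hypothetical helper: wrap an already fitted classifier so that
    only the calibrator is trained when .fit() is called."""
    try:
        from sklearn.frozen import FrozenEstimator  # scikit-learn >= 1.6
        return CalibratedClassifierCV(FrozenEstimator(fitted_clf), method=method)
    except ImportError:
        return CalibratedClassifierCV(fitted_clf, method=method, cv="prefit")


# Synthetic stand-in data; X_new plays the role of 'newdata'
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_rest, X_new, y_rest, _ = train_test_split(
    X, y, stratify=y, test_size=0.2, random_state=0
)
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, stratify=y_rest, test_size=0.25, random_state=0
)

# Shortened stand-in for gridsearch.best_estimator_
best_est = Pipeline(steps=[
    ("imputer", SimpleImputer(strategy="constant", fill_value=0)),
    ("var", VarianceThreshold()),
    ("clf", LogisticRegression(max_iter=1000)),
]).fit(X_train, y_train)

# All fitted steps except the classifier
preprocess = best_est[:-1]

# Calibrate the fitted classifier on the transformed validation data
sig_clf = make_prefit_calibrator(best_est["clf"], "sigmoid")
sig_clf.fit(preprocess.transform(X_val), y_val)

# Two-step prediction, as in the question
manual = sig_clf.predict_proba(preprocess.transform(X_new))

# Same thing as a single object built from already fitted steps;
# it should only be used for prediction and not refitted
calibrated_pipe = Pipeline(steps=[("preprocess", preprocess), ("calibrated", sig_clf)])
combined = calibrated_pipe.predict_proba(X_new)

print(np.allclose(manual, combined))
```

Keeping the fitted steps and the calibrator in one object means new data cannot accidentally skip the preprocessing, but calling fit on that combined pipeline would retrain the preprocessing and calibrator, so it is meant for prediction only.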