How do I ensure that GridSearchCV splits the data first and only then imputes?
I have a GridSearchCV with a pipeline that looks something like this:
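```python
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# numeric_features: list of the numeric column names (defined elsewhere)
numeric_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='most_frequent')),
    ('scaler', StandardScaler())
])
preprocessor = ColumnTransformer(transformers=[
    ('num', numeric_transformer, numeric_features),
])
clf = Pipeline(steps=[
    ('preprocessor', preprocessor),
    ('classifier', LogisticRegression(solver='lbfgs'))
])
```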
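To make the numeric branch concrete, here is a small sketch on a hypothetical one-column frame (`X_toy` and the column name `age` are illustrative, not from the question). `SimpleImputer(strategy='most_frequent')` learns the mode of the column during `fit`, and `StandardScaler` then centers and scales the imputed column.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: one numeric column with one missing value.
X_toy = pd.DataFrame({"age": [20.0, 20.0, 30.0, np.nan]})
numeric_features = ["age"]  # hypothetical column list

numeric_transformer = Pipeline(steps=[
    ("imputer", SimpleImputer(strategy="most_frequent")),
    ("scaler", StandardScaler()),
])
preprocessor = ColumnTransformer(transformers=[
    ("num", numeric_transformer, numeric_features),
])

# fit learns the mode (imputer) and mean/std (scaler); transform applies them.
Xt = preprocessor.fit_transform(X_toy)
imputer = preprocessor.named_transformers_["num"].named_steps["imputer"]
print(imputer.statistics_)  # the most frequent value learned during fit
print(Xt.ravel())           # the NaN row now equals the scaled value of 20.0
```

Because the mode and the scaling statistics are learned in `fit`, *which rows `fit` sees* is what decides whether validation data leaks into preprocessing.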
My GridSearchCV looks like this:
```python
search = GridSearchCV(clf, param_grid, cv=5, scoring="roc_auc", error_score=0.0)
```
with `cv=5` (5-fold cross-validation).
So how do I ensure that the data is split into folds first, and only then imputed with the most frequent value?
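A self-contained sketch of the same setup, assuming synthetic data from `make_classification` with randomly injected NaNs and a hypothetical `param_grid` (the question does not show its grid). The key point is that the raw `X`, missing values included, is passed straight to `search.fit`; nothing is imputed beforehand.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.datasets import make_classification
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical data: 200 rows, 4 numeric columns, roughly 10% of values missing.
X_arr, y = make_classification(n_samples=200, n_features=4, n_informative=3,
                               n_redundant=0, random_state=0)
rng = np.random.default_rng(0)
X_arr[rng.random(X_arr.shape) < 0.1] = np.nan
numeric_features = ["f0", "f1", "f2", "f3"]  # hypothetical column names
X = pd.DataFrame(X_arr, columns=numeric_features)

numeric_transformer = Pipeline(steps=[
    ("imputer", SimpleImputer(strategy="most_frequent")),
    ("scaler", StandardScaler()),
])
preprocessor = ColumnTransformer(transformers=[
    ("num", numeric_transformer, numeric_features),
])
clf = Pipeline(steps=[
    ("preprocessor", preprocessor),
    ("classifier", LogisticRegression(solver="lbfgs")),
])

# Hypothetical grid: nested step parameters are addressed as "<step>__<param>".
param_grid = {
    "preprocessor__num__imputer__strategy": ["most_frequent", "median"],
    "classifier__C": [0.1, 1.0, 10.0],
}

# Raw X (with NaNs) goes in; imputation happens inside each CV fold.
search = GridSearchCV(clf, param_grid, cv=5, scoring="roc_auc", error_score=0.0)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

Grid keys reach nested steps through double underscores, so `preprocessor__num__imputer__strategy` targets the imputer inside the `ColumnTransformer`.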
GridSearchCV will run roughly like this:
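You can be sure that `SimpleImputer` and `StandardScaler` will call `.fit()` and `.transform()` separately for each fold. For every fold the whole pipeline is cloned and fit on that fold's training portion only, so the most frequent values and the scaling statistics are learned from the training rows, and the held-out rows are only transformed with them.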
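To see why this prevents leakage, here is a simplified, hand-written approximation of what GridSearchCV does for one parameter candidate. It is a sketch, not scikit-learn's actual implementation: it drops the `ColumnTransformer`, uses synthetic data, and `cv_scores_for_candidate` is a hypothetical helper name. It relies on the fact that, with an integer `cv` and a classifier, GridSearchCV splits with `StratifiedKFold` without shuffling.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def cv_scores_for_candidate(pipeline, params, X, y, n_splits=5):
    """Roughly what GridSearchCV does for ONE candidate (simplified sketch)."""
    cv = StratifiedKFold(n_splits=n_splits)
    scores, fold_statistics = [], []
    for train_idx, test_idx in cv.split(X, y):
        # 1. Split first: this fold's validation rows are held out.
        X_train, X_test = X[train_idx], X[test_idx]
        y_train, y_test = y[train_idx], y[test_idx]
        # 2. Fresh, unfitted copy of the whole pipeline for this fold.
        model = clone(pipeline).set_params(**params)
        # 3. fit(): imputer and scaler learn their statistics from X_train only.
        model.fit(X_train, y_train)
        fold_statistics.append(model.named_steps["imputer"].statistics_.copy())
        # 4. Scoring transforms X_test with the statistics learned in step 3.
        #    (GridSearchCV's roc_auc scorer may use decision_function instead;
        #    for LogisticRegression the ranking, and hence the AUC, is the same.)
        y_score = model.predict_proba(X_test)[:, 1]
        scores.append(roc_auc_score(y_test, y_score))
    return np.array(scores), fold_statistics


# Hypothetical data with roughly 10% missing values.
X, y = make_classification(n_samples=200, n_features=4, n_informative=3,
                           n_redundant=0, random_state=0)
rng = np.random.default_rng(0)
X[rng.random(X.shape) < 0.1] = np.nan

pipe = Pipeline(steps=[
    ("imputer", SimpleImputer(strategy="most_frequent")),
    ("scaler", StandardScaler()),
    ("classifier", LogisticRegression(solver="lbfgs")),
])
scores, stats = cv_scores_for_candidate(pipe, {"classifier__C": 1.0}, X, y)
print(scores.mean())
```

The real GridSearchCV repeats this for every candidate in the grid (optionally in parallel), averages the fold scores, and, with the default `refit=True`, refits the best pipeline on the full data set at the end.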