Can anyone tell me why my pipeline is wrong?
I am trying to build a pipeline so that I can run GridSearchCV to find the best parameters. I have already split the data into training and validation sets and have the following code:
cols = ['home_ownership', "purpose", "addr_state", "application_type", "term"]
column_transformer = make_pipeline(
    (OneHotEncoder(categories=cols)),
    (OrdinalEncoder(categories=X["grade"])),
    "passthrough")
imputer = SimpleImputer(strategy='median')
scaler = StandardScaler()
model = SGDClassifier(loss='log', random_state=42, n_jobs=-1, warm_start=True)
pipeline_sgdlogreg = make_pipeline(imputer, column_transformer, scaler, model)
When I run GridSearchCV I get the following error:
"cannot use median strategy with non-numeric data (...)"
I do not understand why I am getting this error. None of the categorical variables have missing values.
I am performing the following steps: Imputation -> Encoding -> Scaling -> Modeling
Can anyone shed some light?
Comments (1)
Whether or not there are missing values, sklearn throws the error as soon as it sees a non-numeric column being handed to an imputer with a numeric strategy. It doesn't care whether any values are actually missing when it does its checks; it just knows it couldn't handle any that did arise under the given strategy, so it throws the exception.
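To illustrate, here is a minimal sketch that reproduces the behaviour on a toy frame. The column names and values are made up for the example, and `try_median_impute` is a hypothetical helper, not part of scikit-learn. The string column has no missing values at all, yet fitting still fails:

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical toy data: one numeric column with a gap, and one string
# column that has no missing values.
X_demo = pd.DataFrame({
    "loan_amnt": [1000.0, np.nan, 3000.0],
    "purpose": ["car", "house", "car"],
})


def try_median_impute(frame):
    """Return None if fitting succeeds, otherwise the error message."""
    try:
        SimpleImputer(strategy="median").fit(frame)
    except ValueError as exc:
        return str(exc)
    return None


# Fails: the string column cannot be converted to float, even though
# nothing in it is missing.
error_message = try_median_impute(X_demo)
print(error_message)

# Succeeds: only the numeric column is handed to the imputer.
numeric_only_result = try_median_impute(X_demo[["loan_amnt"]])
print(numeric_only_result)
```

So the check is about the column's type, not about whether there is anything to impute.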
The error means just what it says. You'll need to create a custom imputer if you want to impute as the first step of the pipeline. This question isn't quite the same, but the second-ranked answer outlines a method that will work for you:
Impute categorical missing values in scikit-learn
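For reference, a rough sketch of what such a custom imputer might look like. This is my own illustration in the spirit of that approach, not a copy of the linked answer: the class name `DataFrameImputer` and the toy data are assumptions, it expects a pandas DataFrame as input, and it assumes every column has at least one non-missing value. Numeric columns get the median, everything else gets the most frequent value:

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin


class DataFrameImputer(BaseEstimator, TransformerMixin):
    """Hypothetical imputer: median for numeric columns,
    most frequent value for all other columns."""

    def fit(self, X, y=None):
        fill_values = {}
        for col in X.columns:
            if pd.api.types.is_numeric_dtype(X[col]):
                fill_values[col] = X[col].median()
            else:
                # mode() ignores missing values; take the first mode.
                fill_values[col] = X[col].mode().iloc[0]
        self.fill_values_ = fill_values
        return self

    def transform(self, X):
        # fillna with a dict fills each column with its own value.
        return X.fillna(self.fill_values_)


# Made-up data with gaps in both a numeric and a string column.
X_demo = pd.DataFrame({
    "loan_amnt": [1000.0, np.nan, 3000.0],
    "grade": ["B", "B", None],
    "purpose": ["car", "house", "car"],
})

X_filled = DataFrameImputer().fit_transform(X_demo)
print(X_filled)
```

An object like this could take the place of `imputer` as the first pipeline step, provided the later steps accept a DataFrame. If you would rather stay with built-in pieces, `SimpleImputer(strategy="most_frequent")` does accept string columns, and `ColumnTransformer` can route numeric and categorical columns to different imputers.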