Inspecting feature importance in Scikit-Learn Pipelines
I have defined the following pipelines using scikit-learn:
model_lg = Pipeline([("preprocessing", StandardScaler()), ("classifier", LogisticRegression())])
model_dt = Pipeline([("preprocessing", StandardScaler()), ("classifier", DecisionTreeClassifier())])
model_gb = Pipeline([("preprocessing", StandardScaler()), ("classifier", HistGradientBoostingClassifier())])
Then I used cross validation to evaluate the performance of each model:
cv_results_lg = cross_validate(model_lg, data, target, cv=5, return_train_score=True, return_estimator=True)
cv_results_dt = cross_validate(model_dt, data, target, cv=5, return_train_score=True, return_estimator=True)
cv_results_gb = cross_validate(model_gb, data, target, cv=5, return_train_score=True, return_estimator=True)
When I try to inspect the feature importance of each model using the coef_ attribute, I get an AttributeError:
model_lg.steps[1][1].coef_
AttributeError: 'LogisticRegression' object has no attribute 'coef_'
model_dt.steps[1][1].coef_
AttributeError: 'DecisionTreeClassifier' object has no attribute 'coef_'
model_gb.steps[1][1].coef_
AttributeError: 'HistGradientBoostingClassifier' object has no attribute 'coef_'
How can I fix this error? Or is there another way to inspect the feature importance in each model?
Comments (2)
Imo, the point here is the following. On the one hand, the pipeline instances model_lg, model_dt, etc. are never explicitly fitted (you are not calling .fit() on them directly), so the coef_ attribute does not exist on the instances themselves. On the other hand, by calling cross_validate() with the parameter return_estimator=True (which, among the cross-validation helpers, only cross_validate() supports), you get the fitted estimators back for each cv split, but you have to access them through your dictionaries cv_results_lg, cv_results_dt, etc. (under the 'estimator' key). Here's the reference in the code and here's an example. These would be, for instance, the results computed on the first fold.
Useful insights on related topics might be found in:
make for loop in some algorithm and print accuracy