Inspecting feature importance in Scikit-Learn Pipelines

Posted on 2025-01-19 00:32:57 | Word count: 1165 | Views: 0 | Comments: 0

I have defined the following pipelines using scikit-learn:

model_lg = Pipeline([("preprocessing", StandardScaler()), ("classifier", LogisticRegression())])
model_dt = Pipeline([("preprocessing", StandardScaler()), ("classifier", DecisionTreeClassifier())])
model_gb = Pipeline([("preprocessing", StandardScaler()), ("classifier", HistGradientBoostingClassifier())])

Then I used cross validation to evaluate the performance of each model:

cv_results_lg = cross_validate(model_lg, data, target, cv=5, return_train_score=True, return_estimator=True)
cv_results_dt = cross_validate(model_dt, data, target, cv=5, return_train_score=True, return_estimator=True)
cv_results_gb = cross_validate(model_gb, data, target, cv=5, return_train_score=True, return_estimator=True)

When I try to inspect the feature importance for each model using the coef_ attribute, I get an AttributeError:

model_lg.steps[1][1].coef_
AttributeError: 'LogisticRegression' object has no attribute 'coef_'

model_dt.steps[1][1].coef_
AttributeError: 'DecisionTreeClassifier' object has no attribute 'coef_'

model_gb.steps[1][1].coef_
AttributeError: 'HistGradientBoostingClassifier' object has no attribute 'coef_'

How can I fix this error? Or is there another approach to inspect the feature importance in each model?

染火枫林 2025-01-26 00:32:57

IMO, the point here is the following. On the one hand, the pipeline instances model_lg, model_dt, etc. are not explicitly fitted (you're not calling .fit() on them directly), so the coef_ attribute does not exist on the instances themselves.
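To make that point concrete, here is a minimal sketch. It assumes the iris dataset as stand-in data (it is not the asker's data and target). It checks whether coef_ exists on the classifier step before fitting, after cross_validate, and after an explicit fit:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in data (assumption): iris replaces the asker's `data` and `target`.
X, y = load_iris(return_X_y=True)

model_lg = Pipeline([("preprocessing", StandardScaler()), ("classifier", LogisticRegression())])
clf = model_lg.named_steps["classifier"]

# Before any fit, the learned attribute does not exist yet.
unfitted_has_coef = hasattr(clf, "coef_")

# cross_validate fits clones of the pipeline, so the original stays unfitted.
cross_validate(model_lg, X, y, cv=5)
after_cv_has_coef = hasattr(clf, "coef_")

# Fitting the pipeline itself creates coef_ on its classifier step.
model_lg.fit(X, y)
fitted_has_coef = hasattr(clf, "coef_")
coef_shape = clf.coef_.shape  # (n_classes, n_features) for this multiclass problem

print(unfitted_has_coef, after_cv_has_coef, fitted_has_coef, coef_shape)
```

So if you only need the coefficients of one model trained on all the data, calling model_lg.fit(data, target) before reading coef_ should be enough.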

On the other hand, by calling cross_validate() with return_estimator=True (among the cross-validation helpers, only cross_validate() offers this parameter), you get the fitted estimators back for each CV split, but you should access them via your dictionaries cv_results_lg, cv_results_dt, etc. (under the 'estimator' key). Here's an example:

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_validate

X, y = load_iris(return_X_y=True)

model_lg = Pipeline([("preprocessing", StandardScaler()), ("classifier", LogisticRegression())])

cv_results_lg = cross_validate(model_lg, X, y, cv=5, return_train_score=True, return_estimator=True)

For instance, these would be the coefficients computed on the first fold:

cv_results_lg['estimator'][0].named_steps['classifier'].coef_
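The same access pattern can be applied to every fold rather than only the first. The sketch below again assumes iris as stand-in data. The variable names coefs and mean_abs_coef are made up for illustration. Averaging the absolute coefficients is just one rough way to summarise them, and it is only reasonable here because the features were standardized:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in data (assumption): iris replaces the asker's `data` and `target`.
X, y = load_iris(return_X_y=True)

model_lg = Pipeline([("preprocessing", StandardScaler()), ("classifier", LogisticRegression())])
cv_results_lg = cross_validate(model_lg, X, y, cv=5, return_estimator=True)

# One fitted pipeline per fold; stack the classifier coefficients
# into an array of shape (n_folds, n_classes, n_features).
coefs = np.stack(
    [est.named_steps["classifier"].coef_ for est in cv_results_lg["estimator"]]
)

# Rough per-feature summary: mean absolute coefficient over folds and classes.
mean_abs_coef = np.abs(coefs).mean(axis=(0, 1))

# Spread of each coefficient across folds, to see how stable it is.
std_across_folds = coefs.std(axis=0)

print(coefs.shape)
print(mean_abs_coef)
```

A large std_across_folds entry relative to the coefficient itself suggests that the coefficient is not very stable across splits.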

Useful insights on related topics might be found in:

终弃我 2025-01-26 00:32:57

Loop over the algorithms and print the accuracy.

About the author: 夜无邪
