Getting feature importances from AdaBoost on a linear regression

Posted 2025-02-10 09:45:24


I have the following code:

modelClf = AdaBoostRegressor(base_estimator=LinearRegression(), learning_rate=2, n_estimators=427, random_state=42)

modelClf.fit(X_train, y_train)

While trying to interpret and improve the results, I wanted to see the feature importances; however, I get an error saying that linear regressions don't really do that kind of thing.

Alright, sounds reasonable, so I tried using .coef_, since that should work for linear regressions, but it in turn proved incompatible with the AdaBoost regressor.

Is there any way to find the feature importances, or is it impossible when AdaBoost is used on a linear regression?


Comments (2)

别念他 2025-02-17 09:45:24


Issue #12137 suggests adding support for this using the coefficients, although a choice needs to be made about how to normalize negative coefficients. There's also the question of when coefficients really are good representatives of importance (you should at least scale your data first). And then there's the question of when adaptive boosting helps a linear model in the first place.

One quick way to do this is to subclass LinearRegression:

from sklearn.ensemble import AdaBoostRegressor
from sklearn.linear_model import LinearRegression

class MyLinReg(LinearRegression):
    @property
    def feature_importances_(self):
        return self.coef_  # assuming a single output

modelClf = AdaBoostRegressor(base_estimator=MyLinReg(), ...)
指尖上的星空 2025-02-17 09:45:24


Checked with the code below: there is a feature_importances_ attribute (the default base estimator is a decision tree, which supports it):

import pandas as pd
import random
from sklearn.ensemble import AdaBoostRegressor

# Two random features; y depends only on x2.
df = pd.DataFrame({'x1': random.choices(range(0, 100), k=10),
                   'x2': random.choices(range(0, 100), k=10)})
df['y'] = df['x2'] * .5

X = df[['x1', 'x2']].values
y = df['y'].values

regr = AdaBoostRegressor(random_state=0, n_estimators=100)
regr.fit(X, y)

regr.feature_importances_

Output: you can see that feature 2 (x2) is more important, since y is just half of it (the data was created that way).

