I have the following code:
modelClf = AdaBoostRegressor(base_estimator=LinearRegression(), learning_rate=2, n_estimators=427, random_state=42)
modelClf.fit(X_train, y_train)
While trying to interpret and improve the results, I wanted to see the feature importances, but I get an error saying that linear regressions don't really do that kind of thing.
Alright, sounds reasonable, so I tried using .coef_ since it should work for linear regressions, but that, in turn, turned out to be incompatible with the AdaBoost regressor.
Is there any way to find the feature importances, or is it impossible when AdaBoost is used on a linear regression?
Comments (2)
Issue 12137 suggests adding support for this using the `coef_` attribute, although a choice needs to be made about how to normalize negative coefficients. There's also the question of when coefficients are really good representatives of importance (you should at least scale your data first). And then there's the question of when adaptive boosting helps a linear model in the first place.
One way to do this quickly is to modify the `LinearRegression` class.
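As a minimal sketch of that idea: AdaBoost computes its `feature_importances_` as a weighted average of each fitted base estimator's `feature_importances_`, so a subclass that exposes such a property is enough. The class name `LinearRegressionWithImportances`, the choice of normalized absolute coefficients, and the synthetic data are all assumptions made for illustration, not something taken from the issue or from scikit-learn itself.

```python
import numpy as np
from sklearn.ensemble import AdaBoostRegressor
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler


class LinearRegressionWithImportances(LinearRegression):
    """Hypothetical subclass exposing |coef_| normalized to sum to 1."""

    @property
    def feature_importances_(self):
        # Assumption: absolute values handle negative coefficients;
        # other normalizations are equally defensible.
        abs_coef = np.abs(np.ravel(self.coef_))
        total = abs_coef.sum()
        return abs_coef / total if total > 0 else abs_coef


# Synthetic data (assumed): y depends only on the second feature.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
y = 0.5 * X[:, 1] + 0.05 * rng.normal(size=200)

# Scale first so coefficient magnitudes are comparable across features.
X_scaled = StandardScaler().fit_transform(X)

# The base estimator is passed positionally because its keyword name
# differs between scikit-learn versions (base_estimator vs. estimator).
model = AdaBoostRegressor(
    LinearRegressionWithImportances(), n_estimators=20, random_state=42
)
model.fit(X_scaled, y)

importances = model.feature_importances_
print(importances)
```

Whether absolute coefficients are a meaningful importance measure still depends on the caveats above, in particular on the features being on a comparable scale.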
Checked with the code below; there is an attribute for feature importance:
Output: You can see that feature 2 is more important, as Y is nothing but half of it (the data was created that way).
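A check along these lines might look like the following sketch. It assumes the default tree-based `AdaBoostRegressor` (which does expose `feature_importances_`) and made-up data in which the target is exactly half of the second feature; the data shape, seed, and variable names are illustrative assumptions, and "feature 2" is read here as the second column.

```python
import numpy as np
from sklearn.ensemble import AdaBoostRegressor

# Synthetic data (assumed): three features, target is half of the second one.
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 10, size=(300, 3))
y_demo = X_demo[:, 1] / 2

# Default base estimator is a shallow decision tree, which provides
# feature_importances_, so the ensemble attribute is available.
tree_model = AdaBoostRegressor(n_estimators=50, random_state=0)
tree_model.fit(X_demo, y_demo)

tree_importances = tree_model.feature_importances_
print(tree_importances)
```

With data built this way, the second entry of the printed array should dominate the other two.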