RFECV is not selecting features
I am using RFECV from scikit-learn to select a subset of features. However, sometimes it seems to skip the feature selection entirely.
I have set up the following code in a loop:
print(f'Selecting features, starting at {number_of_features}')
n_features_to_drop = int(number_of_features * features_to_drop)
selector = RFECV(
    estimator=self.model_object,
    min_features_to_select=number_of_features - n_features_to_drop,
    step=int(n_features_to_drop / 10),
    cv=5,
    n_jobs=-1,
)
selector.fit(self.X, self.y)
self.X = selector.transform(self.X)
self.number_of_features = self.X.shape[1]
print(f'Selected {number_of_features} features')
This gives the following output:
Selecting features, starting at 388
Selected 388 features
Selecting features, starting at 388
Selected 388 features
Selecting features, starting at 388
Selected 388 features
Selecting features, starting at 388
Selected 318 features
Selecting features, starting at 318
Selected 255 features
It sometimes seems to get stuck on a certain number of features: the initial number of features is 388 and it stays at 388 even after feature selection. How can this behavior be explained?
Comments (1)
It looks to me like RFECV sometimes just decides that keeping all the features is best: it keeps whichever feature count gets the best cross-validated score, and that can be the full set. Rerunning it may give different results because of randomness in the cross-validation splits or in the model itself. You can inspect the cv_results_ dictionaries to compare runs, and create a CV splitter with a fixed random_state if you'd like to preserve the same splits (together with a random state in the model object, repeated reruns then shouldn't change anything, except for the differing step sizes).