RFECV is not selecting features

Posted on 2025-02-10 03:22:35


I am using RFECV from scikit-learn to select a subset of features. However, sometimes it seems to completely ignore the feature selection.

I have set up the following code in a loop:

print(f'Selecting features, starting at {number_of_features}')
n_features_to_drop = int(number_of_features * features_to_drop)

selector = RFECV(
    estimator=self.model_object,
    min_features_to_select=number_of_features - n_features_to_drop,
    step=int(n_features_to_drop / 10),
    cv=5,
    n_jobs=-1,
)

selector.fit(self.X, self.y)
self.X = selector.transform(self.X)

self.number_of_features = self.X.shape[1]
print(f'Selected {self.number_of_features} features')

This gives the following output:

Selecting features, starting at 388
Selected 388 features

Selecting features, starting at 388
Selected 388 features

Selecting features, starting at 388
Selected 388 features

Selecting features, starting at 388
Selected 318 features

Selecting features, starting at 318
Selected 255 features

It seems to sometimes get stuck at a certain number of features: the run starts with 388 features and still has 388 after feature selection. How can this behavior be explained?

Comments (1)

执手闯天涯 2025-02-17 03:22:35


It looks to me like it just sometimes concludes that keeping all the features is best. Rerunning may give different results due to the randomized cross-validation splits. You can inspect the cv_results_ dictionaries to compare runs, and create a CV splitter with a fixed random_state if you'd like to preserve the same splits (together with a fixed random state in the model object, repeated reruns should then make no difference, apart from the differing step sizes).
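A minimal sketch of that suggestion, assuming a RandomForestClassifier on synthetic make_classification data as stand-ins for the original model_object, X and y (the cv_results_ attribute requires a reasonably recent scikit-learn):

# Minimal sketch: fix both the CV splits and the estimator's randomness so
# repeated RFECV runs are comparable, then inspect the per-step CV scores.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV
from sklearn.model_selection import StratifiedKFold

# Placeholder data and estimator; substitute your own X, y and model_object.
X, y = make_classification(n_samples=300, n_features=50, n_informative=10, random_state=0)
model = RandomForestClassifier(n_estimators=100, random_state=0)

# A splitter with a fixed random_state keeps the folds identical across runs.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

selector = RFECV(estimator=model, step=2, min_features_to_select=5, cv=cv, n_jobs=-1)
selector.fit(X, y)

print('Selected', selector.n_features_, 'features')
# cv_results_ holds the cross-validated score at each elimination step; if the
# highest mean score belongs to the full feature set, RFECV keeps everything.
print('Mean CV score per step:', selector.cv_results_['mean_test_score'])

Comparing mean_test_score (or the split{i}_test_score entries) across runs should show whether the "keep all 388" outcome really scored best, or whether it came down to split-to-split noise.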
