RFE from scikit-learn feature_selection with a statsmodels NegativeBinomial GLM as the estimator

Posted 2025-02-12 17:08:52

I'm trying to use RFE from scikit-learn with a statsmodels GLM (NegativeBinomial family) as the estimator.

So I created my own class:

import numpy as np
import pandas as pd
from sklearn.datasets import make_friedman1
from sklearn.feature_selection import RFE
from sklearn.base import BaseEstimator
import statsmodels.api as sm

class MyEstimator(BaseEstimator):
    def __init__(self, formula_, data_, family_):
        self.model = sm.formula.glm(formula_, data=data_, family=family_)

    def fit(self, **kwargs):
        self.model.fit()
        self.coef_ = self.model.params.values

    def predict(self, X):
        result = self.model.predict(X)    
        return np.array(result)

X, y = make_friedman1(n_samples=50, n_features=10, random_state=0)


dataset = pd.DataFrame({'X1':X[:,0], 'X2':X[:,1], 'X3':X[:,2], 'y':y})

estimator = MyEstimator("y ~ X1 + X2 + X3", dataset, sm.families.NegativeBinomial())

selector = RFE(estimator, n_features_to_select=5, step=1)
selector = selector.fit()

But I get this error:

TypeError: fit() missing 2 required positional arguments: 'X' and 'y'

Does anyone have an idea?
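For context on where the error comes from: RFE.fit itself takes the data, as selector.fit(X, y), so calling it with no arguments raises this TypeError. While fitting, RFE fits a clone of the wrapped estimator on the columns still in play, passing them positionally as estimator.fit(X_subset, y), and eliminates the column whose coef_ entry is smallest in magnitude. The toy loop below is a simplified NumPy sketch of that procedure for step=1, not scikit-learn's actual code; LstsqEstimator and toy_rfe are hypothetical names, and ordinary least squares stands in for the NegativeBinomial GLM. It shows the two things a wrapper has to provide: a fit(X, y) that accepts the data positionally (a fit(self, **kwargs) cannot) and a coef_ with one entry per column.

```python
import numpy as np


class LstsqEstimator:
    """Hypothetical stand-in for the GLM wrapper: ordinary least squares."""

    def fit(self, X, y):
        # X and y arrive positionally, the way RFE passes them
        self.coef_, *_ = np.linalg.lstsq(X, y, rcond=None)
        return self


def toy_rfe(estimator, X, y, n_features_to_select):
    """Toy recursive feature elimination with step=1 (illustration only)."""
    n_features = X.shape[1]
    remaining = list(range(n_features))
    # rank 1 = kept; columns dropped earlier end up with larger ranks
    ranking = np.ones(n_features, dtype=int)
    while len(remaining) > n_features_to_select:
        estimator.fit(X[:, remaining], y)
        # drop the remaining column with the smallest squared coefficient
        worst = int(np.argmin(estimator.coef_ ** 2))
        ranking[remaining[worst]] = len(remaining) - n_features_to_select + 1
        del remaining[worst]
    return ranking


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
y = 3.0 * X[:, 0] + 2.0 * X[:, 1]  # only the first two columns matter
print(toy_rfe(LstsqEstimator(), X, y, n_features_to_select=2))
```

The two informative columns keep rank 1; the coefficients of the other two are only rounding noise, so which of them gets rank 2 and which gets rank 3 can vary.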


Reply by 庆幸我还是我, 2025-02-19 17:08:52

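You can modify your code so that fit() receives the exog (X) and endog (y) arrays directly, instead of using the formula API. RFE calls fit(X, y) on the estimator with the currently selected columns and ranks the features using coef_: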

import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.feature_selection import RFE
from sklearn.base import BaseEstimator
import statsmodels.api as sm

class MyEstimator(BaseEstimator):
    def __init__(self, family):
        # only store constructor arguments: RFE clones the estimator via get_params()
        self.family = family

    def fit(self, exog, endog):
        # RFE calls fit(X, y) positionally, so exog = X and endog = y
        self.model_ = sm.GLM(endog, exog, family=self.family)
        self.results_ = self.model_.fit()
        # one coefficient per column of exog; RFE ranks features by these
        self.coef_ = self.results_.params
        return self

    def predict(self, X):
        # predict from the fitted results (GLM.predict expects params as its first argument)
        return np.asarray(self.results_.predict(X))

X, y = make_friedman1(n_samples=50, n_features=10, random_state=0)

estimator = MyEstimator(sm.families.NegativeBinomial())

selector = RFE(estimator, n_features_to_select=5, step=1)
selector = selector.fit(X, y.reshape(-1,1))
print(selector.ranking_)
# [1 1 3 1 1 5 1 6 4 2]