Setting PCA to 95% explained variance inside a Pipeline

Posted 2025-02-12 21:18:15


I am using a standard scaler, PCA, and a random forest to classify some data. I want to use a pipeline, but I do not know how to tell it that PCA should keep enough components to explain 95% of the variance, rather than a fixed n_components. How can I set up the code so that this number is calculated inside the pipeline?
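Outside a pipeline, the number of components needed for 95% explained variance (the value the commented-out line below calls `n_to_reach_95`) is usually read off the cumulative `explained_variance_ratio_` of a PCA that keeps every component. A minimal sketch, assuming synthetic data from `make_classification` stands in for the real `X_train`:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real training data (assumption: 30 numeric features).
X_train, y_train = make_classification(
    n_samples=200, n_features=30, n_informative=10, random_state=0
)

# Scale first, as the pipeline would, then fit a PCA that keeps all components.
X_scaled = StandardScaler().fit_transform(X_train)
pca_all = PCA(svd_solver="full").fit(X_scaled)

# Cumulative fraction of variance explained by the first k components.
cumulative = np.cumsum(pca_all.explained_variance_ratio_)

# Smallest k whose cumulative explained variance exceeds 95%.
n_to_reach_95 = int(np.searchsorted(cumulative, 0.95, side="right") + 1)
print(n_to_reach_95)
```

Computing the count this way on the whole training set happens before cross-validation, so it leaks a little information across folds; that is one reason to let the pipeline choose it per fit instead.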

Here is the code:

import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

pipe = Pipeline([
    ('scaler', StandardScaler()),
    # ('pca', PCA(n_components=n_to_reach_95)),
    ('pca', PCA(n_components=15)),
    ('clf', RandomForestClassifier()),
])

# Declare a hyperparameter grid
parameter_space = {
    'clf__n_estimators': [10, 50, 100],
    'clf__criterion': ['gini', 'entropy'],
    # max_depth must be an integer; np.linspace(10, 50, 11) would give floats
    'clf__max_depth': list(range(10, 51, 4)),
}

clf = GridSearchCV(pipe, parameter_space, cv=5, scoring="accuracy", verbose=True)  # model

# Fit the grid search (not the bare pipeline) so the grid is actually searched
clf.fit(X_train, y_train)
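One possible approach, sketched under assumptions: scikit-learn's `PCA` accepts a float `0 < n_components < 1` and then keeps the smallest number of components whose cumulative explained variance exceeds that fraction. That option is documented for `svd_solver="full"`, so the sketch sets it explicitly. Synthetic data and a reduced grid are used so it runs quickly; they are not the asker's data or grid.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real training data (assumption).
X_train, y_train = make_classification(
    n_samples=200, n_features=30, n_informative=10, random_state=0
)

pipe = Pipeline([
    ("scaler", StandardScaler()),
    # A float in (0, 1) asks PCA for the smallest number of components
    # whose cumulative explained variance exceeds that fraction.
    ("pca", PCA(n_components=0.95, svd_solver="full")),
    ("clf", RandomForestClassifier(random_state=0)),
])

# Deliberately small grid so the sketch runs quickly.
parameter_space = {
    "clf__n_estimators": [10, 50],
    "clf__max_depth": [10, 20],
}

search = GridSearchCV(pipe, parameter_space, cv=3, scoring="accuracy")
search.fit(X_train, y_train)

# The refit best pipeline exposes how many components PCA actually kept.
fitted_pca = search.best_estimator_.named_steps["pca"]
print(fitted_pca.n_components_)
print(fitted_pca.explained_variance_ratio_.sum())
```

Because the threshold is applied each time the pipeline is fitted, every cross-validation fold selects its own component count from its own training split, and `n_components_` on the refit estimator reports the count used for the final model.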
