Setting PCA to 95% explained variance inside a pipeline
I am using a standard scaler, PCA and a random forest to classify some data. I wanted to use the pipeline methodology, but I do not know how to tell the pipeline that I want n_components to be whatever number gives 95% explained variance. How can I set up the code so that this number is calculated inside the pipeline?
Here is the code:
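import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

pipe = Pipeline([('scaler', StandardScaler()),
                 # ('pca', PCA(n_components=n_to_reach_95)),
                 ('pca', PCA(n_components=15)),
                 ('clf', RandomForestClassifier())])

# Declare a hyperparameter grid
parameter_space = {
    'clf__n_estimators': [10, 50, 100],
    'clf__criterion': ['gini', 'entropy'],
    # max_depth expects integers; dtype=int turns 10.0, 14.0, ..., 50.0 into ints
    'clf__max_depth': np.linspace(10, 50, 11, dtype=int),
}

# Grid search over the whole pipeline (model)
clf = GridSearchCV(pipe, parameter_space, cv=5, scoring="accuracy", verbose=True)
# Fit the search rather than the bare pipe, so the grid is actually explored
# (X_train, y_train: the training data, not shown here)
clf.fit(X_train, y_train)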
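For reference, the value the commented-out line calls n_to_reach_95 can be computed by hand outside the pipeline. This is only a sketch: it assumes scikit-learn's built-in load_breast_cancer as stand-in data, since the asker's X_train is not shown. It scales the data, fits a PCA that keeps every component, and finds the first component count whose cumulative explained variance ratio goes above 0.95.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Stand-in data (assumption): the asker's X_train is not shown.
X_train, y_train = load_breast_cancer(return_X_y=True)

# Scale first, as the pipeline's 'scaler' step would.
X_scaled = StandardScaler().fit_transform(X_train)

# Fit a PCA that keeps all components, then accumulate the variance ratios.
full_pca = PCA().fit(X_scaled)
cumulative = np.cumsum(full_pca.explained_variance_ratio_)

# Smallest k whose cumulative explained variance is greater than 0.95.
n_to_reach_95 = int(np.searchsorted(cumulative, 0.95, side="right") + 1)
print(n_to_reach_95, cumulative[n_to_reach_95 - 1])
```

A number obtained this way is fixed once from the whole training set, whereas inside GridSearchCV each fold's PCA should be fit only on that fold's training split. That is why letting PCA resolve the count itself on every fit is usually preferable to hard-coding it.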
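Comment:
In fact, sklearn supports this directly: set n_components = 0.95 on the PCA step. When n_components is a float between 0 and 1, PCA keeps the smallest number of components whose cumulative explained variance ratio is greater than that fraction, and it works this number out again every time the pipeline is fit.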
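Applied to the question's setup, a minimal sketch might look like the following. It again assumes load_breast_cancer as stand-in data and uses a smaller grid than the question's so that it runs quickly. After the search, the PCA step of best_estimator_ reports how many components it actually kept through its n_components_ attribute.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in data (assumption): replace with your own X_train / y_train.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

pipe = Pipeline([
    ("scaler", StandardScaler()),
    # A float in (0, 1): keep as many components as needed to explain
    # more than 95% of the variance, recomputed on each fit.
    ("pca", PCA(n_components=0.95)),
    ("clf", RandomForestClassifier(random_state=0)),
])

# Reduced grid (assumption) to keep the sketch fast.
parameter_space = {
    "clf__n_estimators": [10, 50],
    "clf__max_depth": [10, 30],
}
search = GridSearchCV(pipe, parameter_space, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

# The refitted best pipeline exposes the component count PCA settled on.
best_pca = search.best_estimator_.named_steps["pca"]
print(best_pca.n_components_)
print(best_pca.explained_variance_ratio_.sum())
print(search.score(X_test, y_test))
```

Because the fraction rather than a fixed count is stored in the pipeline, each cross-validation fold fits its own scaler and PCA on its own training split, so no hand-computed n_to_reach_95 is needed.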