How do I use sklearn's StandardScaler with make_pipeline?
I am used to running sklearn's standard scaler the following way:
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler().fit(X_train)
scaled_X_train = scaler.transform(X_train)
where X_train is an array containing the features in my training dataset.
I may then use the same scaler to scale the features in my test dataset, X_test:
scaled_X_test = scaler.transform(X_test)
I know that I may also "bake" the scaler into the model, using sklearn's make_pipeline:
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier

clf = make_pipeline(StandardScaler(), RandomForestClassifier(n_estimators=100))
But then how do I use the scaler? Is it enough to call the model like I normally would, i.e.:
clf.fit(X_train, y_train)
And then:
y_pred = clf.predict(X_test)
?
1 Answer
Yes, that is correct.
It's also a good idea to bake the preprocessing into a pipeline, to avoid the common pitfall of scaling the test and training datasets independently.
When calling clf.fit(X_train, y_train), the pipeline will fit the scaler on X_train, and subsequently use that fit to preprocess your test dataset. See an example at the beginning of the "common pitfalls and recommended practices" documentation.
So the fact that you don't "use" the scaler yourself is by design.
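To make that concrete, here is a minimal sketch with synthetic data (the array shapes and random_state values are illustrative assumptions, not from the original post). It shows that fitting the pipeline is equivalent to the manual fit-scaler-then-fit-model workflow from the question:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(0)
X_train, y_train = rng.rand(100, 4), rng.randint(0, 2, 100)
X_test = rng.rand(20, 4)

# Pipeline: fit() scales X_train and fits the model in one step;
# predict() scales X_test with the training-set statistics internally.
clf = make_pipeline(StandardScaler(),
                    RandomForestClassifier(n_estimators=100, random_state=0))
clf.fit(X_train, y_train)
pipeline_preds = clf.predict(X_test)

# Manual equivalent: fit the scaler on X_train only, reuse it on X_test.
scaler = StandardScaler().fit(X_train)
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(scaler.transform(X_train), y_train)
manual_preds = model.predict(scaler.transform(X_test))

print((pipeline_preds == manual_preds).all())  # expect identical predictions
```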
With that said, if you wanted for some reason to independently access the scaler from the pipeline, for example to check its values, you could do so:
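A sketch of one way to do that (the training data here is synthetic and illustrative): make_pipeline names each step after its lowercased class name, so the fitted scaler can be pulled out via named_steps.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(0)
X_train, y_train = rng.rand(50, 3), rng.randint(0, 2, 50)

clf = make_pipeline(StandardScaler(),
                    RandomForestClassifier(n_estimators=10, random_state=0))
clf.fit(X_train, y_train)

# make_pipeline auto-names steps after the lowercased class name.
scaler = clf.named_steps['standardscaler']
print(scaler.mean_)   # per-feature means learned from X_train
print(scaler.scale_)  # per-feature standard deviations
```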