How do I use sklearn's StandardScaler with make_pipeline?

Posted on 2025-02-07 07:12:13

I am used to running sklearn's standard scaler the following way:

from sklearn.preprocessing import StandardScaler
scaler = StandardScaler().fit(X_train)
scaled_X_train = scaler.transform(X_train)

Where X_train is an array containing the features in my training dataset.

I may then use the same scaler to scale the features in my test dataset X_test:

scaled_X_test = scaler.transform(X_test)

I know that I may also "bake" the scaler in the model, using sklearn's make_pipeline:

from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
clf = make_pipeline(StandardScaler(), RandomForestClassifier(n_estimators=100))

But then how do I use the scaler? Is it enough to call the model like I normally would, i.e.:

clf.fit(X_train, y_train)

And then:

y_pred = clf.predict(X_test)

?


Comments (1)

毁虫ゝ 2025-02-14 07:12:13

Yes, that is correct.
It's also a good idea to bake the preprocessing into a pipeline, to avoid the common pitfall of scaling the test and training datasets independently.

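When you call clf.fit(X_train, y_train), the pipeline fits the scaler on X_train, transforms X_train with it, and trains the classifier on the scaled features. When you later call clf.predict(X_test), the pipeline applies that same fitted scaler to X_test before passing it to the classifier.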
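
As a rough illustration of that behaviour, the sketch below compares the pipeline with the manual fit/transform workflow from the question. The synthetic dataset from make_classification and the fixed random_state values are assumptions added here so the two runs are comparable; they are not part of the original question.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data, used only for illustration (an assumption, not the asker's data).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Pipeline: the scaler is fitted inside fit() and reused inside predict().
clf = make_pipeline(
    StandardScaler(),
    RandomForestClassifier(n_estimators=100, random_state=0),
)
clf.fit(X_train, y_train)
y_pred_pipeline = clf.predict(X_test)

# Manual equivalent: fit the scaler on the training data only,
# then apply the same fitted scaler to both training and test features.
scaler = StandardScaler().fit(X_train)
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(scaler.transform(X_train), y_train)
y_pred_manual = forest.predict(scaler.transform(X_test))

# With identical seeds, both approaches should give the same predictions.
same_predictions = np.array_equal(y_pred_pipeline, y_pred_manual)
print(same_predictions)
```

The point of the comparison is that the test set never influences the scaler's statistics in either version; the pipeline simply removes the chance of forgetting the transform step.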

See an example at the beginning of the "common pitfalls and recommended practices" documentation.

"We recommend using a Pipeline, which makes it easier to chain transformations with estimators, and reduces the possibility of forgetting a transformation."

So the fact that you don't "use" the scaler yourself is by design.

That said, if for some reason you want to access the scaler inside the pipeline on its own, for example to check its values, you can do so:

clf.fit(X_train, y_train)
# clf.steps is a list of (name, estimator) tuples.
# steps[0] is the first step; [1] selects the fitted scaler object itself.
clf.steps[0][1].scale_
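
As an alternative sketch, the fitted scaler can also be looked up by name through named_steps. make_pipeline names each step after its lowercased class name, so the scaler is stored under "standardscaler". The small random dataset below is made up purely for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Small made-up dataset (an assumption, for illustration only).
rng = np.random.default_rng(0)
X_train = rng.normal(loc=5.0, scale=2.0, size=(50, 3))
y_train = (X_train[:, 0] > 5.0).astype(int)

clf = make_pipeline(
    StandardScaler(),
    RandomForestClassifier(n_estimators=10, random_state=0),
)
clf.fit(X_train, y_train)

# Look the scaler up by its auto-generated step name.
scaler = clf.named_steps["standardscaler"]

# This is the same object that clf.steps[0][1] returns.
same_object = scaler is clf.steps[0][1]

# Per-feature statistics learned from X_train during clf.fit().
feature_means = scaler.mean_
feature_scales = scaler.scale_

# The fitted scaler can also be applied on its own to inspect scaled features.
scaled_X_train = scaler.transform(X_train)
```

Looking steps up by name tends to be more robust than a positional index if the pipeline later gains additional preprocessing steps.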