How do I use sklearn's StandardScaler with make_pipeline?
I am used to running sklearn's standard scaler the following way:
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler().fit(X_train)
scaled_X_train = scaler.transform(X_train)
where X_train is an array containing the features in my training dataset.
I may then use the same scaler to scale the features in my test dataset, X_test:
scaled_X_test = scaler.transform(X_test)
I know that I may also "bake" the scaler into the model, using sklearn's make_pipeline:
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier

clf = make_pipeline(StandardScaler(), RandomForestClassifier(n_estimators=100))
But then how do I use the scaler? Is it enough to call the model like I normally would, i.e.:
clf.fit(X_train, y_train)
And then:
y_pred = clf.predict(X_test)
?
1 Answer
Yes, that is correct.
It's also a good idea to bake the preprocessing into a pipeline, to avoid the common pitfall of scaling the test and training datasets independently.
When calling clf.fit(X_train, y_train), the pipeline will fit the scaler on X_train, and subsequently use that fit to preprocess your test dataset. See an example at the beginning of the "common pitfalls and recommended practices" documentation.
So the fact that you don't "use" the scaler yourself is by design.
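To make that concrete, here is a minimal sketch with synthetic data (the array shapes and random_state values are illustrative assumptions, not from the original post). It shows that fitting the pipeline is equivalent to the manual fit-scaler-then-fit-model workflow from the question:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(0)
X_train, y_train = rng.rand(100, 4), rng.randint(0, 2, 100)
X_test = rng.rand(20, 4)

# Pipeline: fit() scales X_train and fits the model in one step;
# predict() scales X_test with the training-set statistics internally.
clf = make_pipeline(StandardScaler(),
                    RandomForestClassifier(n_estimators=100, random_state=0))
clf.fit(X_train, y_train)
pipeline_preds = clf.predict(X_test)

# Manual equivalent: fit the scaler on X_train only, reuse it on X_test.
scaler = StandardScaler().fit(X_train)
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(scaler.transform(X_train), y_train)
manual_preds = model.predict(scaler.transform(X_test))

print((pipeline_preds == manual_preds).all())  # expect identical predictions
```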
With that said, if you wanted for some reason to independently access the scaler from the pipeline, for example to check its values, you could do so:
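A sketch of one way to do that (the training data here is synthetic and illustrative): make_pipeline names each step after its lowercased class name, so the fitted scaler can be pulled out via named_steps.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(0)
X_train, y_train = rng.rand(50, 3), rng.randint(0, 2, 50)

clf = make_pipeline(StandardScaler(),
                    RandomForestClassifier(n_estimators=10, random_state=0))
clf.fit(X_train, y_train)

# make_pipeline auto-names steps after the lowercased class name.
scaler = clf.named_steps['standardscaler']
print(scaler.mean_)   # per-feature means learned from X_train
print(scaler.scale_)  # per-feature standard deviations
```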