Do we need two separate StandardScalers, one for the features and one for the target?

Posted 2025-02-11 11:55:41

So what I've understood is that StandardScaler().fit_transform(X, y) does not change the target feature (y). Meanwhile, for some algorithms (such as weight-based or distance-based ones) we also need to scale the target.

My question is: do we have to fit two StandardScalers, one for the features and another for the target? I imagine we could also apply the scaler before splitting the training dataset into X and y, but I wonder how we would then use it in deployment, since we wouldn't have y.

from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# --- creating pipelines
transformer_x = make_pipeline(
    SimpleImputer(strategy='constant'),
    StandardScaler())

transformer_y = make_pipeline(
    SimpleImputer(strategy='constant'),
    StandardScaler())

# --- development
# (note: scikit-learn transformers expect 2-D input, so a 1-D y_train needs
#  y_train.reshape(-1, 1) before being passed to transformer_y)
model.fit(transformer_x.fit_transform(X_train),
          transformer_y.fit_transform(y_train))

# --- sometime later in deployment
saved_model.predict(transformer_x.transform(new_data))
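
A side note on my sketch above (my own assumption, using the StandardScaler step inside transformer_y): if y was scaled during training, the raw predictions come back in the scaled target space, so something like the following would be needed to map them back to the original units.

# hypothetical follow-up: undo the target scaling on the predictions
y_pred_scaled = saved_model.predict(transformer_x.transform(new_data))
y_pred = transformer_y.named_steps['standardscaler'].inverse_transform(
    y_pred_scaled.reshape(-1, 1))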

Also, as a side question, is there any condition under which we might not need to do standardisation for weight/distance-based algorithms?

Thanks!

Comments (2)

来日方长 2025-02-18 11:55:41

  • Do we have to implement two StandardScalers, one for the features and another for the target?

In general it's not necessary to scale the target. The only case where it may be beneficial is with some neural networks (see the sketch after this answer). Check this link for further related information.

  • Also, as a side question, is there any condition where we might not need to do standardization for weight/distance-based algorithms?

Standardization is generally beneficial for the training phase. You might want to avoid it if you need some kind of interpretation of the model's parameters. Again, I've provided an interesting link here.
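
As a minimal illustration of the neural-network point above (my own sketch, not taken from the linked page; MLPRegressor, the synthetic data, and all variable names are assumptions), scaling y with its own scaler can help an MLP converge when the target has a much larger scale than the features:

import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

# synthetic data with a large-scale target (assumption for illustration only)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 1000.0 * X[:, 0] + 500.0 * X[:, 1] + rng.normal(size=200)

scaler_X = StandardScaler()
scaler_y = StandardScaler()

X_scaled = scaler_X.fit_transform(X)
y_scaled = scaler_y.fit_transform(y.reshape(-1, 1)).ravel()  # y scaled on its own

mlp = MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000, random_state=0)
mlp.fit(X_scaled, y_scaled)

# at prediction time only X is transformed; y is recovered via inverse_transform
y_pred = scaler_y.inverse_transform(
    mlp.predict(scaler_X.transform(X[:5])).reshape(-1, 1)).ravel()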

帅气称霸 2025-02-18 11:55:41

The most semantically correct way to do this is to use TransformedTargetRegressor, with the scaler as its transformer, and with the pipeline (the scaler for X, any other preprocessing for X, and the final regressor) as its regressor. One additional perk compared to what you've provided: the predicted target will be rescaled back to the original scale.
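
A minimal sketch of this setup, reusing the question's preprocessing and assuming Ridge as the final regressor (the estimator choice and variable names are mine, not the answerer's):

from sklearn.compose import TransformedTargetRegressor
from sklearn.impute import SimpleImputer
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# pipeline for X: imputation + scaling + the final regressor
pipeline_x = make_pipeline(
    SimpleImputer(strategy='constant'),
    StandardScaler(),
    Ridge())

# the target scaler is attached as `transformer`; y is scaled before fitting,
# and predictions are automatically inverse-transformed back to original units
model = TransformedTargetRegressor(
    regressor=pipeline_x,
    transformer=StandardScaler())

model.fit(X_train, y_train)       # y_train is scaled internally
y_pred = model.predict(new_data)  # already back in the original y units

One practical upside is that a single fitted object carries both the X preprocessing and the y scaling, which sidesteps the deployment concern about keeping transformer_y around.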
