So what I've understood is that StandardScaler().fit_transform(X, y) does not change the target feature (y). Meanwhile, for some algorithms (such as weight-based or distance-based ones) we also need to scale the target feature.
My question is, do we have to implement two StandardScaler instances, one for the features and another for the target feature? I imagine we could also scale before splitting the training dataset into X and y, but I wonder how we would then use the scaler in deployment, as we wouldn't have y.
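# --- imports
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# --- creating pipelines (one for the features, one for the target)
transformer_x = make_pipeline(
    SimpleImputer(strategy='constant'),
    StandardScaler())
transformer_y = make_pipeline(
    SimpleImputer(strategy='constant'),
    StandardScaler())

# --- development (model, X_train and y_train are defined elsewhere;
#     y_train must be 2-D here, e.g. a one-column DataFrame)
model.fit(transformer_x.fit_transform(X_train), transformer_y.fit_transform(y_train))

# ---
# sometime later in deployment
saved_model.predict(transformer_x.transform(new_data))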
Also, as a side question, is there any condition under which we might not need to standardise for weight/distance-based algorithms?
Thanks!
Comments (2)
"... StandardScaler, one for the features and another for the target feature?" In general it's not necessary to scale the target feature. The only case where it may be beneficial is with some neural networks. Check this link for further related information.
Standardization is generally beneficial for the training phase. You may want to avoid it if you need to interpret the fitted parameters in the original units of the features. Again, here I provided an interesting link.
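As a small illustration of why target scaling is often unnecessary, the sketch below checks one specific case: ordinary least squares with an intercept. The choice of LinearRegression and the synthetic data are assumptions made for the example, not something taken from the question. For this model, fitting on a standardized target and mapping the predictions back should give the same predictions as fitting on the raw target.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Synthetic data (hypothetical, for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 50.0 + 10.0 * (X @ np.array([3.0, -2.0, 0.5])) + rng.normal(scale=0.1, size=200)

# Fit on the raw target.
pred_raw = LinearRegression().fit(X, y).predict(X)

# Fit on a standardized target, then map predictions back to the original units.
# StandardScaler expects 2-D input, hence the reshape.
y_scaler = StandardScaler()
y_scaled = y_scaler.fit_transform(y.reshape(-1, 1)).ravel()
pred_scaled = LinearRegression().fit(X, y_scaled).predict(X)
pred_back = y_scaler.inverse_transform(pred_scaled.reshape(-1, 1)).ravel()

# For OLS the two routes agree up to floating-point error.
same_predictions = np.allclose(pred_raw, pred_back)
```

This only shows the linear case. Models trained by gradient descent, such as the neural networks mentioned above, can behave differently because the target's scale affects the loss magnitude and the optimisation.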
The most semantically correct way to do this is to use TransformedTargetRegressor, with the scaler as transformer and the pipeline (the scaler for X, any other preprocessing for X, and the final regressor) as regressor. One additional perk compared to what you've provided: the predicted target will be re-scaled back to the original scaling.
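A minimal sketch of that setup, reusing the imputer and scaler from the question. The KNeighborsRegressor and the synthetic data are assumptions standing in for whatever model and dataset you actually have.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.impute import SimpleImputer
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic training data (hypothetical, for illustration only).
rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 2))
y_train = 1000.0 + 200.0 * X_train[:, 0] + rng.normal(scale=5.0, size=100)

model = TransformedTargetRegressor(
    # Everything that touches X lives in the inner pipeline.
    regressor=make_pipeline(
        SimpleImputer(strategy="constant"),
        StandardScaler(),
        KNeighborsRegressor(n_neighbors=5),  # stand-in for the real model
    ),
    # The target scaler is fitted on y during fit() and inverted in predict().
    transformer=StandardScaler(),
)
model.fit(X_train, y_train)

# Deployment: no y is needed, and predictions come back in the original units.
new_data = rng.normal(size=(5, 2))
predictions = model.predict(new_data)
```

Because the fitted object holds both the X preprocessing and the y scaler, only this one object has to be saved and loaded for deployment.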