Sklearn scaler attributes change when returned: X has 6 features, but MinMaxScaler is expecting 1 features as input
I recently ran into a puzzling problem with Sklearn. It is not hard to work around, but I'd like to understand what is going on.
I get the error ValueError: X has 6 features, but MinMaxScaler is expecting 1 features as input. when I try to transform any dataset with a scaler object returned by a function, even though I previously fitted it (fit_transform) inside that function on a dataset with 6 features.
If I use the scaler inside that function, it behaves as expected: it runs fine when the given dataset has 6 features, and otherwise raises ValueError: X has Y features, but MinMaxScaler is expecting 6 features as input., where Y is the number of features in the input.
So it seems that as soon as I return the object, the n_features_in_ attribute is set to 1. The other attributes don't change, and if I manually set n_features_in_ to 6 everything seems to work fine.
So the question is: what is going on?
Edit: I did a bit of testing, and the other attributes do change. The data_max_ attribute, which is normally an array of length 6, becomes an array of length 1 containing the max value of the first feature of the train set.
Here is a simplified code snippet to help understand the code structure:
from sklearn.preprocessing import MinMaxScaler

# dataset is the training data, an array-like with 6 feature columns
def construct(dataset, scaler_type):
    scaler = scaler_type
    scaled_data = scaler.fit_transform(dataset)
    print(scaler.n_features_in_)  # Prints 6
    scaler.transform([[1, 2, 3, 4, 5, 6]])  # Works fine
    return scaled_data, scaler

scaled_data, scaler = construct(dataset, MinMaxScaler())
print(scaler.n_features_in_)  # Prints 1
scaler.transform([[1, 2, 3, 4, 5, 6]])  # Raises ValueError: X has 6 features, but MinMaxScaler is expecting 1 features as input.
SOLUTION
I solved the problem, and hopefully the solution will be useful to others (even though I doubt many will run into the same thing).
Short of superhuman intuition, nobody could have found the source of the problem from the simplified code snippet I provided (I should have at least run it myself, smh...). That is because, in the actual code, I declare two separate scalers, one for the x train data and one for the y train data.
The issue was in how I called the function: I instantiated the MinMaxScaler class in the function call. This causes the xscaler and yscaler variables to refer to the same object inside the function, so the attributes set by xscaler.fit_transform() are lost when I call the yscaler.fit_transform() method.
Passing the MinMaxScaler class instead of instantiating it in the function call solves the issue.
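The actual function and call are not shown here, so the following is only a minimal sketch of the behavior described, not the original code. The function names construct_shared and construct_separate, the random data, and the single-column y are all assumptions made for illustration. It shows that when one MinMaxScaler instance is bound to both names, the second fit_transform refits the same object, and that passing the class lets the function create one instance per dataset.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler


def construct_shared(x_train, y_train, scaler):
    # Hypothetical reconstruction of the bug: both names are bound to the
    # single instance that was created in the function call.
    xscaler = scaler
    yscaler = scaler
    xscaler.fit_transform(x_train)
    features_after_x_fit = xscaler.n_features_in_  # still 6 at this point
    yscaler.fit_transform(y_train)  # refits the very same object on y
    return xscaler, yscaler, features_after_x_fit


def construct_separate(x_train, y_train, scaler_class):
    # Fix: receive the class and create one instance per dataset.
    xscaler = scaler_class()
    yscaler = scaler_class()
    xscaler.fit_transform(x_train)
    yscaler.fit_transform(y_train)
    return xscaler, yscaler


# Made-up data: 6 feature columns for x, 1 target column for y.
rng = np.random.default_rng(0)
x_train = rng.random((20, 6))
y_train = rng.random((20, 1))

xs, ys, seen_inside = construct_shared(x_train, y_train, MinMaxScaler())
print(xs is ys)           # True: one object behind two names
print(seen_inside)        # 6
print(xs.n_features_in_)  # 1: overwritten by the fit on y

xs2, ys2 = construct_separate(x_train, y_train, MinMaxScaler)
print(xs2 is ys2)                              # False
print(xs2.n_features_in_, ys2.n_features_in_)  # 6 1
```

Under these assumptions, the value 1 is simply the number of columns in y, which would also explain why data_max_ shrank to length 1 in the question's edit.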