Sklearn scaler attributes change when returned: X has 6 features, but MinMaxScaler is expecting 1 features as input

Posted on 2025-01-12 06:16:22

I recently encountered a puzzling problem with Sklearn, and although it's not hard to work around, I'd like to understand what's going on.

The problem is that I get the error ValueError: X has 6 features, but MinMaxScaler is expecting 1 features as input. when I try to transform any dataset with a scaler object returned by a function in which I previously fitted it (fit_transform) on a dataset with 6 features.

If I use the scaler inside that function, it either runs fine when the given dataset has 6 features, or raises the error ValueError: X has Y features, but MinMaxScaler is expecting 6 features as input., where Y is the number of features in the input.

So it seems that as soon as I return the object, the n_features_in_ attribute is set to 1. The other attributes don't change, and if I manually set n_features_in_ to 6 everything seems to work fine.

So the question is: what is going on?

Edit: I did a little testing and the other attributes do change. The data_max_ attribute, which is normally an array of length 6, becomes an array of length 1 containing the max value of the first feature of the training set.

Here is a simplified code snippet to help understand the code structure:

from sklearn.preprocessing import MinMaxScaler

def construct(dataset, scaler_type):
    scaler = scaler_type
    scaled_data = scaler.fit_transform(dataset)
    print(scaler.n_features_in_)  # Prints 6
    scaler.transform([[1, 2, 3, 4, 5, 6]])  # Works fine
    return scaled_data, scaler

# dataset: a 2D array with 6 feature columns
scaled_data, scaler = construct(dataset, MinMaxScaler())
print(scaler.n_features_in_)  # Prints 1
scaler.transform([[1, 2, 3, 4, 5, 6]])  # Raises ValueError: X has 6 features, but MinMaxScaler is expecting 1 features as input.

冷清清 2025-01-19 06:16:22

SOLUTION

I solved the problem, and hopefully (even though I doubt many will run into the same thing) the solution will be useful to others.

Short of superhuman intuition, nobody could have found the source of the problem from the simplified snippet I provided (I should have at least run it myself, smh...). That is because, in the actual code, I declare two separate scalers, one for the x training data and one for the y training data.

def construct(dataset, scaler_type):
    # Both names are bound to the same object passed in as scaler_type
    xscaler, yscaler = scaler_type, scaler_type

    # do things...

    x_set = xscaler.fit_transform(dataset)

    # taking first feature as y data (kept 2D, as scalers require 2D input)
    y_set = yscaler.fit_transform(dataset[:, [0]])

    scaled_data = (x_set, y_set)

    return scaled_data, xscaler, yscaler

The issue here is that when I called the function, I passed the arguments like this:

scaled_data, xscaler, yscaler = construct(dataset, MinMaxScaler())

Notice how I instantiate the MinMaxScaler class in the function call. This causes the xscaler and yscaler variables to refer to the same object inside the function, so the attributes set by xscaler.fit_transform() are overwritten when I call yscaler.fit_transform().
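
To make the aliasing concrete, here is a minimal sketch of the behaviour described above. The dataset is a made-up stand-in (10 samples, 6 features), and the variable names features_after_x_fit and features_after_y_fit are only for illustration; they are not from the original code.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical stand-in data: 10 samples, 6 features.
dataset = np.arange(60, dtype=float).reshape(10, 6)

shared = MinMaxScaler()
xscaler, yscaler = shared, shared  # two names, one object

xscaler.fit_transform(dataset)
features_after_x_fit = xscaler.n_features_in_

# Refitting through the second name replaces the fitted state
# that is visible through the first name as well.
yscaler.fit_transform(dataset[:, [0]])
features_after_y_fit = xscaler.n_features_in_

print(xscaler is yscaler)       # True
print(features_after_x_fit)     # 6
print(features_after_y_fit)     # 1
print(xscaler.data_max_.shape)  # (1,)
```

This matches the symptoms in the question: n_features_in_ drops to 1 and data_max_ shrinks to a single value, because the last fit on that one object saw a single column.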

Passing the MinMaxScaler class itself, and instantiating it separately for each scaler inside the function, instead of instantiating it in the function call solves the issue.
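
A sketch of what that fix might look like, assuming the function is free to call the class itself. The parameter name scaler_class and the sample dataset are illustrative, not from the original code. The last lines show one possible alternative for callers who need to pass a configured instance: sklearn.base.clone returns a new unfitted estimator with the same parameters.

```python
import numpy as np
from sklearn.base import clone
from sklearn.preprocessing import MinMaxScaler

def construct(dataset, scaler_class):
    # Calling the class twice yields two independent scaler objects.
    xscaler, yscaler = scaler_class(), scaler_class()
    x_set = xscaler.fit_transform(dataset)
    # First feature as y data, kept 2D because scalers require 2D input.
    y_set = yscaler.fit_transform(dataset[:, [0]])
    return (x_set, y_set), xscaler, yscaler

# Hypothetical stand-in data: 10 samples, 6 features.
dataset = np.arange(60, dtype=float).reshape(10, 6)

# Pass the class, not an instance.
scaled_data, xscaler, yscaler = construct(dataset, MinMaxScaler)
print(xscaler.n_features_in_, yscaler.n_features_in_)  # 6 1
result = xscaler.transform([[1, 2, 3, 4, 5, 6]])  # no error now

# Alternative: keep passing an instance, but copy it per scaler.
template = MinMaxScaler(feature_range=(-1, 1))
x2, y2 = clone(template), clone(template)
```

Passing the class is the simplest change; cloning is worth considering only when the scaler needs non-default constructor arguments.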
