Sklearn scaler attributes change on return: X has 6 features, but MinMaxScaler is expecting 1 feature as input

Posted on 2025-01-12 06:16:22

I recently ran into a puzzling problem with Sklearn. It's not hard to work around, but I'd like to understand what's going on.

The problem is that I get the error ValueError: X has 6 features, but MinMaxScaler is expecting 1 feature as input. when I try to transform any dataset with a scaler object returned by a function, even though I fitted it inside that function (fit_transform) on a dataset with 6 features.

If I use the scaler inside that function, it behaves as expected on any dataset: it runs fine if the given dataset has 6 features, and otherwise raises the error ValueError: X has Y features, but MinMaxScaler is expecting 6 features as input., with Y being the number of features in the input.

So it seems that as soon as I return the object, the n_features_in_ attribute is set to 1. The other attributes don't change, and if I manually set n_features_in_ back to 6 everything seems to work fine.

So the question is: what is going on?

Edit: I did a little more testing and the other attributes do change. The data_max_ attribute, which is normally an array of length 6, becomes an array of length 1 containing the max value of the first feature of the training set.

Here is a simplified code snippet to show the code structure:

from sklearn.preprocessing import MinMaxScaler

def construct(dataset, scaler_type):
    scaler = scaler_type
    scaled_data = scaler.fit_transform(dataset)
    print(scaler.n_features_in_)  # Prints 6
    scaler.transform([[1, 2, 3, 4, 5, 6]])  # Works fine
    return scaled_data, scaler

# dataset: the training data, with 6 features
scaled_data, scaler = construct(dataset, MinMaxScaler())
print(scaler.n_features_in_)  # Prints 1
scaler.transform([[1, 2, 3, 4, 5, 6]])  # Raises ValueError: X has 6 features, but MinMaxScaler is expecting 1 features as input.

冷清清 2025-01-19 06:16:22

SOLUTION

I solved the problem, and hopefully (even though I doubt many will encounter the same issue) the solution will be useful to others.

Short of superhuman intuition, the simplified code snippet I provided wasn't enough to find where the problem came from (I should have at least run it myself, smh...). That is because, in the actual code, I declare two separate scalers, one for the x training data and one for the y training data.

def construct(dataset, scaler_type):
    # Bug: both names are bound to the same scaler instance
    xscaler, yscaler = scaler_type, scaler_type

    # do things...

    x_set = xscaler.fit_transform(dataset)

    # taking first feature as y data (kept 2D, since scalers expect 2D input)
    y_set = yscaler.fit_transform(dataset[:, [0]])

    scaled_data = (x_set, y_set)

    return scaled_data, xscaler, yscaler

The issue is that when I called the function, I passed the arguments like this:

scaled_data, xscaler, yscaler = construct(dataset, MinMaxScaler())

Notice how I instantiate the MinMaxScaler class in the function call. This causes the xscaler and yscaler variables to refer to the same object inside the function, so the attributes set by xscaler.fit_transform() are overwritten when I call yscaler.fit_transform().
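
For anyone who wants to see the aliasing in isolation, here is a minimal sketch. The dataset is a made-up 10 x 6 array (an assumption, not my real data), and the name shared is only there for illustration.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical stand-in for the real dataset: 10 samples, 6 features.
dataset = np.arange(60, dtype=float).reshape(10, 6)

shared = MinMaxScaler()
xscaler, yscaler = shared, shared  # two names, one object

xscaler.fit_transform(dataset)
features_after_x_fit = xscaler.n_features_in_  # state right after the first fit

# Refitting through the second name replaces the fitted state
# that is visible through the first name as well.
yscaler.fit_transform(dataset[:, [0]])

print(xscaler is yscaler)
print(features_after_x_fit, xscaler.n_features_in_)
```

The second fit resets the scaler before learning from the single column, which is why data_max_ also shrinks to one entry.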

Passing the MinMaxScaler class itself and instantiating it inside the function, once per scaler, solves the issue.
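
A sketch of what the fixed version might look like, under the same assumptions as before (made-up 10 x 6 dataset; the parameter name scaler_cls is my own choice, not from the real code):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler


def construct(dataset, scaler_cls):
    # Instantiate inside the function so each scaler is its own object.
    xscaler, yscaler = scaler_cls(), scaler_cls()

    x_set = xscaler.fit_transform(dataset)

    # First feature as y data, kept 2D because scalers expect 2D input.
    y_set = yscaler.fit_transform(dataset[:, [0]])

    return (x_set, y_set), xscaler, yscaler


# Hypothetical stand-in for the real dataset: 10 samples, 6 features.
dataset = np.arange(60, dtype=float).reshape(10, 6)

# Pass the class, not an instance.
scaled_data, xscaler, yscaler = construct(dataset, MinMaxScaler)

print(xscaler.n_features_in_, yscaler.n_features_in_)
row = xscaler.transform([[1, 2, 3, 4, 5, 6]])  # no ValueError this time
```

If the caller needs to pass an already configured instance (say, with a custom feature_range), another option that should work is to copy it inside the function with sklearn.base.clone, which returns a new unfitted estimator with the same parameters.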
