How to preserve column order when applying sklearn.compose.ColumnTransformer

Posted on 2025-02-06 03:11:54

I want to use the Pipeline and ColumnTransformer classes from the sklearn library to apply scaling to a NumPy array. The scaler is applied to only some of the columns, and I want the output to have the same column order as the input.

Example:

import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import MinMaxScaler

X = np.array([(25, 1, 2, 0),
              (30, 1, 5, 0),
              (25, 10, 2, 1),
              (25, 1, 2, 0),
              (np.nan, 10, 4, 1),
              (40, 1, 2, 1)])

column_trans = ColumnTransformer(
    [('scaler', MinMaxScaler(), [0, 2])],
    remainder='passthrough')

X_scaled = column_trans.fit_transform(X)

The problem is that ColumnTransformer changes the order of the columns: the transformed columns come first and the passthrough columns are appended after them. How can I preserve the original column order?
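To make the reordering concrete, here is a small check on the example data. It is only an illustrative sketch: it compares individual output columns with the input to show where each one ends up.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import MinMaxScaler

X = np.array([(25, 1, 2, 0),
              (30, 1, 5, 0),
              (25, 10, 2, 1),
              (25, 1, 2, 0),
              (np.nan, 10, 4, 1),
              (40, 1, 2, 1)])

column_trans = ColumnTransformer(
    [('scaler', MinMaxScaler(), [0, 2])],
    remainder='passthrough')
X_scaled = column_trans.fit_transform(X)

# Output layout: [scaled column 0, scaled column 2, column 1, column 3].
# Original column 1 (untouched) now sits at index 2 ...
col_1_moved_to_index_2 = np.array_equal(X_scaled[:, 2], X[:, 1])
# ... and index 1 holds the min-max scaled version of original column 2.
index_1_is_scaled_col_2 = np.allclose(X_scaled[:, 1], (X[:, 2] - 2) / (5 - 2))
print(col_1_moved_to_index_2, index_1_is_scaled_col_2)
```

Both checks print `True`, which confirms that the scaled columns are moved to the front of the output.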

I am aware of this post, but it is for pandas DataFrames. For various reasons I cannot use a DataFrame and have to use NumPy arrays in my code.

Thanks.

纵性 2025-02-13 03:11:54

Here is a solution that adds a second transformer, which applies the inverse column permutation after the column transform:

import re

import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin


class ReorderColumnTransformer(BaseEstimator, TransformerMixin):
    # Matches the trailing input-column number in names such as 'scaler__x2'
    index_pattern = re.compile(r'\d+$')

    def __init__(self, column_transformer):
        self.column_transformer = column_transformer

    def fit(self, X, y=None):
        return self

    def transform(self, X, y=None):
        # Original index of each output column, listed in output order
        order_after_column_transform = [
            int(self.index_pattern.search(col).group())
            for col in self.column_transformer.get_feature_names_out()]
        # Inverse permutation: output position of each original column
        order_inverse = np.zeros(len(order_after_column_transform), dtype=int)
        order_inverse[order_after_column_transform] = np.arange(len(order_after_column_transform))
        return X[:, order_inverse]

It relies on parsing

column_trans.get_feature_names_out()
# = array(['scaler__x0', 'scaler__x2', 'remainder__x1', 'remainder__x3'],
#      dtype=object)

to read the initial column order from the suffix number. It then computes and applies the inverse permutation.

To be used as:

import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import MinMaxScaler
from sklearn.pipeline import make_pipeline

X = np.array([(25, 1, 2, 0),
              (30, 1, 5, 0),
              (25, 10, 2, 1),
              (25, 1, 2, 0),
              (np.nan, 10, 4, 1),
              (40, 1, 2, 1)])

column_trans = ColumnTransformer(
    [('scaler', MinMaxScaler(), [0, 2])],
    remainder='passthrough')

# The same column_trans object is passed to both steps, so the reorder step sees the fitted transformer
pipeline = make_pipeline(column_trans, ReorderColumnTransformer(column_transformer=column_trans))
X_scaled = pipeline.fit_transform(X)
# X_scaled has the same column order as X

Alternative solution that does not rely on string parsing but reads the column assignments from the fitted column transformer. Note that output_indices_ is not suitable here, because its slices index the transformed output rather than the input columns, so the permutation built from it is always the identity. The version below reads transformers_ instead; it assumes the columns are given as lists of integers, that no column is dropped, and that each transformer returns one output column per input column:

import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin


class ReorderColumnTransformer(BaseEstimator, TransformerMixin):

    def __init__(self, column_transformer):
        self.column_transformer = column_transformer

    def fit(self, X, y=None):
        return self

    def transform(self, X, y=None):
        # transformers_ holds (name, transformer, columns) tuples in output order,
        # including the 'remainder' entry when remainder='passthrough'
        order_after_column_transform = [
            col
            for _, _, columns in self.column_transformer.transformers_
            for col in columns]
        n_cols = self.column_transformer.n_features_in_

        order_inverse = np.zeros(n_cols, dtype=int)
        order_inverse[order_after_column_transform] = np.arange(n_cols)
        return X[:, order_inverse]
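The inverse-permutation step can also be written with `np.argsort`, since the argsort of a permutation is its inverse. The following is a minimal sketch of that idea on plain arrays; `restore_column_order` is a hypothetical helper name, not part of scikit-learn, and it assumes the column order after the transform is already known.

```python
import numpy as np


def restore_column_order(X_transformed, order_after_transform):
    """Undo a column permutation (hypothetical helper, not a scikit-learn API).

    order_after_transform[j] is the original index of the column that
    ended up at position j of X_transformed.
    """
    inverse = np.argsort(order_after_transform)
    return X_transformed[:, inverse]


X = np.array([[10, 11, 12, 13],
              [20, 21, 22, 23]])

order = [2, 0, 3, 1]        # an arbitrary permutation of the four columns
X_permuted = X[:, order]    # stands in for the output of a column transform
X_restored = restore_column_order(X_permuted, order)

print(np.array_equal(X_restored, X))
```

Here `np.argsort([2, 0, 3, 1])` evaluates to `[1, 3, 0, 2]`, which is exactly the array that the `np.zeros` / `np.arange` pair builds in the transformer.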

眼前雾蒙蒙 2025-02-13 03:11:54

ColumnTransformer can be used to reorder columns however you like by passing it the column indices in the desired order. Pairing ColumnTransformer with an identity FunctionTransformer makes it do nothing but reorder the columns. (You can create an identity FunctionTransformer by not assigning func when initializing FunctionTransformer, in which case the data is passed through without being transformed.)

import numpy as np
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import FunctionTransformer

X = np.array([[30, 20, 10]])
new_column_order = [2, 1, 0]
column_reorder_transformer = make_column_transformer((FunctionTransformer(), new_column_order))
Xt = column_reorder_transformer.fit_transform(X)
print(f"Xt = {Xt}")
# prints: Xt = [[10 20 30]]
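Applied to the scaling problem in the question, the same idea could look like the sketch below. It assumes the scaled and passthrough column indices are known up front (`scaled_columns` and `remaining_columns` are illustrative names), and it uses a smaller array without NaN values to keep the check simple.

```python
import numpy as np
from sklearn.compose import ColumnTransformer, make_column_transformer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer, MinMaxScaler

X = np.array([[25.0, 1.0, 2.0, 0.0],
              [30.0, 1.0, 5.0, 0.0],
              [40.0, 10.0, 2.0, 1.0]])

scaled_columns = [0, 2]      # columns sent through the scaler
remaining_columns = [1, 3]   # columns passed through untouched

column_trans = ColumnTransformer(
    [('scaler', MinMaxScaler(), scaled_columns)],
    remainder='passthrough')

# After column_trans the columns arrive as scaled_columns + remaining_columns.
# The argsort of that order is the selection that puts them back in place.
restore_order = np.argsort(scaled_columns + remaining_columns).tolist()
column_reorder_transformer = make_column_transformer(
    (FunctionTransformer(), restore_order))

pipeline = make_pipeline(column_trans, column_reorder_transformer)
X_scaled = pipeline.fit_transform(X)
print(X_scaled)
```

Columns 1 and 3 of `X_scaled` are identical to those of `X`, while columns 0 and 2 hold the scaled values in their original positions.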
