Getting the instance variables of a custom transformer in a sklearn pipeline

Posted on 2025-01-19 10:18:38

I am tasked with a supervised learning problem on a dataset and want to create a full Pipeline that covers everything from start to finish, beginning with the train-test split. I wrote a custom class to wrap sklearn's train_test_split as a step in a sklearn pipeline. Its fit_transform returns the training set. Later I still want to access the test set, so I made it an instance variable in the custom transformer class like this:

self.test_set = test_set

from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

class train_test_splitter([...])
[... 
...]
    def transform(self, X):
        train_set, test_set = train_test_split(X, test_size=0.2)
        self.test_set = test_set
        return train_set

split_pipeline = Pipeline([
    ('splitter', train_test_splitter() ),    
])
df_train = split_pipeline.fit_transform(df)

Now I want to get the test set like this:

df_test = splitter.test_set

It's not working. How do I get the variables of the instance "splitter"? Where is it stored?

定格我的天空 2025-01-26 10:18:38

You can access the steps of a pipeline in a number of ways. For example,

split_pipeline['splitter'].test_set
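
As a rough sketch of those access patterns: the class body in the question is elided, so `TrainTestSplitter` below is a hypothetical stand-in (its `test_size` and `random_state` parameters and the toy `df` are assumptions, not the asker's code). The lookups themselves (`pipeline['name']`, `named_steps`, `steps`, and integer indexing) are regular `Pipeline` features and all return the same fitted instance.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline


class TrainTestSplitter(TransformerMixin, BaseEstimator):
    """Hypothetical stand-in for the question's train_test_splitter."""

    def __init__(self, test_size=0.2, random_state=0):
        self.test_size = test_size
        self.random_state = random_state

    def fit(self, X, y=None):
        # Nothing to learn; the trailing-underscore attribute marks the step as fitted.
        self.is_fitted_ = True
        return self

    def transform(self, X):
        train_set, test_set = train_test_split(
            X, test_size=self.test_size, random_state=self.random_state
        )
        self.test_set = test_set  # kept on the instance, as in the question
        return train_set


# Toy data (assumed for illustration only).
df = pd.DataFrame({"a": range(10), "b": range(10, 20)})

split_pipeline = Pipeline([("splitter", TrainTestSplitter())])
df_train = split_pipeline.fit_transform(df)

# Four equivalent ways to reach the step object stored in the pipeline.
by_name = split_pipeline["splitter"]
by_named_steps = split_pipeline.named_steps["splitter"]
by_steps_list = split_pipeline.steps[0][1]
by_position = split_pipeline[0]

df_test = by_name.test_set
```

The bare name `splitter` from the question is never defined as a variable, which is why `splitter.test_set` fails; the instance lives inside `split_pipeline` and has to be looked up there (or assigned to a variable before being passed to `Pipeline`).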

That said, I don't think this is a good approach. Once you add more steps to the pipeline, everything will work the way you want at fit time. But when you predict on or transform other data, the pipeline still calls your transform method, which generates a new train-test split, overwrites the old one, and sends only the new training portion down the pipeline to the remaining steps.
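
The effect can be reproduced with a small sketch. Again, `TrainTestSplitter` is a hypothetical stand-in for the elided class in the question, and the two toy frames are assumptions chosen so that their indexes do not overlap.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline


class TrainTestSplitter(TransformerMixin, BaseEstimator):
    """Hypothetical stand-in: it re-splits on every transform call."""

    def fit(self, X, y=None):
        # Trailing-underscore attribute marks the step as fitted.
        self.is_fitted_ = True
        return self

    def transform(self, X):
        train_set, test_set = train_test_split(X, test_size=0.2, random_state=0)
        self.test_set = test_set
        return train_set


split_pipeline = Pipeline([("splitter", TrainTestSplitter())])

# Original data uses index 0..9, "new" data uses index 100..109.
df = pd.DataFrame({"a": range(10)})
new_df = pd.DataFrame({"a": range(10)}, index=range(100, 110))

df_train = split_pipeline.fit_transform(df)
first_test_index = set(split_pipeline["splitter"].test_set.index)

# Transforming new data triggers another split inside the step.
new_out = split_pipeline.transform(new_df)
second_test_index = set(split_pipeline["splitter"].test_set.index)

print(len(new_df), len(new_out))  # two of the ten new rows were dropped
print(first_test_index.isdisjoint(second_test_index))  # the stored test set was replaced
```

After the second call, `test_set` holds rows from `new_df` rather than from `df`, and 20% of the new data never reaches any later step. Calling `train_test_split` once before building the pipeline, and fitting the pipeline on the training part only, sidesteps both problems.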
