Getting the instance variables of a custom transformer in an sklearn Pipeline

Posted on 2025-01-19 10:18:38

I am working on a supervised learning problem and want to build a complete Pipeline from beginning to end, starting with the train-test split. I wrote a custom class that wraps sklearn's train_test_split so it can be used as a step in an sklearn pipeline. Its fit_transform returns the training set. I still want to access the test set later, so I store it as an instance variable in the custom transformer class like this:

self.test_set = test_set

from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

class train_test_splitter([...]):
[... 
...]
    def transform(self, X):
        # Split on every call and keep the held-out part on the instance
        train_set, test_set = train_test_split(X, test_size=0.2)
        self.test_set = test_set
        return train_set

split_pipeline = Pipeline([
    ('splitter', train_test_splitter()),
])
df_train = split_pipeline.fit_transform(df)

Now I want to get the test set like this:

df_test = splitter.test_set

It's not working. How do I get the variables of the "splitter" instance? Where is it stored?

Comments (1)

定格我的天空 2025-01-26 10:18:38

You can access the steps of a pipeline in a number of ways. For example,

split_pipeline['splitter'].test_set
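
As a minimal sketch of the access options: the question elides the class's base classes and fit method, so the `TrainTestSplitter` below is a hypothetical stand-in built on `BaseEstimator` and `TransformerMixin`, and the NumPy array and `random_state=0` are assumptions made only to keep the example reproducible. It shows three ways of reaching the same step object, which is where the instance variable lives.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline


class TrainTestSplitter(BaseEstimator, TransformerMixin):
    """Hypothetical stand-in for the question's train_test_splitter class."""

    def fit(self, X, y=None):
        # Nothing to learn; the split happens in transform.
        return self

    def transform(self, X):
        train_set, test_set = train_test_split(X, test_size=0.2, random_state=0)
        self.test_set = test_set  # stored on the step object itself
        return train_set


X = np.arange(20).reshape(10, 2)  # toy data: 10 rows, 2 columns
split_pipeline = Pipeline([("splitter", TrainTestSplitter())])
X_train = split_pipeline.fit_transform(X)

# Three ways to reach the same step object inside the pipeline:
by_name = split_pipeline["splitter"]
by_named_steps = split_pipeline.named_steps["splitter"]
by_position = split_pipeline.steps[0][1]

X_test = by_name.test_set
```

The bare name `splitter` from the question never existed as a variable; the instance was created inline in the `Pipeline([...])` call, so it is only reachable through the pipeline (or by assigning it to a variable before building the pipeline).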

That said, I don't think this is a good approach. Once you add more steps to the pipeline, everything works as intended at fit time. But when you predict on or transform other data, the pipeline still calls your transform method, which generates a new train-test split, overwrites the stored test set, and sends only the new training portion through the remaining steps.
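
The sketch below illustrates that concern and one common alternative. The `TrainTestSplitter` class is again a hypothetical stand-in for the elided class in the question, and the `StandardScaler` step, the toy arrays, and `random_state=0` are assumptions added for illustration. Splitting once outside the pipeline is a suggestion here, not something the answer itself prescribes.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


class TrainTestSplitter(BaseEstimator, TransformerMixin):
    """Hypothetical stand-in for the question's train_test_splitter class."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        train_set, test_set = train_test_split(X, test_size=0.2, random_state=0)
        self.test_set = test_set
        return train_set


X = np.arange(40, dtype=float).reshape(20, 2)            # original data, 20 rows
X_new = np.arange(100, 120, dtype=float).reshape(10, 2)  # later, unseen data, 10 rows

# Problem: the splitter step runs again on every transform call.
leaky = Pipeline([("splitter", TrainTestSplitter()), ("scale", StandardScaler())])
leaky.fit(X)
first_test_set = leaky["splitter"].test_set   # 4 rows held out from X
out_new = leaky.transform(X_new)              # part of X_new is silently dropped
second_test_set = leaky["splitter"].test_set  # now rows of X_new; the old split is gone

# Alternative: split once, outside the pipeline, and keep both parts.
X_train, X_test = train_test_split(X, test_size=0.2, random_state=0)
clean = Pipeline([("scale", StandardScaler())])
clean.fit(X_train)
X_test_scaled = clean.transform(X_test)       # every test row is transformed
```

With the split done up front, the pipeline contains only steps that should apply identically to training data and to any later data.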
