Get instance variable of Custom Transformer in sklearn pipeline
I am working on a supervised learning problem on a dataset and want to build a complete pipeline from beginning to end, starting with the train-test split. I wrote a custom class to wrap sklearn's train_test_split in an sklearn pipeline. Its fit_transform returns the training set. I still want to access the test set later, so I made it an instance variable in the custom transformer class like this:
self.test_set = test_set
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import Pipeline

    class train_test_splitter([...]):
        [...
        ...]
        def transform(self, X):
            train_set, test_set = train_test_split(X, test_size=0.2)
            # keep the held-out part on the transformer instance
            self.test_set = test_set
            return train_set

    split_pipeline = Pipeline([
        ('splitter', train_test_splitter()),
    ])
    df_train = split_pipeline.fit_transform(df)
Now I want to get the test set like this:
    df_test = splitter.test_set
It's not working. How do I get the variables of the instance "splitter"? Where is it stored?
Comments (1)
You can access the steps of a pipeline in a number of ways, for example by step name through named_steps, by indexing the pipeline with the step name, or by position through the steps list.
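The access patterns can be sketched roughly as follows. This is a minimal illustration, not the asker's actual class: the base classes TransformerMixin and BaseEstimator, the no-op fit, the fixed random_state, and the toy NumPy array standing in for df are all assumptions, since the original class definition is elided.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline


class TrainTestSplitter(TransformerMixin, BaseEstimator):
    """Hypothetical stand-in for the question's splitter (base classes assumed)."""

    def fit(self, X, y=None):
        # Nothing to learn; the split happens in transform
        return self

    def transform(self, X):
        train_set, test_set = train_test_split(X, test_size=0.2, random_state=0)
        self.test_set = test_set  # stored on the instance held by the pipeline
        return train_set


X = np.arange(20).reshape(10, 2)  # toy data standing in for df
split_pipeline = Pipeline([("splitter", TrainTestSplitter())])
train = split_pipeline.fit_transform(X)

# Three ways to reach the same step object inside the pipeline
by_name = split_pipeline.named_steps["splitter"]
by_key = split_pipeline["splitter"]
by_position = split_pipeline.steps[0][1]

test = by_name.test_set
```

In the question, splitter is not a variable in scope, because the instance was created inline inside the Pipeline(...) call. Looking the step up through the pipeline, or binding the instance to a name before passing it in, gives access to test_set.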
That said, I don't think this is a good approach. When you fill out the pipeline with more steps, at fit time everything will work how you want, but when predicting/transforming on other data you will still be calling your transform method, which will generate a new train-test split, forgetting the old one, and sending the new train set down the pipe for the remaining steps.
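To make the concern concrete, here is a small sketch of what happens when such a splitter sits in front of another step. The splitter class, the StandardScaler step, and the toy arrays are illustrative assumptions, not the asker's code. The last lines show one possible alternative, which is to split once before the pipeline; that is a suggestion on my part and not something the answer prescribes.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


class TrainTestSplitter(TransformerMixin, BaseEstimator):
    """Hypothetical splitter mirroring the question (base classes assumed)."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        train_set, test_set = train_test_split(X, test_size=0.2)
        self.test_set = test_set
        return train_set


pipe = Pipeline([("splitter", TrainTestSplitter()), ("scaler", StandardScaler())])

X = np.arange(20, dtype=float).reshape(10, 2)
pipe.fit_transform(X)
held_out_after_fit = pipe["splitter"].test_set  # 2 rows taken from X

# Transforming new data runs the splitter's transform again
X_new = np.arange(100, 110, dtype=float).reshape(5, 2)
out = pipe.transform(X_new)  # only 4 of the 5 new rows come out
held_out_after_transform = pipe["splitter"].test_set  # old test set overwritten

# Possible alternative: split once up front, keep the pipeline for the rest
X_train, X_test = train_test_split(X, test_size=0.2, random_state=0)
scaler_pipe = Pipeline([("scaler", StandardScaler())])
X_train_scaled = scaler_pipe.fit_transform(X_train)
X_test_scaled = scaler_pipe.transform(X_test)
```

With the split done outside the pipeline, every row passed to transform or predict comes back out, and the test set stays fixed for the whole experiment.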