Fit/transform separate sklearn transformers on partitions of a single column
Use case: I have time series data for multiple assets (e.g. AAPL, MSFT) and multiple features (e.g. MACD, volatility, etc.). I am building an ML model to make classification predictions on a subset of this data.
Problem: For each asset and feature, I want to fit and apply a transformation. For example, for volatility I want to fit one transformer for AAPL, another for MSFT, and so on, and then apply each transformation to its own partition of the data.
Current status: I currently use `compose.make_column_transformer`, but this only applies a single transformer to the entire column `volatility` and does not allow the data to be partitioned so that individual transformers are fit and applied to each partition.
Research: I've done some research and come across `sklearn.preprocessing.FunctionTransformer`, which seems to be a building block I could use, but I haven't figured out how.
Main question: What is the best way to build a sklearn pipeline that can fit a transformer to each partition (i.e. a groupby) within a single column? Any code pointers would be great. Thank you.
Example dataset:
Date | Ticker | Volatility | transformed_vol |
---|---|---|---|
01/01/18 | AAPL | X | A(X) |
01/02/18 | AAPL | X | A(X) |
... | AAPL | X | A(X) |
12/30/22 | AAPL | X | A(X) |
12/31/22 | AAPL | X | A(X) |
01/01/18 | GOOG | X | B(X) |
01/02/18 | GOOG | X | B(X) |
... | GOOG | X | B(X) |
12/30/22 | GOOG | X | B(X) |
12/31/22 | GOOG | X | B(X) |
Comments (1)
I don't think this is doable in an "elegant" way using scikit-learn's built-in functionality, simply because transformers are applied to the whole column. However, one could use `FunctionTransformer` (as you correctly point out) to circumvent this limitation. I am using the following example:
I added another column just to demonstrate.
Which results in:
Here you can add other columns in `f_dict`, and the transformers will then be created in the list comprehension.
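As a rough illustration of the idea described above, here is a minimal sketch of one way it could be wired together. It is not the code from this answer: the helper `scale_per_group`, the toy data, the extra `macd` column, and the scaler choices in `f_dict` are all assumptions made for the example. Each `FunctionTransformer` receives the ticker column together with one feature column, and fits a fresh scaler on every ticker's rows.

```python
import numpy as np
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import FunctionTransformer, MinMaxScaler, StandardScaler

# Toy data; the values and the "macd" column are made up for illustration.
df = pd.DataFrame({
    "ticker": ["AAPL"] * 4 + ["GOOG"] * 4,
    "volatility": [1.0, 2.0, 3.0, 4.0, 10.0, 20.0, 30.0, 40.0],
    "macd": [0.0, 1.0, 2.0, 3.0, 5.0, 6.0, 7.0, 8.0],
})


def scale_per_group(X, scaler_cls, group_col="ticker"):
    """Hypothetical helper: fit a fresh scaler on each group's rows.

    X is expected to be a DataFrame holding the grouping column and
    exactly one feature column. Returns a 2D array with one column.
    """
    value_col = next(c for c in X.columns if c != group_col)
    values = X[value_col].to_numpy(dtype=float)
    out = np.empty(len(X), dtype=float)
    # GroupBy.indices maps each group key to the positional row indices.
    for positions in X.groupby(group_col).indices.values():
        scaled = scaler_cls().fit_transform(values[positions].reshape(-1, 1))
        out[positions] = scaled.ravel()
    return out.reshape(-1, 1)


# Feature column -> scaler class to apply per ticker (assumed choices).
f_dict = {"volatility": StandardScaler, "macd": MinMaxScaler}

# One FunctionTransformer per feature, built in a list comprehension.
ct = make_column_transformer(
    *[
        (
            FunctionTransformer(scale_per_group, kw_args={"scaler_cls": cls}),
            ["ticker", col],
        )
        for col, cls in f_dict.items()
    ]
)

# Columns of the result follow the order of f_dict: volatility, then macd.
transformed = ct.fit_transform(df)
```

One caveat with this design: `FunctionTransformer` is stateless, so the scalers are refit on whatever data is passed in, including at `transform` time. If parameters learned on a training set must be reused on unseen data, a custom estimator that stores one fitted transformer per ticker would be the safer route.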