PyTorch Temporal Fusion Transformer prediction output length
I have trained a Temporal Fusion Transformer on some training data and would like to predict on unseen data. To do so, I build a pytorch_forecasting TimeSeriesDataSet:
testing = TimeSeriesDataSet.from_dataset(training, df[lambda x: x.year > validation_cutoff], predict=True, stop_randomization=True)
where
df[lambda x: x.year > validation_cutoff].shape
(97036, 13)
Given that
testing.data['reals'].shape
torch.Size([97036, 9])
I would expect to receive a prediction output containing 97036 rows. So I generate my predictions like so:
test_dataloader = testing.to_dataloader(train=False, batch_size=128 * 10, num_workers=0)
raw_predictions, x = best_tft.predict(testing, mode="raw", return_x=True)
However, the output I receive has the size
raw_predictions['prediction'].shape
torch.Size([25476, 1, 7])
Why are some of these 97036 observations being removed?
Or, failing that, how can I find out which of these 97036 observations are being dropped, and why they are being removed?
Comments (2)
Get rid of mode="raw" to get a forecast over the max_prediction horizon range (the dataset's max_prediction_length). The output then has one row per group and one column per step of the max_prediction horizon. In other words, you get one prediction per granular group on the test set, covering a date range that depends on the test set.
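To make the "one prediction per group" point concrete, here is a minimal pandas-only sketch of which rows of a test frame end up being predicted. It is a simplified model of the behaviour, not the library's actual code: the column names group and time_idx, the toy data, and the min_encoder_length value are all assumptions for illustration, and it assumes each series has a contiguous time index.

```python
import pandas as pd

# Toy stand-in for the test frame; "group" and "time_idx" are hypothetical column names.
df_test = pd.DataFrame({
    "group":    ["a", "a", "a", "a", "b", "b", "b", "c"],
    "time_idx": [0, 1, 2, 3, 0, 1, 2, 0],
    "value":    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
})

max_prediction_length = 1  # corresponds to the middle dimension of a (n, 1, 7) output
min_encoder_length = 2     # assumed value; use whatever the training dataset was built with

# Number of rows in each series, repeated on every row of that series.
sizes = df_test.groupby("group")["time_idx"].transform("size")
# Position counted from the end of each series: 1.0 is the last row.
rank_from_end = df_test.groupby("group")["time_idx"].rank(ascending=False, method="first")

# A series needs enough rows for the encoder plus the prediction window.
long_enough = sizes >= min_encoder_length + max_prediction_length

df_test["role"] = "not predicted"
df_test.loc[rank_from_end <= max_prediction_length, "role"] = "predicted"
df_test.loc[~long_enough, "role"] = "series too short"

# One prediction window per sufficiently long series.
n_windows = df_test.loc[long_enough, "group"].nunique()
print(df_test)
print("expected number of prediction rows:", n_windows)
```

Under these assumptions the first dimension of the prediction tensor corresponds to the number of sufficiently long groups rather than to the number of rows in the frame, so comparing it with the number of distinct groups in your own test frame is a quick sanity check.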
In the source code of TimeSeriesDataSet there are filters that remove short time series. When you set predict=True in TimeSeriesDataSet.from_dataset, it sets min_prediction_length to max_prediction_length. Then, when the actual test dataloader is created, all time series shorter than min_prediction_length are removed, which can remove the entire testing set and leave you with exactly 0 observations. Exactly why it is implemented this way, I don't know. To make predictions, just set: