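Using exogenous data for out-of-sample forecasts with SARIMAX
I am doing time series analysis with SARIMAX and have been struggling with it.
I think I have successfully fitted a model and used it to make predictions. However, I don't know how to make out-of-sample forecasts with exogenous data.
I may be going about the whole thing wrong, so I have included some sample data below: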
import pandas as pd
import statsmodels.api as sm

# Define the sample data
df = pd.DataFrame({'date': ['2019-01-01', '2019-01-02', '2019-01-03',
                            '2019-01-04', '2019-01-05', '2019-01-06',
                            '2019-01-07', '2019-01-08', '2019-01-09',
                            '2019-01-10', '2019-01-11', '2019-01-12'],
                   'price': [78, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78, 80],
                   'factor1': [178, 287, 152, 294, 155, 245, 168, 276, 165, 275, 178, 221]
                   })

# Parse the dates (the strings are in YYYY-MM-DD form) and use them as a sorted index
df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d')
df = df.set_index('date').sort_index().dropna()

# Split the data into training and test sets manually
train = df.loc['2019-01-01':'2019-01-09'].copy()
test = df.loc['2019-01-10':'2019-01-12'].copy()

# Convert the train and test indexes to daily periods
train.index = pd.DatetimeIndex(train.index).to_period('D')
test.index = pd.DatetimeIndex(test.index).to_period('D')

# Define and fit the model on the training data (endogenous and exogenous)
model = sm.tsa.statespace.SARIMAX(train['price'],
                                  order=(0, 0, 0),
                                  seasonal_order=(0, 0, 0, 12),
                                  exog=train.iloc[:, 1:],
                                  time_varying_regression=True,
                                  mle_regression=False)
model_1 = model.fit(disp=False)

# Exogenous data for the test period
exog_test = test.iloc[:, 1:]

# Forecast out of sample using the exogenous data
forecast = model_1.forecast(3, exog=exog_test)
So my question is really about the last line: what do I do if I want to forecast more than 3 steps?
Comments (1)
I will try to answer this question, since it mainly comes down to the type of data involved and the documentation of the statsmodels package.
As per the documentation, 'steps' is an integer: the number of steps to forecast from the end of the sample. That also means that if you want more than three steps, you need to provide additional array data for the training and TESTING data (note: both). What forecast() checks is that exog has exactly one row per forecast step.
(https://www.statsmodels.org/dev/generated/statsmodels.tsa.statespace.sarimax.SARIMAX.html)
(https://www.statsmodels.org/dev/generated/statsmodels.tsa.statespace.sarimax.SARIMAXResults.forecast.html)
Here are two errors I get when I increase the number of steps by one (from 3 to 4) without adding exogenous rows:
ValueError: cannot reshape array of size 3 into shape (4,1)
Provided exogenous values are not of the appropriate shape. Required (4, 1), got (3, 1).
ValueError: the number of rows in the exogenous variable does not match the number of time periods you're asking it to predict
That said, simply expanding the testing set works well and gets you the additional forecasts. Here is the working notebook with the code:
https://colab.research.google.com/drive/1o9KXAe61EKH6bDI-FJO3qXzlWjz9IHHw?usp=sharing