SARIMAX: using exogenous data for out-of-sample forecasts

Posted 2025-01-25 10:04:14

I am working on a time series analysis with SARIMAX and have been really struggling with it.

I think I have successfully fit a model and used it to make predictions; however, I don't know how to make out-of-sample forecasts with exogenous data.

I may be doing the whole thing wrong, so I have included my steps below with some sample data:

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
import statsmodels.api as sm

# Defining Sample data
df = pd.DataFrame({'date':['2019-01-01','2019-01-02','2019-01-03',
                         '2019-01-04','2019-01-05','2019-01-06',
                         '2019-01-07','2019-01-08','2019-01-09',
                         '2019-01-10','2019-01-11','2019-01-12'],
                  'price':[78,60,62,64,66,68,70,72,74,76,78,80],
                 'factor1':[178,287,152,294,155,245,168,276,165,275,178,221]
                })
# Changing index to datetime
df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d')
select_dates = df.set_index(['date'])

df = df.set_index('date')
df.index = pd.to_datetime(df.index)
df.sort_index(inplace=True)
df.dropna(inplace=True)

# Splitting Data into test and training sets manually
train = df.loc['2019-01-01':'2019-01-09']
test = df.loc['2019-01-10':'2019-01-12']

# setting index to datetime for test and train datasets
train.index = pd.DatetimeIndex(train.index).to_period('D')
test.index = pd.DatetimeIndex(test.index).to_period('D')

# Defining and fitting the model with training data for endogenous and exogenous data

model=sm.tsa.statespace.SARIMAX(train['price'],
                                order=(0, 0, 0),
                                seasonal_order=(0, 0, 0,12), 
                                exog=train.iloc[:,1:],
                                time_varying_regression=True,
                                mle_regression=False)
model_1= model.fit(disp=False)

# Defining exogenous data for testing 
exog_test=test.iloc[:,1:]

# Forecasting out of sample data with exogenous data
forecast = model_1.forecast(3, exog=exog_test)

So my problem is really with the last line: what do I do if I want more than 3 steps?

Comments (1)

z祗昰~ 2025-02-01 10:04:14

I'll try to answer this, since it mostly comes down to the shape of the data and what the statsmodels documentation says.

Per the documentation, `steps` is an integer: the number of steps to forecast from the end of the sample. That also means that if you want more than three steps, you need to supply more exogenous data. Exogenous data is needed both when fitting and when forecasting (note: both), and the `exog` you pass to `forecast` must have one row for every step you ask for.
(https://www.statsmodels.org/dev/generated/statsmodels.tsa.statespace.sarimax.SARIMAX.html)
(https://www.statsmodels.org/dev/generated/statsmodels.tsa.statespace.sarimax.SARIMAXResults.forecast.html)
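
To make the shape rule concrete, here is a minimal sketch of a pre-flight check. `check_future_exog` is a hypothetical helper of my own, not part of statsmodels; it only mirrors the rule that the exog passed to `forecast` needs one row per step and one column per exogenous regressor.

```python
import numpy as np


def check_future_exog(exog, steps, k_exog):
    """Return exog as a (steps, k_exog) float array, or raise ValueError.

    Hypothetical helper for illustration only; it is not a statsmodels API.
    """
    arr = np.asarray(exog, dtype=float)
    if arr.ndim == 1:
        arr = arr.reshape(-1, 1)  # treat a 1-D input as a single column
    if arr.shape != (steps, k_exog):
        raise ValueError(f"Required {(steps, k_exog)}, got {arr.shape}.")
    return arr


# The three test-set values of factor1 from the question
three_rows = [275, 178, 221]

# Three rows are enough for a 3-step forecast with one regressor ...
checked = check_future_exog(three_rows, steps=3, k_exog=1)

# ... but not for a 4-step forecast.
try:
    check_future_exog(three_rows, steps=4, k_exog=1)
    four_step_problem = None
except ValueError as err:
    four_step_problem = str(err)

print(checked.shape)
print(four_step_problem)
```

Running a check like this before calling `forecast` just surfaces the mismatch earlier, with your own message.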

Here are the two errors I get when I increase the number of steps by one without adding a matching exog row:

ValueError: cannot reshape array of size 3 into shape (4,1)
Provided exogenous values are not of the appropriate shape. Required (4, 1), got (3, 1).

ValueError: the number of rows in the exogenous variable does not match the number of time periods you're asking it to predict

With that said, simply expanding the testing set works well and gets you additional forecasts. Here is the code that works, along with a link to the working notebook:

https://colab.research.google.com/drive/1o9KXAe61EKH6bDI-FJO3qXzlWjz9IHHw?usp=sharing

import pandas as pd
import numpy as np
# from sklearn.model_selection import train_test_split
# (not needed if you are doing the train/test split manually)
# `from pandas import datetime` dropped: unused here and removed in pandas 2.0

# Defining Sample data
df=pd.DataFrame({'date':['2019-01-01','2019-01-02','2019-01-03',
                         '2019-01-04','2019-01-05','2019-01-06',
                         '2019-01-07','2019-01-08','2019-01-09',
                         '2019-01-10','2019-01-11','2019-01-12'],
                  'price':[78,60,62,64,66,68,70,72,74,76,78,80],
                 'factor1':[178,287,152,294,155,245,168,276,165,275,178,221]
                })
# Changing index to datetime
df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d')
select_dates = df.set_index(['date'])

df = df.set_index('date')
df.index = pd.to_datetime(df.index)
df.sort_index(inplace=True)
df.dropna(inplace=True)

# Splitting Data into test and training sets manually
train=df.loc['2019-01-01':'2019-01-09']
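# I made a change below: the test set now starts on 09 instead of 10,
# so one more day is included and the exog input array is now (4, 1)
# (with a second exog column it would be (4, 2)).
# The steps passed to forecast() must match: 4 steps for 4 exog rows.
# Caveat: 2019-01-09 is also the last training day, and as far as I can tell
# exog rows are matched to forecast steps by position, not by date. In real
# use, supply factor1 values for the dates you are actually forecasting.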

test=df.loc['2019-01-09':'2019-01-12']

# setting index to datetime for test and train datasets
train.index = pd.DatetimeIndex(train.index).to_period('D')
test.index = pd.DatetimeIndex(test.index).to_period('D')

# Defining and fitting the model with training data for endogenous and exogenous data
import statsmodels.api as sm

model=sm.tsa.statespace.SARIMAX(train['price'],
                                order=(0, 0, 0),
                                seasonal_order=(0, 0, 0,12), 
                                exog=train.iloc[:,1:],
                                time_varying_regression=True,
                                mle_regression=False)
model_1= model.fit(disp=False)

# Defining exogenous data for testing 
exog_test=test.iloc[:,1:]

# Forecasting out of sample data with exogenous data
forecast = model_1.forecast(4, exog=exog_test)
 