
SARIMAX: out-of-sample forecast with exogenous data

Posted 2025-01-25 10:04:14 · 1972 characters · 3 views · 0 comments


I am working on a time series analysis with SARIMAX and have been really struggling with it.

I think I have successfully fit a model and used it to make predictions; however, I don't know how to make an out-of-sample forecast with exogenous data.

I may be doing the whole thing wrong, so I have included my steps below with some sample data:

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
import statsmodels.api as sm

# Defining sample data
df = pd.DataFrame({'date':['2019-01-01','2019-01-02','2019-01-03',
                         '2019-01-04','2019-01-05','2019-01-06',
                         '2019-01-07','2019-01-08','2019-01-09',
                         '2019-01-10','2019-01-11','2019-01-12'],
                  'price':[78,60,62,64,66,68,70,72,74,76,78,80],
                 'factor1':[178,287,152,294,155,245,168,276,165,275,178,221]
                })
# Parsing the dates (the format must match the 'YYYY-MM-DD' strings above)
df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d')

# Using the dates as a sorted index and dropping missing rows
df = df.set_index('date')
df.sort_index(inplace=True)
df.dropna(inplace=True)

# Splitting data into training and test sets manually
train = df.loc['2019-01-01':'2019-01-09'].copy()
test = df.loc['2019-01-10':'2019-01-12'].copy()

# Converting the index of both sets to a daily PeriodIndex
train.index = pd.DatetimeIndex(train.index).to_period('D')
test.index = pd.DatetimeIndex(test.index).to_period('D')

# Defining and fitting the model on the training data (endogenous and exogenous)
model = sm.tsa.statespace.SARIMAX(train['price'],
                                  order=(0, 0, 0),
                                  seasonal_order=(0, 0, 0, 12),
                                  exog=train.iloc[:, 1:],
                                  time_varying_regression=True,
                                  mle_regression=False)
model_1 = model.fit(disp=False)

# Exogenous data for the test period
exog_test = test.iloc[:, 1:]

# Forecasting out of sample with exogenous data
forecast = model_1.forecast(3, exog=exog_test)

So my problem is really with the last line: what do I do if I want more than 3 steps?


z祗昰~ 2025-02-01 10:04:14

I will try to answer this question, since it mainly concerns data shapes and the statsmodels documentation.

According to the documentation, `steps` is an integer: the number of steps to forecast from the end of the sample. That also means that if you want more than three steps, you need to provide more exogenous data: the array passed to `forecast` must contain one row for every step you ask for, in addition to the exogenous data used for training.
(https://www.statsmodels.org/dev/generated/statsmodels.tsa.statespace.sarimax.SARIMAX.html)
(https://www.statsmodels.org/dev/generated/statsmodels.tsa.statespace.sarimax.SARIMAXResults.forecast.html)
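
The shape requirement can be checked before calling `forecast`. The following is only a sketch in plain pandas: `check_future_exog` is a hypothetical helper, not part of statsmodels, and it assumes daily data with a `DatetimeIndex`.

```python
import pandas as pd


def check_future_exog(train_index, exog_future, steps):
    """Hypothetical helper (not a statsmodels function).

    Verifies that exog_future has exactly `steps` rows and that its dates
    start on the day right after the last training observation.
    """
    if len(exog_future) != steps:
        raise ValueError(
            f"exog has {len(exog_future)} rows, but steps={steps}"
        )
    # The `steps` daily dates that directly follow the training sample
    expected = pd.date_range(train_index[-1], periods=steps + 1, freq="D")[1:]
    if not exog_future.index.equals(expected):
        raise ValueError(
            "exog index does not continue directly after the training sample"
        )
    return True


# Same dates and factor1 values as the sample data in the question
train_index = pd.date_range("2019-01-01", "2019-01-09", freq="D")
exog_future = pd.DataFrame(
    {"factor1": [275, 178, 221]},
    index=pd.date_range("2019-01-10", periods=3, freq="D"),
)

ok_for_three = check_future_exog(train_index, exog_future, steps=3)

try:
    check_future_exog(train_index, exog_future, steps=4)
    four_step_error = None
except ValueError as exc:
    four_step_error = str(exc)
```

With three exog rows the check passes for `steps=3` and raises for `steps=4`, which mirrors the shape error statsmodels reports.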

Here are the two errors I get when I increase the number of steps by one without adding exogenous rows:

ValueError: cannot reshape array of size 3 into shape (4,1)
Provided exogenous values are not of the appropriate shape. Required (4, 1), got (3, 1).

ValueError: the number of rows in the exogenous variable does not match the number of time periods you're asking it to predict

With that said, simply expanding the test set works well and gets you additional forecasts. Here is the code that works, along with a link to the working notebook:

https://colab.research.google.com/drive/1o9KXAe61EKH6bDI-FJO3qXzlWjz9IHHw?usp=sharing

import pandas as pd
import numpy as np
import statsmodels.api as sm
# from sklearn.model_selection import train_test_split
# why import this if you want to do the train/test split manually?

# Defining sample data
df=pd.DataFrame({'date':['2019-01-01','2019-01-02','2019-01-03',
                         '2019-01-04','2019-01-05','2019-01-06',
                         '2019-01-07','2019-01-08','2019-01-09',
                         '2019-01-10','2019-01-11','2019-01-12'],
                  'price':[78,60,62,64,66,68,70,72,74,76,78,80],
                 'factor1':[178,287,152,294,155,245,168,276,165,275,178,221]
                })
# Parsing the dates (the format must match the 'YYYY-MM-DD' strings above)
df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d')

# Using the dates as a sorted index and dropping missing rows
df = df.set_index('date')
df.sort_index(inplace=True)
df.dropna(inplace=True)

# Splitting data into training and test sets manually
# The test set now starts on 2019-01-09 instead of 2019-01-10, so it gains
# one more day and the exog array has shape (4, 1); with a second exogenous
# column it would be (4, 2). steps must be an integer equal to the number
# of exog rows (4 here).
# The training set ends on 2019-01-08 so the two sets do not overlap:
# forecasts start on the day after the last training observation, and the
# exog rows have to start on that same day.
train = df.loc['2019-01-01':'2019-01-08'].copy()
test = df.loc['2019-01-09':'2019-01-12'].copy()

# Converting the index of both sets to a daily PeriodIndex
train.index = pd.DatetimeIndex(train.index).to_period('D')
test.index = pd.DatetimeIndex(test.index).to_period('D')

# Defining and fitting the model on the training data (endogenous and exogenous)
model = sm.tsa.statespace.SARIMAX(train['price'],
                                  order=(0, 0, 0),
                                  seasonal_order=(0, 0, 0, 12),
                                  exog=train.iloc[:, 1:],
                                  time_varying_regression=True,
                                  mle_regression=False)
model_1 = model.fit(disp=False)

# Exogenous data for the four test days
exog_test = test.iloc[:, 1:]

# Forecasting out of sample with exogenous data (4 steps, 4 exog rows)
forecast = model_1.forecast(4, exog=exog_test)


About the author: 弄潮
