Explaining the forecasts from an ARIMA model

Posted on 2024-08-29 21:50:43

I am trying to explain to myself the forecasts I get from applying an ARIMA model to a time-series dataset. The data is from the M1-Competition; the series is MNB65. I am fitting an ARIMA(1,0,0) model to the data and computing forecasts in R. Here are some output snippets:

> arima(x, order = c(1,0,0))
Series: x 
ARIMA(1,0,0) with non-zero mean 
Call: arima(x = x, order = c(1, 0, 0)) 
Coefficients:
         ar1  intercept
      0.9421  12260.298
s.e.  0.0474    202.717

> predict(arima(x, order = c(1,0,0)), n.ahead=12)
$pred
Time Series:
Start = 53 
End = 64 
Frequency = 1 
[1] 11757.39 11786.50 11813.92 11839.75 11864.09 11887.02 11908.62 11928.97 11948.15 11966.21 11983.23 11999.27

I have a few questions:

(1) How do I explain that, although the dataset shows a clear downward trend, the forecasts from this model trend upward? The same happens with ARIMA(2,0,0), which auto.arima (forecast package) selects as the best ARIMA fit for the data, and with an ARIMA(1,0,1) model.

(2) The intercept of the ARIMA(1,0,0) model is 12260.298. Shouldn't the intercept satisfy C = mean * (1 - sum(AR coeffs))? In that case its value should be 715.52. I must be missing something basic here.

(3) This is clearly a series with a non-stationary mean. Why does auto.arima still select an AR(2) model as the best fit? Is there an intuitive explanation?

Thanks.

懒猫 2024-09-05 21:50:49

(1) No ARIMA(p,0,q) model will allow for a trend because the model is stationary: its forecasts converge to the estimated mean, so when the series ends below that mean, as it does here, the forecasts rise toward it. If you really want to include a trend, use ARIMA(p,1,q) with a drift term, or ARIMA(p,2,q). The fact that auto.arima() is suggesting 0 differences would usually indicate there is no clear trend.

(2) The help file for arima() shows that the intercept is actually the mean. That is, the AR(1) model is (Y_t - c) = ϕ(Y_{t-1} - c) + e_t rather than Y_t = c + ϕY_{t-1} + e_t as you might expect. Expanding the first form gives Y_t = c(1 - ϕ) + ϕY_{t-1} + e_t, so the constant you were expecting is c(1 - ϕ), not c.

(3) auto.arima() uses a unit root test to determine the number of differences required, so check the unit root test results to see what's going on. You can always specify the required number of differences in auto.arima() if you think the unit root tests are not leading to a sensible model.
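The mean form of the AR(1) model can be checked against the numbers in the question with a few lines of base R. This is a sketch under stated assumptions: the post does not show the last observation y_T, so it is backed out from the first reported forecast, and the simulated series at the end (mean 100, ϕ = 0.5, seed 1) is an arbitrary illustration, not the MNB65 data. The recursion should reproduce the reported predict() output closely, with small differences because ϕ is printed to only four decimals, and the simulated fit shows that arima()'s "intercept" estimates the mean rather than the regression constant.

```r
# Sketch: reproduce the question's forecasts from the printed coefficients.
mu  <- 12260.298  # arima() "intercept", i.e. the estimated mean
phi <- 0.9421     # ar1 coefficient (printed to 4 decimals only)

# Forecasts printed by predict(arima(x, order = c(1,0,0)), n.ahead = 12)
reported <- c(11757.39, 11786.50, 11813.92, 11839.75, 11864.09, 11887.02,
              11908.62, 11928.97, 11948.15, 11966.21, 11983.23, 11999.27)

# Assumption: y_T is not shown in the post, so invert the one-step forecast
# yhat_{T+1} = mu + phi * (y_T - mu) to recover it.
y_T <- mu + (reported[1] - mu) / phi

# Mean-form forecasts: yhat_{T+h} = mu + phi^h * (y_T - mu).
# With 0 < phi < 1 and y_T below mu, they rise monotonically toward mu.
h  <- 1:12
fc <- mu + phi^h * (y_T - mu)
print(round(cbind(h, fc, reported), 2))

# Expanding (Y_t - mu) = phi * (Y_{t-1} - mu) + e_t gives the regression form
# Y_t = C + phi * Y_{t-1} + e_t with C = mu * (1 - phi).
C <- mu * (1 - phi)
print(C)

# Simulated check (arbitrary settings): an AR(1) with phi = 0.5 and mean 100.
set.seed(1)
sim <- arima.sim(model = list(ar = 0.5), n = 2000) + 100
fit <- arima(sim, order = c(1, 0, 0))
# "intercept" should land near 100 (the mean), not near 100 * (1 - 0.5) = 50.
print(coef(fit))
```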

Here are the results of two unit root tests on your data, using adf.test() and kpss.test() from the tseries package:

R> adf.test(x)

        Augmented Dickey-Fuller Test

data:  x 
Dickey-Fuller = -1.031, Lag order = 3, p-value = 0.9249
alternative hypothesis: stationary 

R> kpss.test(x)

        KPSS Test for Level Stationarity

data:  x 
KPSS Level = 0.3491, Truncation lag parameter = 1, p-value = 0.09909

So the ADF test is far from rejecting its null hypothesis of a unit root (p = 0.92), while the KPSS test does not quite reject its null hypothesis of stationarity at the 5% level (p ≈ 0.099). auto.arima() uses the KPSS test by default. You could use auto.arima(x, test="adf") if you wanted the ADF test instead. In that case, it suggests the model ARIMA(0,2,1), which does have a trend.
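A base-R sketch of the contrast drawn in this answer, on hypothetical data: a simulated random walk drifting down by about 30 per step (seed, length, and noise level are arbitrary assumptions, not the MNB65 series). It fits a stationary ARIMA(1,0,0), an ARIMA(0,1,0) with drift, and an ARIMA(0,2,1), and prints their 12-step forecasts side by side. Base arima() has no drift option, so the drift is written as a regression on the time index through xreg, which becomes a constant once the series is differenced; with the forecast package, auto.arima(x, test = "adf") or an explicit d argument to auto.arima() would make the differencing choice instead.

```r
# Hypothetical data (not MNB65): a random walk drifting down about 30 per step.
set.seed(42)
n <- 52
x <- ts(14000 + cumsum(rnorm(n, mean = -30, sd = 40)))
h <- 12
t_in  <- seq_len(n)       # time index over the sample
t_out <- n + seq_len(h)   # time index over the forecast horizon

# (a) Stationary ARIMA(1,0,0): forecasts converge to the estimated mean.
#     method = "ML" skips the CSS start, which can stop with
#     "non-stationary AR part from CSS" on strongly trending data.
fit_ar1 <- arima(x, order = c(1, 0, 0), method = "ML")
fc_ar1  <- predict(fit_ar1, n.ahead = h)$pred

# (b) ARIMA(0,1,0) with drift: base arima() has no drift argument, so regress on
#     the time index; after one difference this regressor is a constant drift.
fit_drift <- arima(x, order = c(0, 1, 0), xreg = t_in)
fc_drift  <- predict(fit_drift, n.ahead = h, newxreg = t_out)$pred

# (c) ARIMA(0,2,1), the model reported for auto.arima(x, test = "adf") on MNB65:
#     its forecasts continue a straight line.
fit_021 <- arima(x, order = c(0, 2, 1))
fc_021  <- predict(fit_021, n.ahead = h)$pred

print(round(cbind(ar1 = fc_ar1, drift = fc_drift, arima021 = fc_021), 1))
```

Only the ARIMA(1,0,0) column should bend back toward its estimated mean; the other two columns change by a constant amount per step.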
