Time series modelling with irregular data

Posted 2024-12-17 07:17:16

I'm currently working on a pet project to forecast future base oil prices from historical base oil prices. The data is weekly but there are some periods in between where prices are missing.

I'm reasonably comfortable modelling time series with complete data, but the models I've learnt may not apply to irregular series. Should I use the xts class and proceed with ARIMA models in R the usual way?

After building a model to predict future prices, I'd like to factor in crude oil price fluctuation, diesel profit margin, car sales, economic growth and so on (multivariate?) to improve accuracy. Can someone shed some light on an efficient way to do this? In my mind, it looks like a maze.

EDIT: Trimmed Data here: https://docs.google.com/document/d/18pt4ulTpaVWQhVKn9XJHhQjvKwNI9uQystLL4WYinrY/edit

Code:

    Mod.fit <- arima(Y, order = c(3, 2, 6), method = "ML")

Result:

    Warning message:
    In log(s2) : NaNs produced

Will this warning affect my model accuracy?

With missing data, I can't use the ACF and PACF. Is there a better way to select models? I used AIC (Akaike's Information Criterion) to compare different ARIMA models using the code below. ARIMA(3,2,6) gave the smallest AIC.

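Code:

    # Grid of AIC values; cell [p + 1, q + 1] holds the AIC of ARIMA(p, 2, q)
    aic_table <- matrix(NA_real_, 6, 6)
    for (p in 0:5) {
      for (q in 0:5) {
        mod.fit <- arima(Y, order = c(p, 2, q))
        aic_table[p + 1, q + 1] <- mod.fit$aic
      }
    }
    aic_table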

Result:

              [,1]     [,2]     [,3]     [,4]     [,5]     [,6] 
    [1,] 1396.913 1328.481 1327.896 1328.350 1326.057 1325.063 
    [2,] 1343.925 1326.862 1328.321 1328.644 1325.239 1318.282 
    [3,] 1334.642 1328.013 1330.005 1327.304 1326.882 1314.239 
    [4,] 1336.393 1329.954 1324.114 1322.136 1323.567 1316.150 
    [5,] 1319.137 1321.030 1320.575 1321.287 1323.750 1316.815 
    [6,] 1321.135 1322.634 1320.115 1323.670 1325.649 1318.015
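
Because the loop stores each AIC at `[p + 1, q + 1]`, cell `[i, j]` of the table holds ARIMA(i - 1, 2, j - 1). The smallest entry above, 1314.239 at `[3, 6]`, would therefore correspond to p = 2 and q = 5. A minimal sketch of reading the order off the table programmatically, with the values copied from the result above (the names `aic_table`, `best` and `best_order` are illustrative):

```r
# AIC values copied from the result table; rows are p = 0..5, columns q = 0..5
aic_table <- matrix(c(
  1396.913, 1328.481, 1327.896, 1328.350, 1326.057, 1325.063,
  1343.925, 1326.862, 1328.321, 1328.644, 1325.239, 1318.282,
  1334.642, 1328.013, 1330.005, 1327.304, 1326.882, 1314.239,
  1336.393, 1329.954, 1324.114, 1322.136, 1323.567, 1316.150,
  1319.137, 1321.030, 1320.575, 1321.287, 1323.750, 1316.815,
  1321.135, 1322.634, 1320.115, 1323.670, 1325.649, 1318.015
), nrow = 6, byrow = TRUE)

# Locate the minimum and undo the +1 offset used when filling the matrix
best <- which(aic_table == min(aic_table), arr.ind = TRUE)
best_order <- c(
  p = as.integer(best[1, "row"]) - 1L,
  d = 2L,
  q = as.integer(best[1, "col"]) - 1L
)
best_order
```

Several neighbouring cells differ by only a few AIC units, so a lower-order model with a similar AIC may be a reasonable alternative to the strict minimum.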

幸福％小乖 2024-12-24 07:17:16

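No. In general you don't need xts before fitting an ARIMA model; using it adds an extra step. Missing values recorded as NA are handled by arima(), and with method = "ML" they are handled exactly; other methods may not give you the innovations for the missing data. This works because arima() fits the ARIMA model in a state-space representation.

If the data are regular but have missing values, the above should be fine.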
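
As a minimal sketch of this, assuming a simulated weekly series stands in for the price data (the names `y`, `fit` and `fc`, the simulated values and the ARIMA(1,1,0) order are all illustrative and not taken from the question), the gaps are recorded as `NA` in a regular `ts` and the series is passed straight to `arima()` with `method = "ML"`:

```r
set.seed(1)

# Simulated stand-in for a regular weekly price series (hypothetical data)
sim <- arima.sim(list(order = c(1, 1, 0), ar = 0.5), n = 120)
y <- ts(100 + as.numeric(sim), frequency = 52)

# Mark a few weeks as missing; the time index itself stays regular
y[c(10, 11, 40, 75:78)] <- NA

# Maximum likelihood on the state-space form, with the NAs left in place
fit <- arima(y, order = c(1, 1, 0), method = "ML")
fit

# Forecast the next 8 weeks from the fitted model
fc <- predict(fit, n.ahead = 8)
fc$pred
```

The key point is that the missing weeks stay in the series as `NA` rather than being dropped, so the spacing between observations is preserved.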

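The reason I say not to use xts in general is that arima() requires a univariate time series object (see ?ts) as its input. However, xts extends and inherits from zoo objects, and the zoo package provides an as.ts() method for objects of class "zoo". So if you put your data into a zoo() or xts() object, you can coerce it to class "ts", which should place NA in the appropriate positions; arima() will then handle them where it can (that is, when there aren't too many missing values).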
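
The coercion can be sketched as follows, assuming the zoo package is installed and that observations are indexed by an integer week number (the `week` and `price` values are made up for illustration). Weeks that are absent from the index are expected to come back as `NA` once the series is coerced to `"ts"`:

```r
library(zoo)

# Hypothetical observations: weeks 4, 5 and 9 are absent from the index
week  <- c(1, 2, 3, 6, 7, 8, 10, 11, 12)
price <- c(101, 103, 102, 106, 107, 105, 108, 110, 109)

# Irregularly observed series on an underlying weekly grid
z <- zoo(price, order.by = week)

# Coerce to class "ts"; the gaps on the weekly grid are filled with NA
y <- as.ts(z)
y

# Positions of the weeks that were absent from the original index
which(is.na(y))
```

The resulting `y` can then be passed to `arima()` in the usual way. With a `Date` index instead of week numbers the same idea should apply, but it is worth checking the start and frequency of the resulting `ts` before fitting.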