Time series modelling with irregular data
I'm currently working on a pet project to forecast future base oil prices from historical base oil prices. The data is weekly but there are some periods in between where prices are missing.
I'm somewhat okay with modelling time series with complete data, but when it comes to irregular ones, the models that I've learnt may not be applicable. Do I use the xts class and proceed with ARIMA models in R the usual way?
After building a model to predict future prices, I'd like to factor in crude oil price fluctuation, diesel profit margin, car sales, economic growth and so on (a multivariate model?) to improve accuracy. Can someone shed some light on how to go about this efficiently? In my mind, it looks like a maze.
EDIT: Trimmed Data here: https://docs.google.com/document/d/18pt4ulTpaVWQhVKn9XJHhQjvKwNI9uQystLL4WYinrY/edit
Coding:
Mod.fit<-arima(Y,order =c(3,2,6), method ="ML")
Result:
Warning message:
In log(s2) : NaNs produced
Will this warning affect my model accuracy?
With missing data, I can't use the ACF and PACF. Is there a better way to select models? I used AIC (Akaike's Information Criterion) to compare different ARIMA models using the code below. ARIMA(3,2,6) gave the smallest AIC.
Coding:
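# AIC for every ARIMA(p, 2, q) with p, q in 0..5; stored at row p + 1, column q + 1
AIC <- matrix(0, 6, 6)
for (p in 0:5) {
  for (q in 0:5) {
    mod.fit <- arima(Y, order = c(p, 2, q))
    AIC[p + 1, q + 1] <- mod.fit$aic
  }
}
AIC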
Result:
         [,1]     [,2]     [,3]     [,4]     [,5]     [,6]
[1,] 1396.913 1328.481 1327.896 1328.350 1326.057 1325.063
[2,] 1343.925 1326.862 1328.321 1328.644 1325.239 1318.282
[3,] 1334.642 1328.013 1330.005 1327.304 1326.882 1314.239
[4,] 1336.393 1329.954 1324.114 1322.136 1323.567 1316.150
[5,] 1319.137 1321.030 1320.575 1321.287 1323.750 1316.815
[6,] 1321.135 1322.634 1320.115 1323.670 1325.649 1318.015
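A note on reading this grid: the loop stores each AIC at position [p + 1, q + 1], so row i and column j correspond to p = i - 1 and q = j - 1. The sketch below re-enters the printed values (the name aic_grid is introduced here for illustration, it is not in the original code) and looks up the position of the minimum. Under that indexing, the smallest value, 1314.239 at [3, 6], corresponds to order c(2, 2, 5).

```r
# AIC values copied from the printed result; rows are p = 0..5, columns are q = 0..5.
# `aic_grid` is a hypothetical name used only for this sketch.
aic_grid <- matrix(c(
  1396.913, 1328.481, 1327.896, 1328.350, 1326.057, 1325.063,
  1343.925, 1326.862, 1328.321, 1328.644, 1325.239, 1318.282,
  1334.642, 1328.013, 1330.005, 1327.304, 1326.882, 1314.239,
  1336.393, 1329.954, 1324.114, 1322.136, 1323.567, 1316.150,
  1319.137, 1321.030, 1320.575, 1321.287, 1323.750, 1316.815,
  1321.135, 1322.634, 1320.115, 1323.670, 1325.649, 1318.015
), nrow = 6, byrow = TRUE)

# Row and column index of the smallest AIC
best <- arrayInd(which.min(aic_grid), dim(aic_grid))

# Convert matrix indices back to ARIMA orders (indices are offset by one)
best_p <- best[1, 1] - 1
best_q <- best[1, 2] - 1
c(p = best_p, d = 2, q = best_q)
```

Setting dimnames on the matrix inside the loop (for example rows "p=0" to "p=5") would make the printed table self-describing and avoid the off-by-one when reading it.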
Comments (1)
No. In general you don't need xts to fit an ARIMA model, and if you do use it an extra step is required. Missing values recorded as NA are handled by arima(), and with method = "ML" they are handled exactly; other methods may not compute the innovations for the missing observations. This works because arima() fits the ARIMA model in a state-space representation.
If the data are regularly spaced but have missing values, the above should be fine.
The reason I say not to use xts in general is simply that arima() requires a univariate time series object (see ?ts) as its input. However, xts extends and inherits from zoo, and the zoo package provides an as.ts() method for objects of class "zoo". So if you get your data into a zoo() or xts() object, you can coerce it to class "ts"; that should put the NA values in the appropriate places, and arima() will then handle them if it can (i.e. if there aren't too many missing values).
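To make the NA route concrete, here is a minimal base-R sketch. It does not use zoo or xts; instead it pads the observed weeks onto a full weekly grid with merge(), which is one possible way to get NA into the right places. The data are simulated, and the names observed, week and price, as well as the order c(0, 1, 1), are assumptions for illustration rather than anything taken from the question's data.

```r
# Sketch with simulated data; `observed`, `week` and `price` are hypothetical names.
set.seed(42)
weeks <- seq(as.Date("2012-01-02"), by = "week", length.out = 120)
full  <- data.frame(week = weeks, price = 100 + cumsum(rnorm(120)))

# Mimic the gaps: some weeks are absent from the data altogether
observed <- full[-c(20:23, 57, 90:91), ]

# Rebuild the regular weekly grid; weeks with no record get price = NA
grid   <- data.frame(week = seq(min(observed$week), max(observed$week), by = "week"))
padded <- merge(grid, observed, by = "week", all.x = TRUE)

# arima() expects a univariate "ts" object; the NA entries mark the missing weeks
Y <- ts(padded$price)

# method = "ML" so that the missing values are handled exactly
fit <- arima(Y, order = c(0, 1, 1), method = "ML")

sum(is.na(Y))                    # how many missing weeks went into the fit
predict(fit, n.ahead = 4)$pred   # forecasts for the next four weeks
```

The key point is that the gaps must appear as NA on an evenly spaced index. Simply dropping the missing weeks and passing the shortened vector to ts() would silently treat non-adjacent weeks as consecutive.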