Interpolating missing values in a time series with a seasonal cycle

Posted on 2024-10-16 19:03:49

I have a time series for which I want to intelligently interpolate the missing values. The value at a particular time is influenced by a multi-day trend, as well as its position in the daily cycle.

Here is an example in which the tenth observation is missing from myzoo:

library(zoo)

start <- as.POSIXct("2010-01-01")
freq <- as.difftime(6, units = "hours")   # four observations per day
dayvals <- (1:4) * 10                     # multi-day trend: 10, 20, 30, 40
timevals <- c(3, 1, 2, 4)                 # within-day pattern
index <- seq(from = start, by = freq, length.out = 16)
obs <- rep(dayvals, each = 4) + rep(timevals, times = 4)
myzoo <- zoo(obs, index)
myzoo[10] <- NA                           # the true value is 31

If I had to implement this myself, I'd use some kind of weighted mean of nearby times on adjacent days, or add a time-of-day offset to a line fitted to the larger trend, but I hope a package or function already exists for this situation.
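
The second idea above (a fitted trend line plus a time-of-day offset) can be sketched with base R alone. This is only an illustration under stated assumptions: the trend is taken to be linear, the names d, t, and slot are made up for the example, and the series is rebuilt as a plain vector so that zoo is not required.

```r
# Toy series from the question, as a plain numeric vector
obs <- rep((1:4) * 10, each = 4) + rep(c(3, 1, 2, 4), times = 4)
obs[10] <- NA

# Hypothetical helper columns: t is the observation number,
# slot is the position within the day (1 to 4)
d <- data.frame(y = obs,
                t = seq_along(obs),
                slot = factor(rep(1:4, times = 4)))

# Linear trend plus one offset per time-of-day slot;
# lm() drops the row with the missing value by default
fit <- lm(y ~ t + slot, data = d)

# Fill only the missing position with the model prediction
filled <- obs
filled[is.na(obs)] <- predict(fit, newdata = d[is.na(d$y), ])
filled[10]  # 31 (up to floating-point error) for this toy series
```

The toy data are exactly linear in t plus a slot offset, so the fit is perfect here. Real data would need a more flexible trend and would not be recovered exactly.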

EDIT: Modified the code slightly to clarify my problem. There are na.* methods that interpolate from nearest neighbors, but in this case they do not recognize that the missing value is at the time that is the lowest value of the day. Maybe the solution is to reshape the data to wide format and then interpolate, but I wouldn't like to completely disregard the contiguous values from the same day. It is worth noting that diff(myzoo, lag = 4) returns a vector of 10's. The solution may lie with some combination of reshape, na.spline, and diff.inv, but I just can't figure it out.
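
The reshape-to-wide idea mentioned above can be sketched as follows, using base R only. It interpolates each time-of-day slot across days, so it deliberately ignores the neighbouring values from the same day, which is exactly the limitation noted in the edit. The names wide, filled_wide, and seasonal_fill are made up for the example.

```r
# Toy series from the question, as a plain numeric vector
obs <- rep((1:4) * 10, each = 4) + rep(c(3, 1, 2, 4), times = 4)
obs[10] <- NA

# Wide layout: rows are time-of-day slots, columns are days
wide <- matrix(obs, nrow = 4)

# Linearly interpolate each slot across days; rule = 2 holds the end
# values constant if a gap sits at the first or last day
filled_wide <- t(apply(wide, 1, function(slot) {
  ok <- !is.na(slot)
  approx(which(ok), slot[ok], xout = seq_along(slot), rule = 2)$y
}))

# Back to the original (long) ordering
seasonal_fill <- as.vector(filled_wide)
seasonal_fill[10]  # 31: midpoint of 21 (day 2) and 41 (day 4) in the same slot
```

Because the day-to-day step is a constant 10 in this example, interpolating within the slot recovers the true value. With noisier data, combining it with same-day information would be worth considering.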

Here are three approaches that don't work (na.approx, na.locf, and na.spline; the image is not preserved here):

EDIT2. Image produced using the following code.

myzoo <- zoo(obs, index)
myzoo[10] <- NA # knock out the missing point
plot(myzoo, type="o", pch=16) # plot solid line
points(na.approx(myzoo)[10], col = "red")
points(na.locf(myzoo)[10], col = "blue")
points(na.spline(myzoo)[10], col = "green")
myzoo[10] <- 31 # replace the missing point
lines(myzoo, type = "o", lty=3, pch=16) # dashed line over the gap
legend(x = "topleft", 
       legend = c("na.spline", "na.locf", "na.approx"), 
       col=c("green","blue","red"), pch = 1)

Comments (4)

梦回梦里 2024-10-23 19:03:49

Try this:

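x <- ts(myzoo, frequency = 4)   # four observations per day
# Smoothed states of the basic structural model; dropping column 2 (slope)
# leaves level + seasonal, whose sum is the fitted series
fit <- ts(rowSums(tsSmooth(StructTS(x))[, -2]))
tsp(fit) <- tsp(x)
plot(x)
lines(fit, col = 2)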

The idea is to use a basic structural model for the time series, which copes with the missing value by way of a Kalman filter. A Kalman smoother is then used to estimate every point in the time series, including any that are missing.

I had to convert your zoo object to a ts object with frequency 4 in order to use StructTS. You may want to change the fitted values back to zoo again.
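
If only the gap should be replaced and the observed values kept as they are, the smoothed estimate can be copied into the missing positions only. The following is a minimal sketch with base R (the stats package), rebuilding the toy series as a plain vector so that zoo is not required. The names smoothed and filled are made up for the example, and the fitted value may differ slightly between R versions.

```r
# Toy series from the question, as a ts with four observations per day
obs <- rep((1:4) * 10, each = 4) + rep(c(3, 1, 2, 4), times = 4)
obs[10] <- NA
x <- ts(obs, frequency = 4)

# Smoothed states of the basic structural model; as in the answer above,
# drop column 2 (slope) and sum the remaining level and seasonal states
states <- tsSmooth(StructTS(x, type = "BSM"))
smoothed <- rowSums(states[, -2])

# Keep the observed values; use the smoothed estimate only where data are missing
filled <- x
filled[is.na(obs)] <- smoothed[is.na(obs)]
filled[10]
```

The result is still a ts object; its values can be put back on the original time index afterwards.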

童话 2024-10-23 19:03:49

In this case, I think you want a seasonality correction in the ARIMA model. There's not enough data here to fit the seasonal model, but this should get you started.

library(zoo)
start <- as.POSIXct("2010-01-01") 
freq <- as.difftime(6, units = "hours") 
dayvals <- (1:4)*10 
timevals <- c(3, 1, 2, 4) 
index <- seq(from = start, by = freq, length.out = 16)
obs <- (rep(dayvals, each = 4) + rep(timevals, times = 4))
myzoo <- myzoo.orig <- zoo(obs, index)
myzoo[10] <- NA

myzoo.fixed <- na.locf(myzoo)

myarima.resid <- arima(myzoo.fixed, order = c(3, 0, 3), seasonal = list(order = c(0, 0, 0), period = 4))$residuals
myzoo.reallyfixed <- myzoo.fixed
myzoo.reallyfixed[10] <- myzoo.fixed[10] + myarima.resid[10]

plot(myzoo.reallyfixed)
points(myzoo.orig)

In my tests the ARMA(3, 3) is really close, but that's just luck. With a longer time series you should be able to calibrate the seasonal correction to give you good predictions. A good prior on the underlying mechanisms of both the signal and the seasonal correction would help you get better out-of-sample performance.

纵情客 2024-10-23 19:03:49

forecast::na.interp is a good approach. From the documentation:

Uses linear interpolation for non-seasonal series and a periodic stl decomposition with seasonal series to replace missing values.

library(forecast)
fit <- na.interp(myzoo)
fit[10]  # 32.5, vs. 31.0 actual and 32.0 from Rob Hyndman's answer

This paper evaluates several interpolation methods against real time series, and finds that na.interp is both accurate and efficient:

From the R implementations tested in this paper, na.interp from the forecast package and na.StructTS from the zoo package showed the best overall results.

The na.interp function is also not that much slower than na.approx [the fastest method], so the loess decomposition seems not to be very demanding in terms of computing time.

It is also worth noting that Rob Hyndman wrote the forecast package, and added na.interp after providing his answer to this question. na.interp is likely an improvement on the StructTS approach from that answer, even though it performed worse in this instance (probably because the period was specified explicitly for StructTS, whereas na.interp has to work it out from the input).
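
To make the quoted description concrete, here is a rough base-R sketch of the general idea: remove a periodic seasonal component, interpolate the seasonally adjusted series linearly, then add the seasonal component back. It is not the actual implementation of forecast::na.interp. The provisional linear fill (needed because stl() does not accept missing values) and the names prov, adj, and filled are assumptions made for the example. Note also that the 32.5 reported above equals the plain linear midpoint of the neighbours 33 and 32, which suggests the zoo series may have been treated as non-seasonal in that run. Here the period is stated explicitly through ts(..., frequency = 4).

```r
# Toy series from the question, as a ts with four observations per day
obs <- rep((1:4) * 10, each = 4) + rep(c(3, 1, 2, 4), times = 4)
obs[10] <- NA
x <- ts(obs, frequency = 4)
miss <- is.na(obs)
idx <- seq_along(obs)

# Provisional linear fill so that stl() can run on a complete series
prov <- x
prov[miss] <- approx(idx[!miss], obs[!miss], xout = idx[miss])$y

# Periodic seasonal component from an STL decomposition
seasonal <- stl(prov, s.window = "periodic")$time.series[, "seasonal"]

# Interpolate the seasonally adjusted series across the gap
adj <- prov - seasonal
adj[miss] <- approx(idx[!miss], adj[!miss], xout = idx[miss])$y

# Add the seasonal component back at the missing positions only
filled <- x
filled[miss] <- adj[miss] + seasonal[miss]
filled[10]
```

The seasonal estimate is slightly contaminated by the provisional fill, so the result will not be exact. The forecast package handles this step differently.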

人海汹涌 2024-10-23 19:03:49

Package imputeTS has a method for Kalman Smoothing on the state space representation of an ARIMA model - which might be a good solution for this problem.

library(imputeTS)
na_kalman(myzoo, model = "auto.arima")

This also works directly with zoo time series objects. If you think you can do better than "auto.arima", you can pass your own ARIMA model to the function instead, like this:

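library(imputeTS)
# myts: the zoo series converted to ts, e.g. with four observations per day
myts <- ts(coredata(myzoo), frequency = 4)
usermodel <- arima(myts, order = c(1, 0, 1))$model
na_kalman(myts, model = usermodel)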

But in this case you have to convert the zoo object back to ts, since arima() only accepts ts.
