Mimicking createFolds for time-series cross-validation
The R package caret provides a handy function createFolds, which returns a list of indexes for training sets to be used in cross-validation:
```r
set.seed(1)
library(caret)
x <- rnorm(10)
createFolds(x, k = 5, returnTrain = TRUE)
```
```
$Fold1
[1] 1 2 5 6 7 8 9 10
$Fold2
[1] 1 3 4 5 6 8 9 10
$Fold3
[1] 1 2 3 4 5 7 8 10
$Fold4
[1] 1 2 3 4 6 7 8 9
$Fold5
[1] 2 3 4 5 6 7 9 10
```
I would like to create a similar function, except that it should return a list of indexes to be used in time-series cross-validation. I found some example code in R, but I want to generalize and functionalize things more. Here's what I initially came up with:
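```r
createTSfolds <- function(y, Min = max(frequency(y), 3)) {
  i <- seq_along(y)
  # Each training set ends at a stop index; the last stop is length(y) - 1,
  # which leaves at least one later observation to forecast.
  stops <- i[Min:(length(i) - 1)]
  # Expanding window: every training set starts at observation 1.
  starts <- rep(1, length(stops))
  # SIMPLIFY = FALSE keeps the result a list even if all folds had equal length.
  out <- mapply(seq, starts, stops, SIMPLIFY = FALSE)
  # Zero-padded names (Fold01, Fold02, ...) once there are 10 or more folds
  names(out) <- paste("Fold", gsub(" ", "0", format(seq_along(out))), sep = "")
  out
}
createTSfolds(x)
```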
```
$Fold1
[1] 1 2 3
$Fold2
[1] 1 2 3 4
$Fold3
[1] 1 2 3 4 5
$Fold4
[1] 1 2 3 4 5 6
$Fold5
[1] 1 2 3 4 5 6 7
$Fold6
[1] 1 2 3 4 5 6 7 8
$Fold7
[1] 1 2 3 4 5 6 7 8 9
```
(`Min` is the minimum number of observations needed to fit a model.)
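In time-series cross-validation each training set is scored on the observation(s) that immediately follow it, so the training folds need matching test indices. A minimal sketch, assuming one-step-ahead forecasts by default; `tsTestIndex()` is a hypothetical helper (not part of caret) and the training-set mean is only a placeholder model:

```r
set.seed(1)
x <- rnorm(10)

# Training folds in the same shape createTSfolds(x) returns: 1:3, 1:4, ..., 1:9
folds <- lapply(3:9, seq_len)
names(folds) <- paste0("Fold", seq_along(folds))

# Hypothetical helper: the next `h` indices after each training set,
# dropping any index past the end of the series (length n)
tsTestIndex <- function(trainFolds, n, h = 1) {
  lapply(trainFolds, function(idx) {
    te <- max(idx) + seq_len(h)
    te[te <= n]
  })
}
tests <- tsTestIndex(folds, n = length(x))

# Rolling-origin evaluation of the placeholder model (training-set mean)
errors <- mapply(function(tr, te) x[te] - mean(x[tr]), folds, tests)
mean(abs(errors))
```

With `h > 1` the last folds receive fewer than `h` test indices, because indices beyond `length(x)` are dropped; whether to keep such folds is a design choice.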
This function works pretty well for now, but I'd like to add two features that Rob Hyndman discusses:
- Windowing: instead of the training set extending back to the 1st observation, it extends back only n observations.
- Variable forecast horizons: instead of adding 1 index to the training set each fold, add k indices each fold.
Here is how I implemented windowing:
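```r
createTSfolds <- function(y, Min = max(frequency(y), 3), lookback = NA) {
  i <- seq_along(y)
  stops <- i[Min:(length(i) - 1)]
  if (is.na(lookback)) {
    # Expanding window: every training set starts at observation 1.
    starts <- rep(1, length(stops))
  } else {
    # Sliding window: keep the last `lookback` observations up to each stop
    # (never starting before observation 1).
    starts <- pmax(stops - lookback + 1, 1)
  }
  # SIMPLIFY = FALSE keeps the result a list even when all windows have the
  # same length, so no matrix-to-list conversion is needed.
  out <- mapply(seq, starts, stops, SIMPLIFY = FALSE)
  names(out) <- paste("Fold", gsub(" ", "0", format(seq_along(out))), sep = "")
  out
}
createTSfolds(x, Min = 4, lookback = 4)
```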
I can't figure out how to implement variable forecast horizons. For example, with `k = 3` the folds would look like this:
```
$Fold1
[1] 1 2 3
$Fold2
[1] 1 2 3 4 5 6
$Fold3
[1] 1 2 3 4 5 6 7 8 9
```
I'm looking for ways to improve my existing code, as well as ways to add variable increments to the training set each fold.
Thank you
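The variable increment can be sketched by stepping the stop indices by `k` with `seq(from, to, by = k)` instead of by 1. This is a minimal sketch, assuming `k` (a new, hypothetical argument) only changes the spacing between stops and leaving windowing out; with 10 observations, the default `Min = 3` and `k = 3` it returns the three folds listed in the question.

```r
# Sketch: expanding-window folds whose training sets grow by `k` per fold
createTSfolds <- function(y, Min = max(frequency(y), 3), k = 1) {
  n <- length(y)
  stopifnot(Min >= 1, Min <= n - 1, k >= 1)
  stops <- seq(Min, n - 1, by = k)  # Min, Min + k, Min + 2k, ... up to n - 1
  out <- lapply(stops, seq_len)     # each training set is 1:stop
  names(out) <- paste("Fold", gsub(" ", "0", format(seq_along(out))), sep = "")
  out
}

set.seed(1)
x <- rnorm(10)
createTSfolds(x, k = 3)
```

Because `seq()` never goes past `n - 1`, the last fold can end a few observations short of the series when `n - 1 - Min` is not a multiple of `k`.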
Answer: Here is one approach. It is not entirely robust, as I am not sure about the output you seek when both `lookback` and `k` are present. Let me know if this is what you were looking for.
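When both `lookback` and `k` are supplied, one possible convention (an assumption, since the desired output for that case was never specified) is to let `k` set the spacing between stop indices and let `lookback` cap each training set at its most recent `lookback` observations. A minimal sketch under that assumption, not the code from the answer above:

```r
# Sketch combining windowing (lookback) with a variable increment (k).
# Assumed convention: k spaces the stop indices; lookback, if not NA,
# keeps only the last `lookback` observations up to each stop.
createTSfolds <- function(y, Min = max(frequency(y), 3), lookback = NA, k = 1) {
  n <- length(y)
  stopifnot(Min >= 1, Min <= n - 1, k >= 1)
  stops <- seq(Min, n - 1, by = k)
  if (is.na(lookback)) {
    starts <- rep(1, length(stops))          # expanding window
  } else {
    starts <- pmax(stops - lookback + 1, 1)  # sliding window
  }
  out <- mapply(seq, starts, stops, SIMPLIFY = FALSE)
  names(out) <- paste("Fold", gsub(" ", "0", format(seq_along(out))), sep = "")
  out
}

set.seed(1)
x <- rnorm(10)
createTSfolds(x, Min = 4, lookback = 4, k = 2)
```

With these arguments the folds are 1:4, 3:6 and 5:8: each window holds four observations and slides forward by two.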