Mimicking createFolds with time-series cross-validation

Posted 2024-12-08 06:00:20

The R package caret provides a handy function createFolds, which returns a list of indexes for training sets to be used in cross-validation:

set.seed(1)
require(caret)
x <- rnorm(10)
createFolds(x,k=5,returnTrain=TRUE)

$Fold1
[1]  1  2  5  6  7  8  9 10

$Fold2
[1]  1  3  4  5  6  8  9 10

$Fold3
[1]  1  2  3  4  5  7  8 10

$Fold4
[1] 1 2 3 4 6 7 8 9

$Fold5
[1]  2  3  4  5  6  7  9 10

I would like to create a similar function, except I want to return a list of indexes to be used in time-series cross validation. I found some example code in R, but I want to generalize and functionalize things more. Here's what I initially came up with:

createTSfolds <- function(y, Min=max(frequency(y),3)) {
    i <- seq(along=y)
    stops <- i[Min:(length(i)-1)]
    starts <- rep(1,length(stops))
    out <- mapply(seq,starts,stops)
    names(out) <- paste("Fold", gsub(" ", "0", format(seq(along = out))), sep = "")
    out
}
createTSfolds(x)

$Fold1
[1] 1 2 3

$Fold2
[1] 1 2 3 4

$Fold3
[1] 1 2 3 4 5

$Fold4
[1] 1 2 3 4 5 6

$Fold5
[1] 1 2 3 4 5 6 7

$Fold6
[1] 1 2 3 4 5 6 7 8

$Fold7
[1] 1 2 3 4 5 6 7 8 9

(Min is the minimum number of observations needed to fit a model.)
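The default Min explains the seven folds above. For a plain numeric vector, frequency() returns 1, so Min is max(1, 3) = 3 and the folds stop at 3 through length(x) - 1 = 9. A quick check, assuming the same 10-element x; the monthly series is only an illustrative extra:

```r
set.seed(1)
x <- rnorm(10)

# frequency() of a plain (non-ts) vector is 1, so the default Min falls back to 3
Min <- max(frequency(x), 3)

# Folds stop at Min, ..., length(x) - 1; the last observation is never in a
# training set, so there is always at least one point left to forecast
stops <- Min:(length(x) - 1)
print(stops)                      # [1] 3 4 5 6 7 8 9

# For a monthly ts object frequency() is 12, so the default Min becomes 12
monthly <- ts(rnorm(36), frequency = 12)
print(max(frequency(monthly), 3)) # [1] 12
```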

This function works pretty well for now, but I'd like to add two features that Rob Hyndman discusses:

  1. Windowing: Instead of the training set extending back to the 1st observation, it extends back only n observations.
  2. Variable forecast horizons: Instead of adding 1 index to the training set each fold, add k indexes each fold.
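In index terms, windowing only changes where each fold starts. A minimal sketch for 10 observations with Min = 3, where the window length n from item 1 is called lookback and set to 4 (an illustrative value). Starts are clipped at 1 so the early folds never index before the first observation:

```r
n <- 10
Min <- 3
lookback <- 4  # illustrative window length

stops <- Min:(n - 1)

# Expanding window: every training set starts at observation 1
expanding <- Map(seq, rep(1, length(stops)), stops)

# Fixed window: each training set reaches back `lookback` observations from its
# stop, clipped at 1 so the first folds are simply shorter
window_starts <- pmax(1, stops - lookback + 1)
windowed <- Map(seq, window_starts, stops)

expanding[[5]]  # 1 2 3 4 5 6 7
windowed[[5]]   # 4 5 6 7
```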

Here is how I implemented windowing:

createTSfolds <- function(y, Min=max(frequency(y),3), lookback=NA) {
    i <- seq(along=y)
    stops <- i[Min:(length(i)-1)]
    if (is.na(lookback)) {
        # Expanding window: every fold starts at the first observation
        starts <- rep(1,length(stops))
    } else {
        # Fixed window: reach back `lookback` observations, never before the first
        starts <- pmax(1, stops-lookback+1)
    }
    # SIMPLIFY=FALSE keeps a list even when every fold has the same length
    out <- mapply(seq,starts,stops,SIMPLIFY=FALSE)
    names(out) <- paste("Fold", gsub(" ", "0", format(seq(along = out))), sep = "")
    out
}
createTSfolds(x,Min=4,lookback=4)
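In the fixed-window case every fold has the same length, and mapply() then simplifies its result to a matrix (one column per fold) instead of returning a list. A short illustration of that behaviour, and of SIMPLIFY = FALSE, which keeps one list element per fold. It uses the same Min = 4 and window of 4 as the call above:

```r
stops <- 4:9
starts <- stops - 4 + 1

# Equal-length results: mapply() collapses them into a 4 x 6 matrix
m <- mapply(seq, starts, stops)
print(dim(m))     # [1] 4 6

# SIMPLIFY = FALSE returns a list with one training set per fold
folds <- mapply(seq, starts, stops, SIMPLIFY = FALSE)
print(folds[[2]]) # [1] 2 3 4 5

# Map() never simplifies, so it gives the same list
identical(folds, Map(seq, starts, stops))
```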

I can't figure out how to implement variable forecast horizons, which would look like this, for example with k=3:

$Fold1
[1] 1 2 3

$Fold2
[1] 1 2 3 4 5 6

$Fold3
[1] 1 2 3 4 5 6 7 8 9
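One way to produce this output is to step through the stop indexes with seq(by = k) instead of advancing them by 1, so each fold adds k observations to the training set. The following is a minimal sketch that also keeps the optional window. createTSfolds3 and its k argument are hypothetical names. As in the question's code, NA for lookback means an expanding window. The sketch assumes Min is at most length(y) - 1:

```r
# Hypothetical helper: time-series folds with an optional fixed window
# (lookback) and a step of k observations between consecutive folds
createTSfolds3 <- function(y, Min = max(frequency(y), 3), lookback = NA, k = 1) {
  # Stops advance by k observations per fold instead of by 1
  stops <- seq(Min, length(y) - 1, by = k)
  # Expanding window by default; otherwise reach back `lookback` observations
  starts <- if (is.na(lookback)) rep(1, length(stops)) else pmax(1, stops - lookback + 1)
  # SIMPLIFY = FALSE keeps a list even when every fold has the same length
  out <- mapply(seq, starts, stops, SIMPLIFY = FALSE)
  names(out) <- paste("Fold", gsub(" ", "0", format(seq_along(out))), sep = "")
  out
}

set.seed(1)
x <- rnorm(10)
createTSfolds3(x, Min = 3, k = 3)                # folds 1:3, 1:6, 1:9
createTSfolds3(x, Min = 3, lookback = 3, k = 3)  # folds 1:3, 4:6, 7:9
```

Stepping the stops, rather than generating every fold and discarding some, keeps the window and the step independent of each other.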

I'm looking for ways to improve my existing code, as well as ways to add variable increments to the training set each fold.

Thank you

Comments (1)

痞味浪人 2024-12-15 06:00:20

Here is one approach. It is not entirely robust, as I am not sure about the output you seek when both lookback and k are present. Let me know if this is what you were looking for.

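 createTSfolds2 <- function(y, Min = max(frequency(y), 3), lookback = NA, k = NA){
   # One expanding-window fold per stop: 1:Min, 1:(Min + 1), ..., 1:(length(y) - 1)
   out = lapply(Min:(length(y) - 1), seq)
   # Variable increment: keep only every k-th fold
   if (!is.na(k)) {out = out[seq(1, length(out), k)]}
   # Windowing: keep only the last `lookback` indexes of each fold (never before 1)
   if (!is.na(lookback)) {
     out = lapply(out, function(z) z[max(1, length(z) - lookback + 1):length(z)])
   }
   names(out) <- paste("Fold", gsub(" ", "0", format(seq(along = out))), sep = "")
   return(out)
 }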

createTSfolds2(x, Min = 3, lookback = NA, k = 3)

$Fold1
[1] 1 2 3

$Fold2
[1] 1 2 3 4 5 6

$Fold3
[1] 1 2 3 4 5 6 7 8 9
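The thinning line out[seq(1, length(out), k)] keeps every k-th expanding-window fold, so the surviving folds stop at Min, Min + k, Min + 2k, and so on. Stepping the stop index by k directly gives the same list. A quick check with the values used in this example (10 observations, Min = 3, k = 3):

```r
n <- 10
Min <- 3
k <- 3

# All expanding-window folds, then every k-th one (as in createTSfolds2)
all_folds <- lapply(Min:(n - 1), seq)
thinned <- all_folds[seq(1, length(all_folds), k)]

# Same folds, generated by stepping the stop index by k
stepped <- lapply(seq(Min, n - 1, by = k), seq)

identical(thinned, stepped)  # TRUE
```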

createTSfolds2(x, Min = 3, lookback = 3, k = 3)

$Fold1
[1] 1 2 3

$Fold2
[1] 4 5 6

$Fold3
[1] 7 8 9