生成滑动窗口以进行预测任务子集数据
我想编写一个滑动窗口功能,以便使用从T,T+1和T+2年训练的模型来对T+3年的结果进行预测。这意味着,对于10年的数据,所需的滑动窗口函数应创建7个火车测试拆分并进行7个预测(对于t+3,t+4,t+5,t+6,t+6,t+7,t +8,T+9年)。
我想到了以下代码,但结果没有敲响铃铛。所得的对象长度不仅有所不同,而且即使我尝试手动完成预测任务,预测
函数实际上也会为一年的结果生成3个预测值,这是没有意义的。如果有人可以指出错误的来源,这将是很感激的。
# generate the data
set.seed(123)
df <- data.frame(year = 2000:2009, # T = 10
y = c(1, 1, 1, 1, 0, 0, 1, 0, 0, 0),
var1 = runif(10, min=0, max=1),
var2 = runif(10, min=1, max=2))
# store predicted values in a list
pred <- list()
# loop from the 1st year to the T-3 year
for(i in 2000:2007){
df_sub1 <- subset(df, year == c(i, i+1, i+2))
mod <- glm(y~var1+var2, data=df_sub1, family=binomial())
df_sub2 <- subset(df, year == i+3)
pred[[i]] <- predict(mod, data=df_sub2, type = "response")
}
# error message
Error in family$linkfun(mustart) :
Argument mu must be a nonempty numeric vector
In addition: Warning messages:
1: In year == c(i, i + 1, i + 2) :
longer object length is not a multiple of shorter object length
2: In year == c(i, i + 1, i + 2) :
longer object length is not a multiple of shorter object length
I want to write a sliding window function in order to use the model trained from t, t+1, and t+2 year to make prediction on the outcome of the t+3 year. This means that for a 10-year's data, the desired sliding window function should create 7 train-test splits and make 7 predictions (for the t+3, t+4, t+5, t+6, t+7, t+8, t+9 year).
I came up with the following code but the result doesn't ring the bell. Not only does the resulting object length differs, but even if I try to manually work through the prediction task, the predict
function actually generates 3 predicted values for a single year's outcome, which doesn't make sense. It would be grateful if someone could point out the sources of the error.
# generate the data
set.seed(123)
df <- data.frame(year = 2000:2009, # T = 10
y = c(1, 1, 1, 1, 0, 0, 1, 0, 0, 0),
var1 = runif(10, min=0, max=1),
var2 = runif(10, min=1, max=2))
# store predicted values in a list
pred <- list()
# loop from the 1st year to the T-3 year
for(i in 2000:2007){
df_sub1 <- subset(df, year == c(i, i+1, i+2))
mod <- glm(y~var1+var2, data=df_sub1, family=binomial())
df_sub2 <- subset(df, year == i+3)
pred[[i]] <- predict(mod, data=df_sub2, type = "response")
}
# error message
Error in family$linkfun(mustart) :
Argument mu must be a nonempty numeric vector
In addition: Warning messages:
1: In year == c(i, i + 1, i + 2) :
longer object length is not a multiple of shorter object length
2: In year == c(i, i + 1, i + 2) :
longer object length is not a multiple of shorter object length
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
错误/警告来自使用
==
当RHS长度&GT时; 1。在%中使用%
- 输出
The error/warning is from using
==
when the rhs is of length > 1. Use%in%
-output
这是包装
Zoo
的功能之一,将功能应用于滚动窗口的另一种方式。要应用的功能,rol_pred
几乎是 akrun的,只有创建,由
Here is another way with one of package
zoo
's functions to apply a function to a rolling window. The function to be applied,roll_pred
is almost a copy&paste of akrun's, only the creation of the subsets is different.Created on 2022-06-05 by the reprex package (v2.0.1)