R: How to loop through a regression model, dropping one observation at a time?

Posted 2025-02-03 21:32:47

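I'm having trouble looping through a regression model, dropping one observation at a time, to estimate the effect of influential observations.

I would like to run the model several times, each time dropping the ith observation, extracting the relevant coefficient estimate and storing it in a vector. I think this could be done with a fairly straightforward loop, but I'm stuck on the specifics.

I want to end up with a vector containing n coefficient estimates from n iterations of the same model. Any help would be appreciated!

Below I provide some dummy data and example code.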

```r
library(dplyr)  # provides %>% and mutate()

# Dummy data:
set.seed(489)

patientn <- 1:400
gender <- rbinom(400, 1, 0.5)
productid <- rep(c("Product A", "Product B"), times = 200)
country <- rep(c("USA", "UK", "Canada", "Mexico"), each = 50)
baselarea <- rnorm(400, 400, 60)   # baseline area
baselarea2 <- rnorm(400, 400, 65)  # baseline area 2

sfactor <- c(
  rep(c(0.3, 0.9), times = 25),
  rep(c(0.4, 0.5), times = 25),
  rep(c(0.2, 0.4), times = 25),
  rep(c(0.3, 0.7), times = 25)
)

rashdummy2a <- data.frame(patientn, gender, productid, country, baselarea, baselarea2, sfactor)

data <- rashdummy2a %>% mutate(rashleft = baselarea2 * sfactor / baselarea * 100)

## Example of how this can be done manually:

# model
m1 <- lm(rashleft ~ gender + baselarea + sfactor, data = data)

# extracting relevant coefficient estimates, each time dropping a different "patient" ("patientn")
betas <- c(lm(rashleft ~ gender + baselarea + sfactor, data = rashdummy2b, patientn != 1)$coefficients[2],
           lm(rashleft ~ gender + baselarea + sfactor, data = rashdummy2b, patientn != 2)$coefficients[2],
           lm(rashleft ~ gender + baselarea + sfactor, data = rashdummy2b, patientn != 3)$coefficients[2])

# the betas vector now stores the relevant coefficient estimates (coefficient no. 2, for gender)
# for three different variations of the model.
```
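The three hand-written calls above differ only in which `patientn` is excluded, so the same `subset` argument to `lm()` can be driven by `vapply()` over every patient. A minimal sketch, assuming a base-R stand-in `d` for the question's data (only the columns the model uses; the names `d` and `loo_beta` are illustrative), returning a plain numeric vector with one gender estimate per dropped patient:

```r
# d: illustrative base-R stand-in for the question's dummy data (model columns only)
set.seed(489)
n <- 400
d <- data.frame(
  patientn   = seq_len(n),
  gender     = rbinom(n, 1, 0.5),
  baselarea  = rnorm(n, 400, 60),
  baselarea2 = rnorm(n, 400, 65),
  sfactor    = rep(c(rep(c(0.3, 0.9), 25), rep(c(0.4, 0.5), 25),
                     rep(c(0.2, 0.4), 25), rep(c(0.3, 0.7), 25)), times = 2)
)
d$rashleft <- d$baselarea2 * d$sfactor / d$baselarea * 100

# For each patient p, refit on all rows except patient p (via lm()'s subset
# argument, as in the question) and keep the gender coefficient.
loo_beta <- vapply(
  d$patientn,
  function(p) {
    fit <- lm(rashleft ~ gender + baselarea + sfactor, data = d, subset = patientn != p)
    coef(fit)[["gender"]]
  },
  numeric(1)
)

length(loo_beta)  # one estimate per dropped patient
```

`vapply()` is used instead of `sapply()` because it guarantees a numeric vector of the declared length.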


Answers (2)

戈亓 2025-02-10 21:32:47

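We can use a for loop. In your question you use an object `rashdummy2b`, which is not defined. I used `data` instead, but you can replace it with whichever object you like.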

```r
# create list to bind results to
result <- list()

# loop through patients and extract betas
for (i in unique(data$patientn)) {

  # construct linear model without patient i
  lm.model <- lm(rashleft ~ gender + baselarea + sfactor, data = subset(data, patientn != i))

  # create data.frame containing patient left out and coefficient
  result.dt <- data.frame(beta = lm.model$coefficients[[2]],
                          patient_left_out = i)

  # bind to list
  result[[i]] <- result.dt
}

# bind to data.frame
result <- do.call(rbind, result)
```

Result:

```
> head(result)
      beta patient_left_out
1 1.381248                1
2 1.345188                2
3 1.427784                3
4 1.361674                4
5 1.420417                5
6 1.454196                6
```
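Since the question asks for a vector, and the point of the exercise is spotting influential patients, the `beta` column can be pulled out and compared with the full-data estimate. A sketch, assuming the loop above (rewritten with `lapply()` so the block runs on its own) and a base-R stand-in for `data`; `full_beta`, `shift` and `top` are illustrative names:

```r
# data: illustrative base-R stand-in for the question's dummy data (model columns only)
set.seed(489)
n <- 400
data <- data.frame(
  patientn   = seq_len(n),
  gender     = rbinom(n, 1, 0.5),
  baselarea  = rnorm(n, 400, 60),
  baselarea2 = rnorm(n, 400, 65),
  sfactor    = rep(c(rep(c(0.3, 0.9), 25), rep(c(0.4, 0.5), 25),
                     rep(c(0.2, 0.4), 25), rep(c(0.3, 0.7), 25)), times = 2)
)
data$rashleft <- data$baselarea2 * data$sfactor / data$baselarea * 100

# Same leave-one-out loop as the answer, collected with lapply() + rbind
result <- do.call(rbind, lapply(unique(data$patientn), function(i) {
  fit <- lm(rashleft ~ gender + baselarea + sfactor, data = subset(data, patientn != i))
  data.frame(beta = coef(fit)[["gender"]], patient_left_out = i)
}))

betas <- result$beta  # plain numeric vector, as asked in the question

# How far each leave-one-out estimate moves from the full-data estimate
full_beta <- coef(lm(rashleft ~ gender + baselarea + sfactor, data = data))[["gender"]]
result$shift <- result$beta - full_beta

# Five patients whose removal moves the gender coefficient the most
top <- head(result[order(abs(result$shift), decreasing = TRUE), ], 5)
top
```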

坏尐絯℡ 2025-02-10 21:32:47

You can drop a particular row (or column) by using a negative index. In your case, you would proceed as follows:

```r
betas <- numeric(nrow(rashdummy2b))  # memory preallocation
for (i in seq_len(nrow(rashdummy2b))) {
  # rashdummy2b[-i, ] is every row except row i
  betas[i] <- lm(rashleft ~ gender + baselarea + sfactor, data = rashdummy2b[-i, ])$coefficients[2]
}
```
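For an ordinary least-squares fit, the n refits are not strictly necessary: `dfbeta()` from the `stats` package returns, for each row i, the full-data coefficients minus the coefficients refitted without row i, computed from a single fit. A sketch, assuming the same model and a simulated stand-in named `rashdummy2b` (hypothetical here, since the question never defines it), that checks the shortcut against the negative-index loop above:

```r
# rashdummy2b: hypothetical base-R stand-in for the question's data (model columns only)
set.seed(489)
n <- 400
rashdummy2b <- data.frame(
  gender     = rbinom(n, 1, 0.5),
  baselarea  = rnorm(n, 400, 60),
  baselarea2 = rnorm(n, 400, 65),
  sfactor    = rep(c(rep(c(0.3, 0.9), 25), rep(c(0.4, 0.5), 25),
                     rep(c(0.2, 0.4), 25), rep(c(0.3, 0.7), 25)), times = 2)
)
rashdummy2b$rashleft <- with(rashdummy2b, baselarea2 * sfactor / baselarea * 100)

fit <- lm(rashleft ~ gender + baselarea + sfactor, data = rashdummy2b)

# dfbeta(fit)[i, ] = coef(fit) - coef(fit without row i), so subtracting it
# from the full-data estimate gives every leave-one-out estimate at once.
betas_fast <- coef(fit)[["gender"]] - dfbeta(fit)[, "gender"]

# Brute-force check with the negative-index loop
betas <- numeric(nrow(rashdummy2b))
for (i in seq_len(nrow(rashdummy2b))) {
  betas[i] <- coef(lm(rashleft ~ gender + baselarea + sfactor,
                      data = rashdummy2b[-i, ]))[["gender"]]
}

all.equal(unname(betas_fast), betas)  # TRUE if the shortcut matches the refits
```

The loop is still the more general tool (it works for any model you can refit), but for `lm()` the single-fit version avoids 400 refits.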
