R: Build calibration models from n subsets and use them to predict different test sets

Posted 2024-11-29 23:10:08


I am trying to apply a function I wrote that uses the 'pls' package to build a model and then uses that model
to predict several test sets (in this case 9), returning the R2, RMSEP and prediction bias of each test set
for each of n subsets selected from the data frame.
The function is:

cpo<-function(data,newdata1,newdata2,newdata3,newdata4,newdata5,newdata6,newdata7,newdata8,newdata9){
              data.pls<-plsr(protein~.,8,data=data,validation="LOO")#making a pls model
              newdata1.pred<-predict(data.pls,8,newdata=newdata1)   #using the model to predict test sets
              newdata2.pred<-predict(data.pls,8,newdata=newdata2)
              newdata3.pred<-predict(data.pls,8,newdata=newdata3)
              newdata4.pred<-predict(data.pls,8,newdata=newdata4)
              newdata5.pred<-predict(data.pls,8,newdata=newdata5)
              newdata6.pred<-predict(data.pls,8,newdata=newdata6)
              newdata7.pred<-predict(data.pls,8,newdata=newdata7)
              newdata8.pred<-predict(data.pls,8,newdata=newdata8)
              newdata9.pred<-predict(data.pls,8,newdata=newdata9)
              pred.bias1<-mean(newdata1.pred-newdata1[742])         #calculating the prediction bias
              pred.bias2<-mean(newdata2.pred-newdata2[742])
              pred.bias3<-mean(newdata3.pred-newdata3[742])        #[742] reference values in column742
              pred.bias4<-mean(newdata4.pred-newdata4[742])
              pred.bias5<-mean(newdata5.pred-newdata5[742])
              pred.bias6<-mean(newdata6.pred-newdata6[742])
              pred.bias7<-mean(newdata7.pred-newdata7[742])
              pred.bias8<-mean(newdata8.pred-newdata8[742])
              pred.bias9<-mean(newdata9.pred-newdata9[742])
            r<-c(R2(data.pls,"train"),RMSEP(data.pls,"train"),pred.bias1,
                 pred.bias2,pred.bias3,pred.bias4,pred.bias5,pred.bias6,
                 pred.bias7,pred.bias8,pred.bias9)
          return(r)
}
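One detail in cpo that may be worth checking: on a data frame, single-bracket indexing such as `newdata1[742]` returns a one-column data frame, whereas `newdata1[[742]]` (or `newdata1$protein`, assuming that is the name of column 742) returns the numeric vector itself, which is the safer operand for arithmetic. A small base-R illustration with a made-up data frame:

```r
# Made-up test set: two spectral columns plus reference values in column 3
test <- data.frame(x1 = c(0.1, 0.2, 0.3),
                   x2 = c(1.0, 1.1, 1.2),
                   protein = c(10, 12, 14))

single <- test[3]    # single bracket: still a data.frame (one column)
double <- test[[3]]  # double bracket: the plain numeric vector

class(single)        # "data.frame"
class(double)        # "numeric"

# Prediction bias as mean(predicted - reference), using the vector form
pred <- c(10.5, 11.5, 14.5)
bias <- mean(pred - test[["protein"]])
bias                 # 0.1666667
```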

To select n subsets (based on an answer to my question [1], "Select several subsets by taking different row interval and appy function to all subsets")
and apply cpo to each subset, I tried:

Edited based on @Gavin's advice:

FO03 <- function(data, nSubsets, nSkip){
  outList <- vector("list", 11)
  names(outList) <- c("R2train","RMSEPtrain", paste("bias", 1:9, sep = ""))
  sub <- vector("list", length = nSubsets)  # sub is the n number subsets created by selecting rows
  names(sub) <- c( paste("sub", 1:nSubsets, sep = ""))

  totRow <- nrow(data)

  for (i in seq_len(nSubsets)) {
    rowsToGrab <- seq(i, totRow, nSkip)
    sub[[i]] <- data[rowsToGrab ,]
  }

  for(i in sub) {                  #for every subset in sub i want to apply cpo
    outList[[i]] <- cpo(data=sub,newdata1=gag11p,newdata2=gag12p,newdata3=gag13p,
                        newdata4=gag21p,newdata5=gag22p,newdata6=gag23p,
                        newdata7=gag31p,newdata8=gag32p,newdata9=gag33p) #new data are test sets loaded in the workspace
  }
  return(outlist)
}

FOO3(GAGp,10,10)

When I try this, I keep getting 'Error in eval(expr, envir, enclos) : object 'protein' not found'.
protein is used in the plsr formula in cpo, and it is a column of the data set.
I then tried calling plsr directly, as shown below:

FOO4 <- function(data, nSubsets, nSkip){
  outList <- vector("list", 11)
  names(outList) <- c("R2train","RMSEPtrain", paste("bias", 1:9, sep = ""))
  sub <- vector("list", length = nSubsets)
  names(sub) <- c( paste("sub", 1:nSubsets, sep = ""))

  totRow <- nrow(data)

  for (i in seq_len(nSubsets)) {
    rowsToGrab <- seq(i, totRow, nSkip)
    sub[[i]] <- data[rowsToGrab ,]
  }

  cal<-vector("list", length=nSubsets)  #for each subset in sub make a pls model for protein
  names(cal)<-c(paste("cal",1:nSubsets, sep=""))
  for(i in sub) {
    cal[[i]] <- plsr(protein~.,8,data=sub,validation="LOO")
  }
  return(outlist) # return is just used to end script and check if error still occurs
}
FOO4(gagpm,10,10)

When I try this, I get the same error: 'Error in eval(expr, envir, enclos) : object 'protein' not found'.
Any advice on how to deal with this and make the function work would be much appreciated.
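A likely cause, shared by FO03 and FOO4, is `data = sub`: it passes the whole list of subsets instead of a single subset such as `sub[[i]]`. The list's own names are sub1, sub2, ..., so `protein` cannot be found when the model frame is built. The sketch below reproduces this in base R with lm() standing in for plsr(), on the assumption that plsr() builds its model frame from `data` in the same standard way; the data are simulated.

```r
set.seed(1)
# Simulated data with a 'protein' response and two predictors
toy <- data.frame(protein = rnorm(20), x1 = rnorm(20), x2 = rnorm(20))

# A list of subsets, named the same way as in FO03/FOO4
sub <- list(sub1 = toy[seq(1, 20, by = 2), ], sub2 = toy[seq(2, 20, by = 2), ])

"protein" %in% names(sub)       # FALSE: the list only holds sub1 and sub2
"protein" %in% names(sub[[1]])  # TRUE: each element has the column

# Handing the whole list to 'data' fails to find 'protein'
res <- tryCatch(lm(protein ~ ., data = sub), error = function(e) e)
conditionMessage(res)

# Handing over one subset at a time works
fits <- lapply(sub, function(d) lm(protein ~ ., data = d))
length(fits)
```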


情释 2024-12-06 23:10:08


I suspect the problem is immediately at the start of FOO3():

FOO3 <- function(data, nSubsets, nSkip) {
 outList <- vector("list", r <- c(R2(data.pls,"train"), RMSEP(data.pls,"train"), 
                   pred.bias1, pred.bias2, pred.bias3, pred.bias4, pred.bias5,
                   pred.bias6, pred.bias7, pred.bias8, pred.bias9))

Not sure what you are trying to do when creating outList, but vector() takes two arguments (mode and length), and you seem to be assigning a numeric vector to r and then asking R to use it as the length argument to vector().

Here you are also using the object data.pls, which doesn't exist yet, and never will in the frame of FOO3(); it is only ever created inside cpo().
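Both points can be seen in isolation in base R. The first part shows that vector() wants a single number as its length; the second shows that an object created inside one function (a placeholder string here, playing the role of data.pls) is not visible from the function that calls it. All names are illustrative.

```r
# 1. vector(mode, length): 'length' must be a single number
outList <- vector("list", 11)
length(outList)                      # 11 NULL slots

bad <- tryCatch(vector("list", c(0.9, 1.2, 0.3)), error = function(e) e)
inherits(bad, "error")               # TRUE: a numeric vector is not a valid length

# 2. Objects created inside a function stay in that function's environment
make_model <- function() {
  data.pls <- "fitted model placeholder"   # local to make_model()
  data.pls                                  # only the returned value leaves
}
caller <- function() {
  result <- make_model()
  c(sees_data.pls = exists("data.pls", inherits = FALSE),
    got_result    = identical(result, "fitted model placeholder"))
}
caller()                             # sees_data.pls FALSE, got_result TRUE
```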

Your second loop looks totally wrong - you are not assigning the output from cpo() to anything. I suspect you wanted:

outList <- vector("list", 11)
names(outList) <- c("R2train","RMSEPtrain", paste("bias", 1:9, sep = ""))
....
for(i in subset) {
    outList[[i]] <- cpo(....)
}
return(outList)

But that depends on what subset is, etc. The loop header is also written unusually. You have

for(i in(subset)) {

which R accepts (the extra parentheses only group the expression), but it is normally written as

for(i in subset) {
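What matters in either form is that `for (i in x)` steps through the elements of x, not their positions. In the question's loops each i is therefore a whole data frame, which cannot serve as a subscript in `outList[[i]]`. A base-R sketch with made-up subsets; `summarise_subset()` is a hypothetical stand-in for cpo(), and outList here holds one result per subset:

```r
# Made-up subsets standing in for the list 'sub' built in FO03/FOO4
subs <- list(sub1 = data.frame(protein = 1:10),
             sub2 = data.frame(protein = 11:20))

# Hypothetical stand-in for cpo(): a few numbers per subset
summarise_subset <- function(d) c(n = nrow(d), mean_protein = mean(d$protein))

# 'for (i in subs)' hands over each element, here a whole data frame ...
for (i in subs) print(class(i))   # "data.frame", twice

# ... so using that i as a subscript fails
outList <- vector("list", length(subs))
res <- tryCatch({
  for (i in subs) outList[[i]] <- summarise_subset(i)
  "no error"
}, error = function(e) e)
inherits(res, "error")            # TRUE

# Iterating over positions works; one slot per subset, named after it
names(outList) <- names(subs)
for (i in seq_along(subs)) {
  outList[[i]] <- summarise_subset(subs[[i]])  # one subset, not the whole list
}
outList$sub2                      # n = 10, mean_protein = 15.5
```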

And subset and data aren't great names as these are common R functions and modelling arguments.
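For example, both names already refer to functions in a standard R session (subset() in base, data() in utils). A local variable of the same name shadows them, which works but makes code harder to read; a descriptive name, such as the hypothetical `calib_set` below, avoids that:

```r
# Both names are already taken by functions in a standard session
is.function(subset)   # TRUE (base::subset)
is.function(data)     # TRUE (utils::data)

# A descriptive name avoids any confusion with those functions
calib_set <- data.frame(protein = c(10, 12))
mean(calib_set$protein)   # 11
```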

There are lots of problems with your code. Try to start simple and build up from there.


擦肩而过的背影 2024-12-06 23:10:08


I have managed to achieve what I wanted using the code below. If there is a better way of doing it (I'm sure there must be), I'm eager to learn. The function performs the following tasks:
1. Select n subsets from a data frame.
2. For each subset created, build a plsr model.
3. Use each plsr model to predict 9 test sets.
4. For each prediction, calculate the prediction bias.
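The four steps above can be sketched generically in base R. This is only an illustration under assumptions: lm() stands in for plsr() so that the sketch runs without the pls package, the test sets are held in one named list instead of nine separate objects, and all data are simulated.

```r
set.seed(42)
# Simulated data: 'protein' depends linearly on two predictors plus noise
make_set <- function(n) {
  x1 <- rnorm(n); x2 <- rnorm(n)
  data.frame(protein = 2 + 3 * x1 - x2 + rnorm(n, sd = 0.1), x1 = x1, x2 = x2)
}
calib <- make_set(40)
tests <- list(t1 = make_set(15), t2 = make_set(15), t3 = make_set(15))

# 1. n interleaved subsets: rows i, i + nSkip, i + 2 * nSkip, ...
nSubsets <- 4; nSkip <- 4
subs <- lapply(seq_len(nSubsets), function(i)
  calib[seq(i, nrow(calib), by = nSkip), , drop = FALSE])
names(subs) <- paste0("sub", seq_len(nSubsets))

# 2.-4. fit per subset, predict every test set, compute the bias
bias_table <- t(sapply(subs, function(d) {
  fit <- lm(protein ~ ., data = d)                    # stand-in for plsr()
  vapply(tests, function(ts) mean(predict(fit, newdata = ts) - ts$protein),
         numeric(1))
}))
round(bias_table, 3)   # rows = subsets, columns = test sets
```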

far5 <- function(data, nSubsets, nSkip){
  sub <- vector("list", length = nSubsets)
  names(sub) <- paste("sub", 1:nSubsets, sep = "")
  totRow <- nrow(data)
  for (i in seq_len(nSubsets)) {
    rowsToGrab <- seq(i, totRow, nSkip)
    sub[[i]] <- data[rowsToGrab, ]          # sub holds the subsets created
  }
  mop <- lapply(sub, cpr2)                  # apply cpr2 to every subset
  names(mop) <- paste("mop", seq_along(mop), sep = "")
  return(mop)                               # list of cpr2() results, one per subset
}
call: far5(data, nSubsets, nSkip)
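lapply() carries over the names of the list it is given, so the result of `lapply(sub, cpr2)` is already labelled sub1, sub2, ..., and a separate renaming step is optional. A base-R illustration with made-up subsets and mean() in place of cpr2:

```r
subs <- list(sub1 = data.frame(protein = c(10, 12)),
             sub2 = data.frame(protein = c(14, 18)))

res <- lapply(subs, function(d) mean(d$protein))
names(res)      # "sub1" "sub2": names carried over from the input
unlist(res)     # sub1 = 11, sub2 = 16

sapply(subs, function(d) mean(d$protein))  # same values as a named vector
```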

The first part, selecting the subsets, is based on the answer to my question "Select several subsets by taking different row interval and appy function to all subsets".
I was then able to apply the function cpr2 to the subsets using lapply instead of the for loop I used previously.
cpr2 is a modification of cpo that takes only the data as an argument; the new data to be predicted are referenced directly inside the function, as shown below.

cpr2<-function(data){ 
  data.pls<-plsr(protein~.,8,data=data,validation="LOO") #make plsr model       
  gag11p.pred<-predict(data.pls,8,newdata=gag11p)  #predict each test set 
  gag12p.pred<-predict(data.pls,8,newdata=gag12p)
  gag13p.pred<-predict(data.pls,8,newdata=gag13p)
  gag21p.pred<-predict(data.pls,8,newdata=gag21p)
  gag22p.pred<-predict(data.pls,8,newdata=gag22p)            
  gag23p.pred<-predict(data.pls,8,newdata=gag23p)
  gag31p.pred<-predict(data.pls,8,newdata=gag31p)
  gag32p.pred<-predict(data.pls,8,newdata=gag32p)
  gag33p.pred<-predict(data.pls,8,newdata=gag33p)                        
  pred.bias1<-mean(gag11p.pred-gag11p[742])     #calculate prediction bias      
  pred.bias2<-mean(gag12p.pred-gag12p[742])
  pred.bias3<-mean(gag13p.pred-gag13p[742])         
  pred.bias4<-mean(gag21p.pred-gag21p[742])
  pred.bias5<-mean(gag22p.pred-gag22p[742])
  pred.bias6<-mean(gag23p.pred-gag23p[742])
  pred.bias7<-mean(gag31p.pred-gag31p[742])
  pred.bias8<-mean(gag32p.pred-gag32p[742])
  pred.bias9<-mean(gag33p.pred-gag33p[742])            
  r <- signif(c(pred.bias1, pred.bias2, pred.bias3, pred.bias4, pred.bias5,
                pred.bias6, pred.bias7, pred.bias8, pred.bias9), 2)  # signif(): round biases to 2 significant digits
  out <- c(R2(data.pls, "train", ncomp = 8), RMSEP(data.pls, "train", ncomp = 8), r)
  return(out)
}

call:cpr2(data)

I was able to use this to solve my problem. However, since there were only nine new data sets to predict, it was feasible to list them out as I did. If there is a more general way to do this, I'm interested in learning it.
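One more general approach, sketched under assumptions: pass the test sets as a named list and loop over it with vapply(), so the number of test sets is no longer hard-coded. This assumes the pls package, a response column named protein in every data set, and 8 components; the simulated data and the helper name cpr_general() are hypothetical. R2() and RMSEP() return "mvrVal" objects, so the numbers are taken from their `$val` element, with `intercept = FALSE` so that only the 8-component model is reported.

```r
library(pls)

# Hypothetical generalisation of cpr2(): test sets arrive as a named list
cpr_general <- function(train, tests, ncomp = 8) {
  fit <- plsr(protein ~ ., ncomp = ncomp, data = train, validation = "LOO")
  bias <- vapply(tests, function(ts) {
    pred <- drop(predict(fit, ncomp = ncomp, newdata = ts))  # n x 1 x 1 array -> vector
    mean(pred - ts$protein)                                   # mean(predicted - reference)
  }, numeric(1))
  c(R2train    = as.numeric(R2(fit, estimate = "train", ncomp = ncomp,
                               intercept = FALSE)$val),
    RMSEPtrain = as.numeric(RMSEP(fit, estimate = "train", ncomp = ncomp,
                                  intercept = FALSE)$val),
    signif(bias, 2))
}

# Simulated data: 30 predictor columns, protein depends on the first three
set.seed(7)
sim <- function(n) {
  X <- matrix(rnorm(n * 30), n, 30, dimnames = list(NULL, paste0("x", 1:30)))
  data.frame(protein = X[, 1] - 2 * X[, 2] + X[, 3] + rnorm(n, sd = 0.1), X)
}
train <- sim(40)
tests <- list(gag11p = sim(10), gag12p = sim(10), gag13p = sim(10))

res <- cpr_general(train, tests)
res
```

With more test sets, only the `tests` list grows; the function body stays the same, and it can be combined with `lapply(sub, cpr_general, tests = tests)` to run it over every subset.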
