"missing values in object" error when using the caret::train function

Posted on 2025-02-02 16:52:36

I ran into this error when I tried to use the train function in the {caret} package to run a 100-fold cross-validation for a regression model. The code I executed is as follows:

#read the dataset and convert columns to factors
data<-read.csv("synchronic_dataset_full.csv")
data<-as.data.frame(unclass(data), stringsAsFactors = TRUE)

#cross-validation using train() in {caret}
set.seed(527)
inTraining <- createDataPartition(data$realization, p = .75, list = FALSE)
training <- data [ inTraining,]
testing  <- data [-inTraining,]

fitControl <- trainControl(method = "cv",
                           number = 100)

regression_fit <- train(realization ~ (1|verb/VerbSense) + 
                                      (1|Corpus) + 
                                      Variety + 
                                      Register +
                                      FollowVerb +      
                                      z.WeightRatio + 
                                      ThemeConcreteness +
                                      PrimeTypeCoarse +
                                      RecPron +
                                      z.RecThematicity +
                                      ThemeDef +
                                      z.RecHeadFrequency +
                                      RecHumaness +
                                      RecComplexity +
                                      ThemeComplexity +
                                      z.TTR +
                                      Variety*
                                      (RecComplexity +
                                      RecPron) +
                                      Register *
                                      ThemeConcreteness, 
                                      data = training, 
                                      method = "glm",
                                      metric = "Accuracy",
                                      trControl = fitControl)
regression_fit

And the error says:

Error in na.fail.default(list(realization = c(1L, 1L, 2L, 1L, 1L, 1L, : missing values in object

I checked the dataset and I am sure it contains no missing/NA values. I also tried adding na.action = na.exclude after trControl = fitControl, but it did not help. The dataset can be accessed on this OSF page: https://osf.io/4pmh3/ (note: please delete it after use, as it contains sensitive unpublished and un-peer-reviewed information).
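A quick way to double-check for missing values can be sketched in base R. The data frame `toy` and its values below are made up for illustration; only the column names echo the formula above.

```r
# Hypothetical stand-in for the real dataset, with one NA planted in z.TTR
toy <- data.frame(
  realization = factor(c("ThemeFirst", "RecipientFirst", "ThemeFirst")),
  Corpus      = factor(c("A", "B", "A")),
  z.TTR       = c(0.12, NA, -0.40)
)

has_na          <- anyNA(toy)                  # TRUE if any cell is NA
na_per_column   <- colSums(is.na(toy))         # NA count for each column
incomplete_rows <- which(!complete.cases(toy)) # row numbers containing an NA

print(has_na)
print(na_per_column)
print(incomplete_rows)
```

If checks like these come back clean on the raw data, the missing values are probably being created while the model formula is evaluated rather than being stored in the dataset itself.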

Comments (2)

生生漫 2025-02-09 16:52:36

Just remove the parentheses around (1|...) in the model formula.
Another possibility is that | does not apply to factors, which I guess is the case here.
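The second point can be illustrated with base R alone. In a plain (non-mixed) formula, `(1 | g)` is evaluated as the logical OR operator, and applying `|` to a factor yields NA with a warning. The sketch below uses a made-up data frame (`d`, `g`, `x`, `y` are hypothetical names) and passes `na.action = na.fail` explicitly, which is also the default of caret's formula interface to `train()`.

```r
# Hypothetical grouping factor standing in for something like Corpus
g <- factor(c("a", "b", "a", "b"))

# `|` is not defined for factors: the result is all NA (plus a warning)
or_result <- suppressWarnings(1 | g)
print(or_result)

# Small made-up data frame using that factor
d <- data.frame(y = factor(c("u", "v", "u", "v")), g = g, x = c(1, 2, 3, 4))

# Building the model frame with na.fail now stops on the NAs created above
msg <- tryCatch(
  {
    suppressWarnings(model.frame(y ~ (1 | g) + x, data = d, na.action = na.fail))
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)
```

This would explain how the error can appear even when the dataset itself has no missing values: the NAs come from evaluating the random-effect terms, which `glm` does not understand.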

一枫情书 2025-02-09 16:52:36

I have managed to fix the issue and obtained relevant results using the following code:

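library(caret)  # createDataPartition(), confusionMatrix()
library(lme4)   # glmer()

set.seed(527)
# 100 repeated random 75/25 splits (not 100-fold CV in the strict sense)
for (i in 1:100) {
  Train <- createDataPartition(data$realization, p = 0.75, list = FALSE)
  training <- data[Train, ]
  testing <- data[-Train, ]
  mod_fit <- glmer(realization ~ (1|verb/VerbSense) +
                                 (1|Corpus) +
                                 Variety +
                                 Register +
                                 FollowVerb +
                                 z.WeightRatio +
                                 ThemeConcreteness +
                                 PrimeTypeCoarse +
                                 RecPron +
                                 z.RecThematicity +
                                 ThemeDef +
                                 z.RecHeadFrequency +
                                 RecHumaness +
                                 RecComplexity +
                                 ThemeComplexity +
                                 z.TTR +
                                 Variety *
                                 (RecComplexity +
                                 RecPron) +
                                 Register *
                                 ThemeConcreteness, data = training, family = "binomial")
  # type = "response" returns probabilities; the default is the link (log-odds) scale
  pred <- predict(mod_fit, newdata = testing, type = "response", allow.new.levels = TRUE)
  # assumes "ThemeFirst" is the second factor level, i.e. the modelled outcome
  predictions.cat <- ifelse(pred > 0.5, "ThemeFirst", "RecipientFirst")
  # keep both levels even if only one class is predicted
  predictions.cat <- factor(predictions.cat, levels = levels(testing$realization))
  result <- confusionMatrix(data = predictions.cat, reference = testing$realization)
  print(result$overall["Accuracy"])
}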

As the problem has been solved, and considering that the dataset is still under construction, the material on the OSF page has been removed.
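Note that the loop above draws 100 independent random splits, which is not the same as k-fold cross-validation, where every row is held out exactly once. A minimal base R sketch of the k-fold variant follows. It uses simulated data and a plain `glm` so that it runs without extra packages; `toy`, `x1`, `x2` and `k = 10` are assumptions for illustration, and for the mixed model one would presumably swap in `glmer` with the original formula.

```r
set.seed(527)

# Simulated stand-in data (hypothetical columns x1, x2 and outcome y)
n <- 200
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
toy$y <- factor(ifelse(toy$x1 + rnorm(n) > 0, "ThemeFirst", "RecipientFirst"))

k <- 10
# Assign every row to exactly one of k folds of equal size
fold_id <- sample(rep(seq_len(k), length.out = n))

accuracy <- numeric(k)
for (i in seq_len(k)) {
  train_rows <- toy[fold_id != i, ]
  test_rows  <- toy[fold_id == i, ]
  fit <- glm(y ~ x1 + x2, data = train_rows, family = binomial)
  # Probability of the second factor level of y
  prob <- predict(fit, newdata = test_rows, type = "response")
  pred <- ifelse(prob > 0.5, levels(toy$y)[2], levels(toy$y)[1])
  accuracy[i] <- mean(pred == test_rows$y)
}

print(mean(accuracy))  # average held-out accuracy over the k folds
```

With k = 100, each fold of a modest dataset contains very few rows, so per-fold accuracies become noisy; averaging over folds, as in the last line, is the usual summary.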
