Linear regression with caret

Posted on 2025-01-31 14:33:15

I could really use your help. I am trying to write an R script that takes some data and performs glm using the caret package. Here is my code:

set.seed(4000)
# Create training and test data with 80%-20% ratio
new_values$gender <- as.factor(new_values$gender)
trainingRows= createDataPartition(new_values$gender, p= .8, list= FALSE, times= 1)
training_data_set= new_values[trainingRows,]
test_data_set= new_values[-trainingRows,]
# Test training with 10 times cross-validation
fitness_control <- trainControl(method = "cv", number = 10, savePredictions = TRUE)
# Train model with linear regression method (it takes about 5-10 minutes waiting time)
linear_regression <-train(gender~ ., data=training_data_set,method="glm",family=binomial(), trControl=fitness_control)
linear_regression

Here is the data table:
[image: new_data table]

When I run this script, R takes a really long time, and then I get this error message:

Something is wrong; all the Accuracy metric values are missing:

    Accuracy       Kappa    
 Min.   : NA   Min.   : NA  
 1st Qu.: NA   1st Qu.: NA  
 Median : NA   Median : NA  
 Mean   :NaN   Mean   :NaN  
 3rd Qu.: NA   3rd Qu.: NA  
 Max.   : NA   Max.   : NA  
 NA's   :1     NA's   :1    
Error: Stopping
In addition: There were 11 warnings (use warnings() to see them)

The warning messages are:

Warning messages:
1: model fit failed for Fold01: parameter=none Error : protect(): protection stack overflow

2: model fit failed for Fold02: parameter=none Error : protect(): protection stack overflow

3: model fit failed for Fold03: parameter=none Error : protect(): protection stack overflow

4: model fit failed for Fold04: parameter=none Error : protect(): protection stack overflow

5: model fit failed for Fold05: parameter=none Error : protect(): protection stack overflow

6: model fit failed for Fold06: parameter=none Error : protect(): protection stack overflow

7: model fit failed for Fold07: parameter=none Error : protect(): protection stack overflow

8: model fit failed for Fold08: parameter=none Error : protect(): protection stack overflow

9: model fit failed for Fold09: parameter=none Error : protect(): protection stack overflow

10: model fit failed for Fold10: parameter=none Error : protect(): protection stack overflow

11: In nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, ... :
There were missing values in resampled performance measures.

Can you please help?

Answer by 甜｀诱少女, 2025-02-07 14:33:15

Fitting with glmnet seems to work OK, although I haven't looked to see if the answers actually make sense! I had to sort out some data issues, which might have been what was getting in your way ...
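
As a rough illustration of the kind of data issue meant here: if numeric measurements are read in as character strings, the formula interface treats each such column as a categorical variable and expands it into one dummy column per distinct value. That may be what makes the original glm fit so slow and fragile, though the thread does not confirm it. The sketch below uses a made-up data frame (`toy`, `x1`, `x2` are hypothetical names, not the asker's spreadsheet) and only base R.

```r
# Toy data frame (hypothetical; stands in for the spreadsheet in the question).
# The numeric measurements are deliberately stored as character strings.
set.seed(1)
n <- 200
toy <- data.frame(
  gender = factor(sample(c("F", "M"), n, replace = TRUE)),
  x1 = as.character(round(rnorm(n), 3)),
  x2 = as.character(round(rnorm(n), 3)),
  stringsAsFactors = FALSE
)

# Step 1: inspect column classes; "character" predictors are the red flag.
col_classes <- sapply(toy, class)
print(col_classes)

# Step 2: width of the design matrix while the predictors are still character.
# Each character column becomes one dummy column per distinct value.
ncol_before <- ncol(model.matrix(gender ~ ., data = toy))

# Step 3: convert every predictor to numeric and measure again.
toy_fixed <- toy
num_cols <- setdiff(names(toy_fixed), "gender")
toy_fixed[num_cols] <- lapply(toy_fixed[num_cols], as.numeric)
ncol_after <- ncol(model.matrix(gender ~ ., data = toy_fixed))

cat("design matrix columns before:", ncol_before, " after:", ncol_after, "\n")
```

Running the same `sapply(..., class)` check on the real data before calling `train()` would show whether any predictor is still character. The script below does the conversion with dplyr.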

library(readxl)
library(caret)
library(glmnet)
library(dplyr)
dd <- (read_excel("thema3_results1.xlsx")
    |> select(-1)  ## drop row names
    |> mutate(across(gender, factor))
    |> mutate(across(-gender, as.numeric))  ## convert character to numeric!
)

set.seed(4000)

trainingRows <- createDataPartition(dd$gender, p= .8, list= FALSE, times= 1)
training_data_set <-  dd[trainingRows,]
test_data_set <- dd[-trainingRows,]
# Test training with 10 times cross-validation
fitness_control <- trainControl(method = "cv", number = 10, savePredictions = TRUE)
system.time(logistic_reg <- train(gender~ ., 
                          data=training_data_set,
                          method="glmnet",
                          family="binomial", ## not binomial() for glmnet ...
                          trControl=fitness_control))

The training step took about 2 seconds on my machine.

This seems to be getting accuracy == 1, which probably means it's still overfitting ... ???
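
One way to probe that worry is to score the model on the held-out 20%, which the script creates as `test_data_set` but never uses. The sketch below shows the idea on simulated data with base R only. It swaps in `sample()` for `createDataPartition()` and plain `stats::glm` for the glmnet fit, and every name in it (`sim`, `train_set`, `test_set`) is hypothetical, so treat it as an outline of the check rather than a reproduction of the results above.

```r
# Simulated stand-in data (hypothetical; not the spreadsheet from the question).
set.seed(4000)
n <- 300
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
sim$gender <- factor(ifelse(sim$x1 + rnorm(n) > 0, "M", "F"))

# 80/20 split; sample() is used here in place of caret::createDataPartition().
train_idx <- sample(seq_len(n), size = floor(0.8 * n))
train_set <- sim[train_idx, ]
test_set  <- sim[-train_idx, ]

# Plain logistic regression with stats::glm (not the glmnet fit from the answer).
fit <- glm(gender ~ ., data = train_set, family = binomial())

# predict() returns the probability of the second factor level;
# threshold it at 0.5 to get a class label.
lv <- levels(sim$gender)
prob <- predict(fit, newdata = test_set, type = "response")
pred <- factor(ifelse(prob > 0.5, lv[2], lv[1]), levels = lv)

# Accuracy on rows the model never saw during fitting.
test_accuracy <- mean(pred == test_set$gender)
print(table(predicted = pred, actual = test_set$gender))
cat("held-out accuracy:", test_accuracy, "\n")
```

With the caret model above, the analogous check would be something like `predict(logistic_reg, newdata = test_data_set)` compared against `test_data_set$gender`, for example via `confusionMatrix()`. If the held-out accuracy is also 1, it would be worth checking whether one of the predictors encodes gender directly.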
