caret rfe() error: "there should be the same number of samples in x and y"

Posted 2025-02-01 09:13:57

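I'm having trouble resolving the error "there should be the same number of samples in x and y". Others have posted about this error on this site, but their solutions did not work for me. Below is an abbreviated version of my dataset.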

Here is x_train:

x_train <- structure(list(laterality = c("Left", "Right", "Right", "Right", 
"Left", "Left", "Left", "Left", "Left", "Right"), age = c(66L, 
56L, 69L, 49L, 60L, 70L, 58L, 53L, 59L, 64L), insurance = c("MEDICARE", 
"UNITED", "MEDICARE", "UNITED", "COMMERCIAL", "MEDICARE", "AETNA", 
"AETNA", "OXFORD", "MEDICARE_MANAGED"), employment = c("Retired", 
"FullTime", "Retired", "FullTime", "Disabled", "SelfEmployed", 
"Retired", "FullTime", "FullTime", "Disabled"), sex = c("Female", 
"Male", "Female", "Female", "Female", "Female", "Male", "Male", 
"Female", "Male"), race = c("WhiteorCaucasian", "WhiteorCaucasian", 
"WhiteorCaucasian", "WhiteorCaucasian", "WhiteorCaucasian", "WhiteorCaucasian", 
"Other", "BlackorAfricanAmerican", "WhiteorCaucasian", "WhiteorCaucasian"
), ethnicity = c("NotHispanicorLatino", "NotHispanicorLatino", 
"NotHispanicorLatino", "NotHispanicorLatino", "NotHispanicorLatino", 
"NotHispanicorLatino", "NotHispanicorLatino", "NotHispanicorLatino", 
"NotHispanicorLatino", "NotHispanicorLatino"), bmi = c(22.3, 
33, 34.3, 36, 30, 20, 29.5, 33.4, 26.5, 34.2), PreferredLanguage = c("English", 
"English", "English", "English", "English", "English", "English", 
"English", "English", "English"), married = c("Married", "Married", 
"Married", "Married", "Married", "Married", "Divorced", "Single", 
"Married", "Married"), RadiographSevere = c("No", "No", "No", 
"No", "No", "No", "No", "No", "No", "No"), HxAnxietyDepression = c("No", 
"No", "No", "Yes", "Yes", "Yes", "No", "No", "No", "No"), SurgeryYear = c(2017L, 
2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L
), operativetime = c(82L, 79L, 85L, 76L, 84L, 86L, 67L, 75L, 
72L, 100L), HipApproach = c("Anterior", "Posterior", "Posterior", 
"Posterior", "Posterior", "Anterior", "Posterior", "Posterior", 
"Posterior", "Posterior")), row.names = c(NA, -10L), class = c("data.table", 
"data.frame"))

Here is y_train:


y_train <- structure(list(POD1AverageNrsScoreCut = c("[0,5)", "[0,5)", "[0,5)", 
                                          "[0,5)", "[5,10)", "[0,5)", "[0,5)", "[5,10)", "[0,5)", "[0,5)"
)), row.names = c(NA, -10L), class = c("data.table", "data.frame"
))

Here is the code I am using for rfe:

library(caret)
control <- rfeControl(functions = rfFuncs, # random forest
                      method = "repeatedcv", # repeated cv
                      repeats = 3, # number of repeats
                      number = 10) # number of folds

result_rfe <- rfe(x = x_train, y = y_train, sizes = c(1:30), rfeControl = control)
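The error message suggests that rfe() compares the number of rows in x with the length of y. Assuming that is the check being made, a minimal base-R sketch shows why a one-column data frame fails it: a data frame is a list of columns, so length() counts columns, not rows. The two small data frames below are hypothetical stand-ins for x_train and y_train, not the original data.

```r
# Hypothetical stand-ins: x_demo has several columns, y_demo has one column
x_demo <- data.frame(age = c(66L, 56L, 69L, 49L),
                     bmi = c(22.3, 33, 34.3, 36))
y_demo <- data.frame(score_cut = c("[0,5)", "[0,5)", "[5,10)", "[0,5)"))

nrow(x_demo)                    # number of samples (rows): 4
length(y_demo)                  # number of columns, not rows: 1
nrow(x_demo) == length(y_demo)  # FALSE, i.e. the "same number of samples" check fails
```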

杀手六號 2025-02-08 09:13:57

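I see your outcome is two classes of interval labels ("[0,5)" and "[5,10)"), but y_train is a one-column data.table rather than a vector. rfe() needs one y value per row of x, and length() of a data frame returns its number of columns (1) rather than its number of rows (10), which is most likely what triggers the error. Try converting it to a factor with y = as.factor(unlist(y_train)). It worked for me: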

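library(caret)
control <- rfeControl(functions = rfFuncs, # random forest
                      method = "repeatedcv", # repeated cv
                      repeats = 3, # number of repeats
                      number = 10) # number of folds

# unlist() flattens the one-column data.table into a vector with one label per row;
# as.factor() makes rfe() treat the outcome as a classification target
result_rfe <- rfe(x = x_train, y = as.factor(unlist(y_train)), sizes = c(1:30), rfeControl = control)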

Output:

>result_rfe
    
    Recursive feature selection

Outer resampling method: Cross-Validated (10 fold, repeated 3 times) 

Resampling performance over subset size:

 Variables Accuracy Kappa AccuracySD KappaSD Selected
         1  0.06667     0     0.2537       0         
         2  0.06667     0     0.2537       0         
         3  0.30000     0     0.4661       0         
         4  0.20000     0     0.4068       0         
         5  0.36667     0     0.4901       0         
         6  0.40000     0     0.4983       0         
         7  0.43333     0     0.5040       0         
         8  0.53333     0     0.5074       0        *
         9  0.30000     0     0.4661       0         
        10  0.33333     0     0.4795       0         
        11  0.20000     0     0.4068       0         
        12  0.26667     0     0.4498       0         
        13  0.06667     0     0.2537       0         
        14  0.13333     0     0.3457       0         
        15  0.20000     0     0.4068       0         

The top 5 variables (out of 8):
   insurance, laterality, HipApproach, employment, ethnicity
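The resampling figures above are consistent with each held-out fold containing a single row, which is what 10-fold CV on a 10-row sample implies: every resample then scores an accuracy of either 0 or 1, and 10 folds x 3 repeats give 30 resamples. Under that assumption, the Accuracy and AccuracySD reported for the selected 8-variable subset can be reproduced from 16 correct resamples out of 30 (the count of 16 is inferred from the reported mean, not taken from the output):

```r
# Assumption: 30 resamples (10 folds x 3 repeats), each scoring 0 or 1
n_resamples <- 30
n_correct <- 16  # inferred so that the mean matches the reported 0.53333
acc <- c(rep(1, n_correct), rep(0, n_resamples - n_correct))

round(mean(acc), 5)  # 0.53333, the reported Accuracy
round(sd(acc), 4)    # 0.5074, the reported AccuracySD
```

With estimates this noisy, the selected subset size on the abbreviated data probably says little about the full dataset.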

Note: I don't know if this is what you expected; I don't know the context of your data or your approach.
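What the conversion in the answer does can be checked with a small base-R sketch; the data frame below is a hypothetical stand-in for y_train. unlist() flattens the one-column data frame into a plain vector with one element per row, and as.factor() turns the interval labels into class levels. Extracting the single column with [[1]] gives the same labels and is an equivalent way to get a vector.

```r
# Hypothetical stand-in for the one-column y_train
y_demo <- data.frame(score_cut = c("[0,5)", "[0,5)", "[5,10)", "[0,5)"))

y_vec <- as.factor(unlist(y_demo))  # flatten to a vector, then convert to factor
length(y_vec)   # 4: one label per row, so it now matches nrow(x)
levels(y_vec)   # "[0,5)" "[5,10)"

# Equivalent extraction without unlist(): take the single column directly
y_col <- as.factor(y_demo[[1]])
all(as.character(y_vec) == as.character(y_col))  # TRUE
```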

Original answer:
Subscript out of bounds error in caret's rfe function
