How to calculate the mean of the importance of elements in a list?
I'm training a random forest three times and saving each run's variable importance in a list (using the caret package). How can I calculate the mean importance of each feature across the runs in which it appears?
For example, how can I calculate the mean of the three Overall values for "ESR"? (I am going to train this algorithm a thousand times.)
Here is my example output:
[[1]]
rf variable importance
only 20 most important variables shown (out of 119)
Overall
Albumin 100.00
age 97.36
PR 60.18
RR 42.41
Weight 35.26
SystolicBP 32.14
Cancers1 29.79
ESR 27.66
Neutrophyl 26.98
CPK 25.68
EjectionFraction 25.59
BMI 24.42
Calcium 23.87
WBC 22.36
Urea 22.01
LDH 21.23
FBS 20.21
Ddimer 19.32
HB 18.99
Lymphocyte 18.78
[[2]]
rf variable importance
only 20 most important variables shown (out of 119)
Overall
age 100.00
FBS 57.80
WBC 53.88
PR 53.84
Neutrophyl 53.52
Weight 52.31
HB 51.69
LDH 50.15
Urea 49.31
Albumin 47.05
Lymphocyte 46.87
CPK 46.54
SystolicBP 45.64
Calcium 44.87
ESR 43.54
Ferritin 43.03
CRP 43.00
PLT 42.83
Creatinine 42.53
EjectionFraction 41.43
[[3]]
rf variable importance
only 20 most important variables shown (out of 119)
Overall
age 100.00
Albumin 43.41
Weight 24.88
FBS 24.63
BS 23.31
PR 21.47
LDH 21.06
Neutrophyl 20.68
BMI 17.94
EjectionFraction 17.29
CPK 16.49
WBC 16.11
ALP 15.72
RR 15.28
Lymphocyte 14.94
Cancers1 14.68
CRP 14.50
ESR 14.38
Ddimer 13.05
Ferritin 12.96
Can I create a data frame that stores the features and their Overall values?
Thanks for your help.
This is my code:
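library(caret)    # train(), varImp()
library(caTools)  # sample.split() (presumed source of this function)
library(pROC)     # auc() (presumed source of this function)

prediction_value_rf <- list()
importance_rf <- list()
auc_rf <- list()
weight_rf <- list()
for (i in 1:1000) {
  # draw a balanced resample: 300 deaths and 300 survivors
  resample_death <- death[sample(nrow(death), size = 300), ]
  resample_alive <- alive[sample(nrow(alive), size = 300), ]
  f_dataset <- rbind(resample_alive, resample_death)
  # mark roughly 25% of the rows as the test set
  inx <- sample.split(seq_len(nrow(f_dataset)), 0.25)
  trainData <- f_dataset[!inx, ]
  testData <- f_dataset[inx, ]
  rf_fit <- train(vital_status ~ ., data = trainData, method = "rf")
  pred <- predict(rf_fit, testData[, -109])
  pred1 <- predict(rf_fit, testData[, -109], type = "prob")
  prediction_value_rf[[i]] <- pred1[2]
  # renamed from `auc` so the result does not shadow the auc() function
  auc_value <- auc(testData$vital_status, as.numeric(pred1[[2]]),
                   direction = "<", levels = levels(testData$vital_status))
  auc_rf[[i]] <- auc_value
  # scaled variable importance for this run
  importance_rf[[i]] <- varImp(rf_fit, scale = TRUE)
  weight_rf[[i]] <- max(rf_fit$results$Accuracy)
}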
In the end, I want to calculate the mean Overall importance of every feature (I want to create an ensemble model).
My dataset contains 109 features and 4200 samples.
> dput(importance_rf)
list(structure(list(importance = structure(list(Overall = c(100,
32.9191368970689, 0, 29.4889011862606, 24.8664587940577, 21.8746288172869,
21.7051171149606, 20.0868919191658, 20.3678665772965, 20.2873319598582,
33.7597621482843, 42.1891066454062, 22.7027798691687, 17.0766042463516,
39.4559095867264, 17.9431725056776, 23.2881573588367, 5.04721532342669,
22.3290849893345, 20.7266835722104, 21.5723519894789, 19.5211504808207,
21.2794742178794, 20.1624361665348, 13.7420140365184, 31.7941409073075,
20.9409991203303, 30.4229311296897, 11.5187371425859, 12.8487688047673,
9.40749461290917, 10.361793419014, 32.5677389075859, 26.5411449178312,
23.3996095888034, 2.84823906954271, 10.0257295515002, 2.27406632480383,
0.221285401034356, 0.844517489791465, 1.97286969198767, 0.0909347758420391,
0.541007254389242, 0.359718315763083, 1.26912866459011, 0.158954429130366,
0.245159217854806, 1.43768928047267, 0.796627703857018, 0.0731764363395144,
1.72357935713514, 0.424562470997031, 3.38312715168264, 1.88770244332681,
0.0314985706869475, 0, 0.65427952713802, 0, 0.0171557103229226,
0.709743254593806, 1.13539938842206, 0.0367104133426984, 2.95211595985093,
0, 0.582868854914444, 0.393813676879418, 1.15732422255054, 2.24940561099934,
1.73472209382337, 1.34428847541862, 1.15486784386305, 0, 0.689216959226089,
0.625678629482648, 1.81161997423301, 0.433030827900777, 10.9106578268112,
2.24295278032112, 18.176936900799, 1.74711580562318, 1.45310012173878,
0.952143653091356, 1.16652405720194, 1.11866015943186, 2.68527336222893,
1.12853921993574, 5.10727247259446, 1.93994049536545, 1.36475795626174,
2.95717137358439, 0.115367165512589, 0, 1.45815337045876, 0,
1.78943634306828, 5.71749991297189, 2.43536004133198, 1.27231795918686,
11.4771984230702, 3.0971032186365, 0.708058471655881, 0.170261025718881,
3.37435307537382, 1.56044494248123, 1.09294450754124, 0, 2.25592933845801,
2.30276525800757, 1.86149986210819, 1.46145976307003, 1.26858067553346,
2.11041986636824, 0.0902116364175813, 1.54299863875175, 0, 0.269632340125967,
1.88548693593634, 4.47233507072462, 0.66752451890319)), class = "data.frame", row.names = c("age",
"Weight", "HookhConsumption", "BMI", "SystolicBP", "RR", "DiastolicBP",
"ALP", "ALT", "AST", "Albumin", "BS", "CPK", "CRP", "Calcium",
"Creatinine", "Ddimer", "Directbilirubin", "ESR", "FBS", "Ferritin",
"HB", "LDH", "Lymphocyte", "Mg", "Neutrophyl", "PLT", "PR", "PhosphorP",
"PotassiumK", "SodiumNA", "Totalbilirubin", "Urea", "WBC", "EjectionFraction",
"TotalLungInvolvementRank", "TotalLungInvolvementPercent", "sex2",
"Type.of.heart.disease1", "Type.of.heart.disease2", "Type.of.heart.disease9",
"Unilateral.paralysis1", "Ulcers1", "Obesity.BMI.above.351",
"Peripheral.artery.disease1", "organ.involment.from.diabetes1",
"organ.involment.from.diabetes2", "organ.involment.from.diabetes3",
"UsingDrugHistory1", "UsingAlcoholHistory1", "Transplantation1",
"SeverityofKidneyDisease1", "SeverityofKidneyDisease2", "SeverityofKidneyDisease3",
"SeverityChronicliverdisease1", "SeverityChronicliverdisease2",
"SeverityChronicliverdisease3", "SeverityChronicliverdisease4",
"SeverityChronicliverdisease9", "Schizophrenia1", "Rheumatologicaldiseases1",
"Pregnant1", "Neurologicaldiseases1", "LiverTransplantation1",
"KidneyTransplantation1", "Immunedeficiencydisease1", "Hypothyroidism1",
"Hypertention1", "Hyperlipidemia1", "Historyofsmoking1", "HistoryofHookah1",
"HeartTransplantation1", "HIV1", "FattyLiver1", "Diabetes1",
"Chronicliverdisease1", "Chronickidneydisease1", "CardiovascularDisease1",
"Cancers1", "CVAStrokeCVDTIA1", "COPD1", "Asthma1", "WetCough1",
"WeightLoss1", "WeaknessandLethargy1", "Vomit1", "Trembling1",
"Sweating1", "Sputum1", "Sorethroat1", "SkinRush1", "Rush1",
"Rhinorrhea1", "PharynxExoda1", "Nausea1", "Muscle_Painmyalgia1",
"Lossofsenseoftaste1", "Lossofsenseofsmell1", "LossofConsciousness1",
"LimbEdema1", "Jointpain_Arthralgia1", "Hemoptysis1", "Headace1",
"Fever1", "Fatigue1", "EyeConjunctivitis1", "Epigastric1", "Dyspnea1",
"DryCough1", "Dizziness1", "Diarrhea1", "Chestpain1", "CardiacArrhythmia1",
"Body_Pain1", "Bleeding1", "Ataxia1", "Anorexia1", "PCRCOVID19Test1",
"PCRCOVID19Test2")), model = "rf", calledFrom = "varImp"), class = "varImp.train"),
structure(list(importance = structure(list(Overall = c(100,
36.8463357663146, 0, 20.5921448468941, 35.0980630859042,
15.7098956910968, 27.5542325637653, 22.3935810225052, 25.6062709809081,
18.9072078537409, 30.5428709528983, 26.4061314161858, 27.2933977255992,
18.3744993875278, 57.5115149169245, 14.4361277134982, 49.9265957132235,
6.10831602661626, 28.2527379885906, 23.0147565449908, 32.7997892888894,
22.7055707536584, 36.9763807158356, 28.9941599048441, 17.8186386653819,
31.2682240107287, 26.2894098494535, 41.1751827476675, 22.6316241605114,
16.9314172346857, 14.4927913128733, 13.1792980470757, 44.2836496383372,
32.7246002717468, 30.3912750391576, 10.0409713536124, 9.83444013035946,
2.50470824612248, 1.72055335723373, 1.05083165735798, 1.56193393834476,
0.233521622728958, 1.08064736921506, 0.555709266569136, 2.40106539585553,
0.291833555475466, 0.380999891346632, 2.56592221397732, 1.62107348934456,
0.504647559430998, 1.19859835755469, 0, 1.4382135880929,
1.94514657535966, 0, 0.0569205442253742, 0.44589056596685,
0.0539230755197555, 0, 0.055077983652405, 1.24527213390211,
0, 1.36267778294481, 0.151259347248717, 0.499919817645286,
0, 2.79981213016671, 2.72663427247346, 1.93725253183476,
2.70715099933653, 1.99722906280419, 0, 0.111342938271961,
1.2426657762317, 2.15186257620788, 0.584084013981451, 9.87542370836023,
3.21493418783175, 14.6556614893423, 0.67462103889104, 0.787088521176588,
2.61946726039402, 2.8099384934716, 0.377053883833586, 2.2824838493133,
1.12217532020233, 3.44210364347885, 2.61343827037804, 9.58864870521531,
1.77823199575717, 0, 0, 0.828679129518211, 0, 2.73842874693014,
14.5506870851474, 0.390367251047195, 0.811902694072225, 15.5803912323052,
4.18258978600944, 2.13546475796113, 2.66088800284236, 2.97761832225233,
3.54039994200135, 2.44519084017892, 0.737528372419208, 2.20708600548186,
4.12502178170407, 3.1835668678093, 7.61195991815971, 2.35303302862437,
5.70342032074721, 0.409606955773683, 2.4977310780031, 0.0107020031498121,
0.268000372472171, 2.32396173268619, 1.64515893404575, 0.868523484401606
)), class = "data.frame", row.names = c("age", "Weight",
"HookhConsumption", "BMI", "SystolicBP", "RR", "DiastolicBP",
"ALP", "ALT", "AST", "Albumin", "BS", "CPK", "CRP", "Calcium",
"Creatinine", "Ddimer", "Directbilirubin", "ESR", "FBS",
"Ferritin", "HB", "LDH", "Lymphocyte", "Mg", "Neutrophyl",
"PLT", "PR", "PhosphorP", "PotassiumK", "SodiumNA", "Totalbilirubin",
"Urea", "WBC", "EjectionFraction", "TotalLungInvolvementRank",
"TotalLungInvolvementPercent", "sex2", "Type.of.heart.disease1",
"Type.of.heart.disease2", "Type.of.heart.disease9", "Unilateral.paralysis1",
"Ulcers1", "Obesity.BMI.above.351", "Peripheral.artery.disease1",
"organ.involment.from.diabetes1", "organ.involment.from.diabetes2",
"organ.involment.from.diabetes3", "UsingDrugHistory1", "UsingAlcoholHistory1",
"Transplantation1", "SeverityofKidneyDisease1", "SeverityofKidneyDisease2",
"SeverityofKidneyDisease3", "SeverityChronicliverdisease1",
"SeverityChronicliverdisease2", "SeverityChronicliverdisease3",
"SeverityChronicliverdisease4", "SeverityChronicliverdisease9",
"Schizophrenia1", "Rheumatologicaldiseases1", "Pregnant1",
"Neurologicaldiseases1", "LiverTransplantation1", "KidneyTransplantation1",
"Immunedeficiencydisease1", "Hypothyroidism1", "Hypertention1",
"Hyperlipidemia1", "Historyofsmoking1", "HistoryofHookah1",
"HeartTransplantation1", "HIV1", "FattyLiver1", "Diabetes1",
"Chronicliverdisease1", "Chronickidneydisease1", "CardiovascularDisease1",
"Cancers1", "CVAStrokeCVDTIA1", "COPD1", "Asthma1", "WetCough1",
"WeightLoss1", "WeaknessandLethargy1", "Vomit1", "Trembling1",
"Sweating1", "Sputum1", "Sorethroat1", "SkinRush1", "Rush1",
"Rhinorrhea1", "PharynxExoda1", "Nausea1", "Muscle_Painmyalgia1",
"Lossofsenseoftaste1", "Lossofsenseofsmell1", "LossofConsciousness1",
"LimbEdema1", "Jointpain_Arthralgia1", "Hemoptysis1", "Headace1",
"Fever1", "Fatigue1", "EyeConjunctivitis1", "Epigastric1",
"Dyspnea1", "DryCough1", "Dizziness1", "Diarrhea1", "Chestpain1",
"CardiacArrhythmia1", "Body_Pain1", "Bleeding1", "Ataxia1",
"Anorexia1", "PCRCOVID19Test1", "PCRCOVID19Test2")), model = "rf",
calledFrom = "varImp"), class = "varImp.train"), structure(list(
importance = structure(list(Overall = c(100, 36.4519408382731,
0.0121282468302786, 27.9982404793903, 19.4487163883379,
24.6079653972917, 14.1539998143239, 18.684018340339,
20.1182663550791, 17.4200861293186, 46.6309831468223,
52.2217679510578, 28.5910698857479, 16.845796014194,
31.6509235655573, 17.1000574614637, 27.8424176478161,
5.69845064904499, 21.3838903337718, 20.217605303817,
19.8702958841878, 22.3737582989512, 33.0788664305301,
20.6035947546629, 16.3220426343042, 23.4809287675538,
23.1749036748423, 57.122094059206, 12.2409421568247,
11.234114301956, 15.7946508155502, 8.80563729211453,
20.2205078755919, 20.3091908316546, 27.7497357152039,
3.8622908315769, 12.8894291926347, 5.96701805516155,
0.761922263853243, 1.41991036581607, 1.54560737492769,
0.825161722105208, 0.0172016746252156, 0.693982409239905,
0, 0.358366468201754, 1.74812586771487, 2.2746344067366,
0.745595100629448, 0.465199425668223, 0.408092232849501,
0.115358703965213, 0.0358338604150282, 2.88640197248697,
0, 0.288302498762889, 0.332551323637155, 0.0121282468302786,
0, 1.03515126482736, 1.1213600137207, 0.329413397366096,
2.0612368962315, 0, 0.610994615626186, 1.0215655608971,
3.90651448858199, 1.73374217783332, 1.47244358073369,
2.20534241559288, 0.173681720638885, 0, 0.631950099628902,
0.132328128708788, 2.92435478031454, 1.03537122788376,
4.74067414123091, 1.77981701502525, 13.1150432121738,
0.720556880972878, 1.20366662244445, 1.19169376389038,
1.86442992849398, 0.518200723424615, 2.278501378269,
1.23638371282217, 3.66947066761794, 2.03933409738165,
1.25289331603719, 1.01627904400807, 0.0324453169731015,
0, 2.29817177168672, 0, 1.53194610140319, 7.15322639329996,
0.759542631415349, 1.53353473284619, 4.77390474517756,
1.05656481042379, 0.699450154375729, 1.16224285818854,
3.65223350861514, 1.93274707207956, 1.57589588221639,
0.449432695377871, 1.36863730886437, 2.11275137384133,
3.29450357362525, 1.08676677214028, 2.18565092410049,
1.15456248328987, 0.492245547306216, 1.59592156033113,
0.0129367966189638, 0.514499765305734, 1.58591810753971,
1.84832826238423, 0.807564130566264)), class = "data.frame", row.names = c("age",
"Weight", "HookhConsumption", "BMI", "SystolicBP", "RR",
"DiastolicBP", "ALP", "ALT", "AST", "Albumin", "BS",
"CPK", "CRP", "Calcium", "Creatinine", "Ddimer", "Directbilirubin",
"ESR", "FBS", "Ferritin", "HB", "LDH", "Lymphocyte",
"Mg", "Neutrophyl", "PLT", "PR", "PhosphorP", "PotassiumK",
"SodiumNA", "Totalbilirubin", "Urea", "WBC", "EjectionFraction",
"TotalLungInvolvementRank", "TotalLungInvolvementPercent",
"sex2", "Type.of.heart.disease1", "Type.of.heart.disease2",
"Type.of.heart.disease9", "Unilateral.paralysis1", "Ulcers1",
"Obesity.BMI.above.351", "Peripheral.artery.disease1",
"organ.involment.from.diabetes1", "organ.involment.from.diabetes2",
"organ.involment.from.diabetes3", "UsingDrugHistory1",
"UsingAlcoholHistory1", "Transplantation1", "SeverityofKidneyDisease1",
"SeverityofKidneyDisease2", "SeverityofKidneyDisease3",
"SeverityChronicliverdisease1", "SeverityChronicliverdisease2",
"SeverityChronicliverdisease3", "SeverityChronicliverdisease4",
"SeverityChronicliverdisease9", "Schizophrenia1", "Rheumatologicaldiseases1",
"Pregnant1", "Neurologicaldiseases1", "LiverTransplantation1",
"KidneyTransplantation1", "Immunedeficiencydisease1",
"Hypothyroidism1", "Hypertention1", "Hyperlipidemia1",
"Historyofsmoking1", "HistoryofHookah1", "HeartTransplantation1",
"HIV1", "FattyLiver1", "Diabetes1", "Chronicliverdisease1",
"Chronickidneydisease1", "CardiovascularDisease1", "Cancers1",
"CVAStrokeCVDTIA1", "COPD1", "Asthma1", "WetCough1",
"WeightLoss1", "WeaknessandLethargy1", "Vomit1", "Trembling1",
"Sweating1", "Sputum1", "Sorethroat1", "SkinRush1", "Rush1",
"Rhinorrhea1", "PharynxExoda1", "Nausea1", "Muscle_Painmyalgia1",
"Lossofsenseoftaste1", "Lossofsenseofsmell1", "LossofConsciousness1",
"LimbEdema1", "Jointpain_Arthralgia1", "Hemoptysis1",
"Headace1", "Fever1", "Fatigue1", "EyeConjunctivitis1",
"Epigastric1", "Dyspnea1", "DryCough1", "Dizziness1",
"Diarrhea1", "Chestpain1", "CardiacArrhythmia1", "Body_Pain1",
"Bleeding1", "Ataxia1", "Anorexia1", "PCRCOVID19Test1",
"PCRCOVID19Test2")), model = "rf", calledFrom = "varImp"), class = "varImp.train"))
Comments (1)
For this part:
Because you have already generated the list, you can write a function that selects the row containing the feature name, apply it to each element of the list, flatten the result, and then calculate the mean. If the feature does not exist in some element, that element can be excluded from the mean calculation by using na.rm = TRUE.
For example, this resembles your list:
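A minimal sketch of such a list, assuming the name mylist and using a few rows copied from the printed output in the question (the real tables are longer; the trimming is only for illustration):

```r
# Hypothetical stand-in for the list of importance tables.
# Values are copied from the printed output in the question, trimmed to a
# few rows per run. As in the question's first table, CRP does not appear
# in the first element.
mylist <- list(
  data.frame(Overall = c(100.00, 97.36, 27.66),
             row.names = c("Albumin", "age", "ESR")),
  data.frame(Overall = c(100.00, 47.05, 43.54, 43.00),
             row.names = c("age", "Albumin", "ESR", "CRP")),
  data.frame(Overall = c(100.00, 43.41, 14.50, 14.38),
             row.names = c("age", "Albumin", "CRP", "ESR"))
)

# Compact overview: three data frames, each with a single "Overall" column
str(mylist)
```

Each element is a plain data frame whose row names are the feature names, which is what makes row lookup by name possible.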
Here is how to calculate the mean of ESR, which exists in all elements, and CRP, which does not exist in one of the elements:
Because you have many features, you can create another function to apply this step to each feature. For example:
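Both steps can be sketched in base R as follows. The helper names get_feature and feature_mean are hypothetical, and mylist is the same trimmed mock list built from the question's printed output:

```r
# Hypothetical stand-in list (values from the question's printed output).
mylist <- list(
  data.frame(Overall = c(100.00, 97.36, 27.66),
             row.names = c("Albumin", "age", "ESR")),
  data.frame(Overall = c(100.00, 47.05, 43.54, 43.00),
             row.names = c("age", "Albumin", "ESR", "CRP")),
  data.frame(Overall = c(100.00, 43.41, 14.50, 14.38),
             row.names = c("age", "Albumin", "CRP", "ESR"))
)

# Select the row of one feature in one table; a missing feature yields NA.
get_feature <- function(dat, feature_name) dat[feature_name, "Overall"]

# Mean of one feature across all elements, skipping elements where it is absent.
feature_mean <- function(lst, feature_name) {
  values <- unlist(lapply(lst, get_feature, feature_name = feature_name))
  mean(values, na.rm = TRUE)
}

feature_mean(mylist, "ESR")  # present in all three elements
feature_mean(mylist, "CRP")  # absent from the first element: 28.75

# Apply the same step to several features at once (named numeric vector).
features <- c("age", "Albumin", "ESR", "CRP")
all_means <- sapply(features, function(f) feature_mean(mylist, f))
all_means
```

Indexing a data frame by a row name that does not exist returns NA rather than an error, which is why na.rm = TRUE is enough to handle missing features.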
EDIT
The synthetic data used in the previous example, mylist, contains only one "Overall" data frame object in each of its elements, so the feature extraction can be applied directly to the data using lapply. However, the actual data that you provided in the updated question, importance_rf, has more than one object in each of its elements, and the "Overall" data frame is the first of them. This difference is the cause of the error you showed in the comment. To apply the extraction, the "Overall" data frames should be extracted first, using lapply(function(list) list[[1]]), and then the previous steps can be applied.
A {base R} way
The previous steps can be applied to a vector of features as follows:
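A minimal base R sketch of these steps, assuming a mock importance_rf whose elements have the same shape as the dput output in the question (a list whose first element is the "Overall" data frame). The helper make_run and the three-feature excerpt, with values rounded from that dput output, are illustrative assumptions:

```r
# Hypothetical helper: builds one element shaped like a caret varImp result.
make_run <- function(overall, features) {
  list(
    importance = data.frame(Overall = overall, row.names = features),
    model = "rf",
    calledFrom = "varImp"
  )
}

# Mock excerpt of importance_rf (values rounded from the question's dput).
importance_rf <- list(
  make_run(c(100, 32.92, 22.33), c("age", "Weight", "ESR")),
  make_run(c(100, 36.85, 28.25), c("age", "Weight", "ESR")),
  make_run(c(100, 36.45, 21.38), c("age", "Weight", "ESR"))
)

features <- c("age", "Weight", "ESR")

# Step 1: pull the "Overall" data frame out of every run.
importance_dfs <- lapply(importance_rf, function(list) list[[1]])

# Steps 2 to 4: for each feature, collect its value from every data frame,
# flatten the list to a numeric vector, and average (absent features give NA).
feature_means <- sapply(features, function(feature_name) {
  values <- unlist(lapply(importance_dfs, function(dat) dat[feature_name, "Overall"]))
  mean(values, na.rm = TRUE)
})
feature_means <- setNames(feature_means, features)

# The result as a data frame: one row per feature.
mean_importance <- data.frame(feature = names(feature_means),
                              mean_overall = unname(feature_means))
mean_importance
```

With the real list, features could be set to rownames(importance_dfs[[1]]) to cover every predictor.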
A brief explanation about the code:
lapply(function(list) list[[1]]) extracts the first element of each element in the importance_rf list, which is the data frame that contains the feature data.
dat[feature_name, "Overall"] extracts the value of a targeted feature, feature_name, from each extracted data frame. Only one feature is extracted from each data frame in every step.
unlist() converts the extracted values from a list to a numeric vector.
setNames creates names for the numeric vector, making it easy to identify the features whose means are being calculated.
The functions used in this way all belong to base R. You don't need to install any external package to get them.
Another option is to combine base R functions with functions from the purrr package.
A {purrr} way
These steps are much shorter than the ones in base R above, but what is done in each step might be less obvious.
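A sketch of what such a purrr pipeline might look like, assuming the purrr package is installed and using a mock importance_rf with the same shape as the question's data. The helper make_run and the run1, run2, run3 column names are illustrative assumptions, and this sketch uses pluck(run, 1) to reach the data frame:

```r
library(purrr)

# Hypothetical helper: builds one element shaped like a caret varImp result.
make_run <- function(overall, features) {
  list(
    importance = data.frame(Overall = overall, row.names = features),
    model = "rf",
    calledFrom = "varImp"
  )
}

# Mock excerpt of importance_rf (values rounded from the question's dput).
importance_rf <- list(
  make_run(c(100, 32.92, 22.33), c("age", "Weight", "ESR")),
  make_run(c(100, 36.85, 28.25), c("age", "Weight", "ESR")),
  make_run(c(100, 36.45, 21.38), c("age", "Weight", "ESR"))
)

features <- c("age", "Weight", "ESR")

# Extract the "Overall" data frame from every run (pluck(run, 1) is run[[1]]).
importance_dfs <- map(importance_rf, function(run) pluck(run, 1))

# From each data frame, take all targeted features as a named numeric vector.
per_run <- map(importance_dfs,
               function(dat) set_names(dat[features, "Overall"], features))
names(per_run) <- paste0("run", seq_along(per_run))

# Combine the runs column-wise: one row per feature, one column per run.
combined <- as.data.frame(per_run)

# Row means give the mean importance of each feature across runs.
feature_means <- rowMeans(combined, na.rm = TRUE)
feature_means
```

The intermediate objects can be printed one by one to see what each step produces.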
Note that map is similar to lapply, and pluck(x,1,1) is equivalent to x[[1]][[1]].
A brief explanation about the code:
map(pluck(1,1)) extracts the data frames, similar to lapply(function(list) list[[1]]) above.
map(function(dat) set_names(dat[features,], features)) extracts the list of features, similar to dat[feature_name, "Overall"] above.
There is a difference:
In the base R way above, every feature is extracted from all data frames and its mean is calculated; then another feature is extracted the same way.
In this purrr way, all the targeted features are extracted from each data frame in the list, and then the features are combined into a new data frame by using as.data.frame, so that each row represents a feature. Then, rowMeans is used to calculate the mean of all values of each feature.
Note that you can check the result of each step before the |> pipe. For example, importance_rf will show all objects in each element, and importance_rf |> map(pluck(1,1)) will show only the data frame objects.
Updates for including weighted means
Here is a simple example of how to calculate weighted means of each feature in your list. Suppose you have this list:
And suppose you have the following weight values for L1, L2, and L3 in the list:
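A small mock setup along these lines, where every number is a hypothetical value chosen for easy arithmetic and the feature names b and c are assumptions added next to a:

```r
# Hypothetical list: each element holds the values of features a, b and c.
mylist <- list(
  L1 = c(a = 1, b = 2, c = 3),
  L2 = c(a = 4, b = 5, c = 6),
  L3 = c(a = 7, b = 8, c = 9)
)

# Hypothetical weights: w1 belongs to L1, w2 to L2, w3 to L3.
weight <- c(w1 = 2, w2 = 1, w3 = 1)
```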
To calculate the weighted mean of a, for example, you need this calculation:
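Written out as a formula, where the symbols $a_1$, $a_2$, $a_3$ are assumed names for the value of a in L1, L2 and L3:

$$
\bar{a}_w = \frac{w_1 a_1 + w_2 a_2 + w_3 a_3}{w_1 + w_2 + w_3}
          = \frac{w_1}{w_1 + w_2 + w_3}\,a_1 + \frac{w_2}{w_1 + w_2 + w_3}\,a_2 + \frac{w_3}{w_1 + w_2 + w_3}\,a_3
$$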
You can get this by multiplying each value of a in the list by its respective normalized weight. In this case, the normalized weight for w1 is w1/(w1+w2+w3).
To do these steps in R:
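A minimal base R sketch of these steps, assuming a mock list and mock weights (all values are hypothetical and chosen so the result is easy to check by hand):

```r
# Hypothetical list and weights (w1 for L1, w2 for L2, w3 for L3).
mylist <- list(
  L1 = c(a = 1, b = 2, c = 3),
  L2 = c(a = 4, b = 5, c = 6),
  L3 = c(a = 7, b = 8, c = 9)
)
weight <- c(w1 = 2, w2 = 1, w3 = 1)

# Normalize the weights so that they sum to 1.
normalized <- weight / sum(weight)

# Multiply every element of the list by its normalized weight ...
scaled <- Map(function(values, w) values * w, mylist, normalized)

# ... and add the scaled elements together, feature by feature.
weighted_means <- Reduce(`+`, scaled)
weighted_means

# Cross-check for feature a with the built-in weighted.mean().
weighted.mean(c(1, 4, 7), weight)
```

For a this gives 1 * 0.5 + 4 * 0.25 + 7 * 0.25 = 3.25, the same value that weighted.mean() returns.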
Applying these mock weight values to your importance_rf list and the features in the example, we get:
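As an illustration of this last step, here is a sketch that assumes a three-feature mock excerpt of importance_rf (values rounded from the question's dput output) and hypothetical weights. In the question's loop the weights would presumably come from weight_rf; the numbers below are made up:

```r
# Hypothetical helper: builds one element shaped like a caret varImp result.
make_run <- function(overall, features) {
  list(
    importance = data.frame(Overall = overall, row.names = features),
    model = "rf",
    calledFrom = "varImp"
  )
}

# Mock excerpt of importance_rf (values rounded from the question's dput).
importance_rf <- list(
  make_run(c(100, 32.92, 22.33), c("age", "Weight", "ESR")),
  make_run(c(100, 36.85, 28.25), c("age", "Weight", "ESR")),
  make_run(c(100, 36.45, 21.38), c("age", "Weight", "ESR"))
)

features <- c("age", "Weight", "ESR")
weight <- c(0.80, 0.85, 0.75)  # hypothetical per-run weights, e.g. accuracies
normalized <- weight / sum(weight)

# Feature-by-run matrix of importance values (rows = features, columns = runs).
importance_dfs <- lapply(importance_rf, function(list) list[[1]])
value_matrix <- do.call(cbind, lapply(importance_dfs,
                                      function(dat) dat[features, "Overall"]))
rownames(value_matrix) <- features

# Weighted mean per feature: each run's column times its normalized weight.
# This assumes every feature is present in every run; an absent feature
# would make the corresponding result NA.
weighted_means <- drop(value_matrix %*% normalized)
weighted_means
```

A feature with the same value in every run, such as age here, keeps that value whatever the weights are, which is a quick sanity check on the normalization.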