How can I calculate the mean variable importance across the elements of a list?

Posted on 2025-01-27 01:24:04

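I'm training a random forest three times and saving each run's variable importance in a list (using the caret package). How can I calculate the mean importance of each feature, where it exists? For example, how can I calculate the mean of the three Overall values for "ESR"? (I am going to train this algorithm a thousand times.) Here is my example output: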

[[1]]
rf variable importance


  only 20 most important variables shown (out of 119)

                 Overall
Albumin           100.00
age                97.36
PR                 60.18
RR                 42.41
Weight             35.26
SystolicBP         32.14
Cancers1           29.79
ESR                27.66
Neutrophyl         26.98
CPK                25.68
EjectionFraction   25.59
BMI                24.42
Calcium            23.87
WBC                22.36
Urea               22.01
LDH                21.23
FBS                20.21
Ddimer             19.32
HB                 18.99
Lymphocyte         18.78

[[2]]
rf variable importance

  only 20 most important variables shown (out of 119)

                 Overall
age               100.00
FBS                57.80
WBC                53.88
PR                 53.84
Neutrophyl         53.52
Weight             52.31
HB                 51.69
LDH                50.15
Urea               49.31
Albumin            47.05
Lymphocyte         46.87
CPK                46.54
SystolicBP         45.64
Calcium            44.87
ESR                43.54
Ferritin           43.03
CRP                43.00
PLT                42.83
Creatinine         42.53
EjectionFraction   41.43

[[3]]
rf variable importance

  only 20 most important variables shown (out of 119)

                 Overall
age               100.00
Albumin            43.41
Weight             24.88
FBS                24.63
BS                 23.31
PR                 21.47
LDH                21.06
Neutrophyl         20.68
BMI                17.94
EjectionFraction   17.29
CPK                16.49
WBC                16.11
ALP                15.72
RR                 15.28
Lymphocyte         14.94
Cancers1           14.68
CRP                14.50
ESR                14.38
Ddimer             13.05
Ferritin           12.96

Can I create a data frame that stores each feature and its Overall value? Thanks for your help. This is my code:

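library(caret)    # train(), varImp()
library(caTools)  # sample.split(); assumed to come from caTools
library(pROC)     # auc(); assumed to come from pROC

prediction_value_rf <- list()
importance_rf <- list()
auc_rf <- list()
weight_rf <- list()
for (i in 1:1000) {
   # balanced resample: 300 rows from each class
   resample_death <- death[sample(nrow(death), size = 300), ]
   resample_alive <- alive[sample(nrow(alive), size = 300), ]
   f_dataset <- rbind(resample_alive, resample_death)
   # hold out roughly 25% of the rows as test data
   inx <- sample.split(seq_len(nrow(f_dataset)), 0.25)
   trainData <- f_dataset[!inx, ]
   testData <- f_dataset[inx, ]
   rf_fit <- train(vital_status ~ .,
                   data = trainData,
                   method = "rf")
   # column 109 is dropped before predicting (presumably the outcome column)
   pred <- predict(rf_fit, testData[, -109])
   pred1 <- predict(rf_fit, testData[, -109], type = "prob")
   prediction_value_rf[[i]] <- pred1[2]
   # named auc_i so the result does not shadow the auc() function
   auc_i <- auc(testData$vital_status, as.numeric(pred1[[2]]), direction = "<", levels = levels(testData$vital_status))
   auc_rf[[i]] <- auc_i
   # scaled variable importance of this fit
   importance_rf[[i]] <- varImp(rf_fit, scale = TRUE)
   weight_rf[[i]] <- max(rf_fit$results$Accuracy)
}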

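In the end, I want to calculate the mean Overall value of every feature (I want to build an ensemble model). My dataset contains 109 features and 4200 samples.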

> dput(importance_rf)
list(structure(list(importance = structure(list(Overall = c(100, 
32.9191368970689, 0, 29.4889011862606, 24.8664587940577, 21.8746288172869, 
21.7051171149606, 20.0868919191658, 20.3678665772965, 20.2873319598582, 
33.7597621482843, 42.1891066454062, 22.7027798691687, 17.0766042463516, 
39.4559095867264, 17.9431725056776, 23.2881573588367, 5.04721532342669, 
22.3290849893345, 20.7266835722104, 21.5723519894789, 19.5211504808207, 
21.2794742178794, 20.1624361665348, 13.7420140365184, 31.7941409073075, 
20.9409991203303, 30.4229311296897, 11.5187371425859, 12.8487688047673, 
9.40749461290917, 10.361793419014, 32.5677389075859, 26.5411449178312, 
23.3996095888034, 2.84823906954271, 10.0257295515002, 2.27406632480383, 
0.221285401034356, 0.844517489791465, 1.97286969198767, 0.0909347758420391, 
0.541007254389242, 0.359718315763083, 1.26912866459011, 0.158954429130366, 
0.245159217854806, 1.43768928047267, 0.796627703857018, 0.0731764363395144, 
1.72357935713514, 0.424562470997031, 3.38312715168264, 1.88770244332681, 
0.0314985706869475, 0, 0.65427952713802, 0, 0.0171557103229226, 
0.709743254593806, 1.13539938842206, 0.0367104133426984, 2.95211595985093, 
0, 0.582868854914444, 0.393813676879418, 1.15732422255054, 2.24940561099934, 
1.73472209382337, 1.34428847541862, 1.15486784386305, 0, 0.689216959226089, 
0.625678629482648, 1.81161997423301, 0.433030827900777, 10.9106578268112, 
2.24295278032112, 18.176936900799, 1.74711580562318, 1.45310012173878, 
0.952143653091356, 1.16652405720194, 1.11866015943186, 2.68527336222893, 
1.12853921993574, 5.10727247259446, 1.93994049536545, 1.36475795626174, 
2.95717137358439, 0.115367165512589, 0, 1.45815337045876, 0, 
1.78943634306828, 5.71749991297189, 2.43536004133198, 1.27231795918686, 
11.4771984230702, 3.0971032186365, 0.708058471655881, 0.170261025718881, 
3.37435307537382, 1.56044494248123, 1.09294450754124, 0, 2.25592933845801, 
2.30276525800757, 1.86149986210819, 1.46145976307003, 1.26858067553346, 
2.11041986636824, 0.0902116364175813, 1.54299863875175, 0, 0.269632340125967, 
1.88548693593634, 4.47233507072462, 0.66752451890319)), class = "data.frame", row.names = c("age", 
"Weight", "HookhConsumption", "BMI", "SystolicBP", "RR", "DiastolicBP", 
"ALP", "ALT", "AST", "Albumin", "BS", "CPK", "CRP", "Calcium", 
"Creatinine", "Ddimer", "Directbilirubin", "ESR", "FBS", "Ferritin", 
"HB", "LDH", "Lymphocyte", "Mg", "Neutrophyl", "PLT", "PR", "PhosphorP", 
"PotassiumK", "SodiumNA", "Totalbilirubin", "Urea", "WBC", "EjectionFraction", 
"TotalLungInvolvementRank", "TotalLungInvolvementPercent", "sex2", 
"Type.of.heart.disease1", "Type.of.heart.disease2", "Type.of.heart.disease9", 
"Unilateral.paralysis1", "Ulcers1", "Obesity.BMI.above.351", 
"Peripheral.artery.disease1", "organ.involment.from.diabetes1", 
"organ.involment.from.diabetes2", "organ.involment.from.diabetes3", 
"UsingDrugHistory1", "UsingAlcoholHistory1", "Transplantation1", 
"SeverityofKidneyDisease1", "SeverityofKidneyDisease2", "SeverityofKidneyDisease3", 
"SeverityChronicliverdisease1", "SeverityChronicliverdisease2", 
"SeverityChronicliverdisease3", "SeverityChronicliverdisease4", 
"SeverityChronicliverdisease9", "Schizophrenia1", "Rheumatologicaldiseases1", 
"Pregnant1", "Neurologicaldiseases1", "LiverTransplantation1", 
"KidneyTransplantation1", "Immunedeficiencydisease1", "Hypothyroidism1", 
"Hypertention1", "Hyperlipidemia1", "Historyofsmoking1", "HistoryofHookah1", 
"HeartTransplantation1", "HIV1", "FattyLiver1", "Diabetes1", 
"Chronicliverdisease1", "Chronickidneydisease1", "CardiovascularDisease1", 
"Cancers1", "CVAStrokeCVDTIA1", "COPD1", "Asthma1", "WetCough1", 
"WeightLoss1", "WeaknessandLethargy1", "Vomit1", "Trembling1", 
"Sweating1", "Sputum1", "Sorethroat1", "SkinRush1", "Rush1", 
"Rhinorrhea1", "PharynxExoda1", "Nausea1", "Muscle_Painmyalgia1", 
"Lossofsenseoftaste1", "Lossofsenseofsmell1", "LossofConsciousness1", 
"LimbEdema1", "Jointpain_Arthralgia1", "Hemoptysis1", "Headace1", 
"Fever1", "Fatigue1", "EyeConjunctivitis1", "Epigastric1", "Dyspnea1", 
"DryCough1", "Dizziness1", "Diarrhea1", "Chestpain1", "CardiacArrhythmia1", 
"Body_Pain1", "Bleeding1", "Ataxia1", "Anorexia1", "PCRCOVID19Test1", 
"PCRCOVID19Test2")), model = "rf", calledFrom = "varImp"), class = "varImp.train"), 
    structure(list(importance = structure(list(Overall = c(100, 
    36.8463357663146, 0, 20.5921448468941, 35.0980630859042, 
    15.7098956910968, 27.5542325637653, 22.3935810225052, 25.6062709809081, 
    18.9072078537409, 30.5428709528983, 26.4061314161858, 27.2933977255992, 
    18.3744993875278, 57.5115149169245, 14.4361277134982, 49.9265957132235, 
    6.10831602661626, 28.2527379885906, 23.0147565449908, 32.7997892888894, 
    22.7055707536584, 36.9763807158356, 28.9941599048441, 17.8186386653819, 
    31.2682240107287, 26.2894098494535, 41.1751827476675, 22.6316241605114, 
    16.9314172346857, 14.4927913128733, 13.1792980470757, 44.2836496383372, 
    32.7246002717468, 30.3912750391576, 10.0409713536124, 9.83444013035946, 
    2.50470824612248, 1.72055335723373, 1.05083165735798, 1.56193393834476, 
    0.233521622728958, 1.08064736921506, 0.555709266569136, 2.40106539585553, 
    0.291833555475466, 0.380999891346632, 2.56592221397732, 1.62107348934456, 
    0.504647559430998, 1.19859835755469, 0, 1.4382135880929, 
    1.94514657535966, 0, 0.0569205442253742, 0.44589056596685, 
    0.0539230755197555, 0, 0.055077983652405, 1.24527213390211, 
    0, 1.36267778294481, 0.151259347248717, 0.499919817645286, 
    0, 2.79981213016671, 2.72663427247346, 1.93725253183476, 
    2.70715099933653, 1.99722906280419, 0, 0.111342938271961, 
    1.2426657762317, 2.15186257620788, 0.584084013981451, 9.87542370836023, 
    3.21493418783175, 14.6556614893423, 0.67462103889104, 0.787088521176588, 
    2.61946726039402, 2.8099384934716, 0.377053883833586, 2.2824838493133, 
    1.12217532020233, 3.44210364347885, 2.61343827037804, 9.58864870521531, 
    1.77823199575717, 0, 0, 0.828679129518211, 0, 2.73842874693014, 
    14.5506870851474, 0.390367251047195, 0.811902694072225, 15.5803912323052, 
    4.18258978600944, 2.13546475796113, 2.66088800284236, 2.97761832225233, 
    3.54039994200135, 2.44519084017892, 0.737528372419208, 2.20708600548186, 
    4.12502178170407, 3.1835668678093, 7.61195991815971, 2.35303302862437, 
    5.70342032074721, 0.409606955773683, 2.4977310780031, 0.0107020031498121, 
    0.268000372472171, 2.32396173268619, 1.64515893404575, 0.868523484401606
    )), class = "data.frame", row.names = c("age", "Weight", 
    "HookhConsumption", "BMI", "SystolicBP", "RR", "DiastolicBP", 
    "ALP", "ALT", "AST", "Albumin", "BS", "CPK", "CRP", "Calcium", 
    "Creatinine", "Ddimer", "Directbilirubin", "ESR", "FBS", 
    "Ferritin", "HB", "LDH", "Lymphocyte", "Mg", "Neutrophyl", 
    "PLT", "PR", "PhosphorP", "PotassiumK", "SodiumNA", "Totalbilirubin", 
    "Urea", "WBC", "EjectionFraction", "TotalLungInvolvementRank", 
    "TotalLungInvolvementPercent", "sex2", "Type.of.heart.disease1", 
    "Type.of.heart.disease2", "Type.of.heart.disease9", "Unilateral.paralysis1", 
    "Ulcers1", "Obesity.BMI.above.351", "Peripheral.artery.disease1", 
    "organ.involment.from.diabetes1", "organ.involment.from.diabetes2", 
    "organ.involment.from.diabetes3", "UsingDrugHistory1", "UsingAlcoholHistory1", 
    "Transplantation1", "SeverityofKidneyDisease1", "SeverityofKidneyDisease2", 
    "SeverityofKidneyDisease3", "SeverityChronicliverdisease1", 
    "SeverityChronicliverdisease2", "SeverityChronicliverdisease3", 
    "SeverityChronicliverdisease4", "SeverityChronicliverdisease9", 
    "Schizophrenia1", "Rheumatologicaldiseases1", "Pregnant1", 
    "Neurologicaldiseases1", "LiverTransplantation1", "KidneyTransplantation1", 
    "Immunedeficiencydisease1", "Hypothyroidism1", "Hypertention1", 
    "Hyperlipidemia1", "Historyofsmoking1", "HistoryofHookah1", 
    "HeartTransplantation1", "HIV1", "FattyLiver1", "Diabetes1", 
    "Chronicliverdisease1", "Chronickidneydisease1", "CardiovascularDisease1", 
    "Cancers1", "CVAStrokeCVDTIA1", "COPD1", "Asthma1", "WetCough1", 
    "WeightLoss1", "WeaknessandLethargy1", "Vomit1", "Trembling1", 
    "Sweating1", "Sputum1", "Sorethroat1", "SkinRush1", "Rush1", 
    "Rhinorrhea1", "PharynxExoda1", "Nausea1", "Muscle_Painmyalgia1", 
    "Lossofsenseoftaste1", "Lossofsenseofsmell1", "LossofConsciousness1", 
    "LimbEdema1", "Jointpain_Arthralgia1", "Hemoptysis1", "Headace1", 
    "Fever1", "Fatigue1", "EyeConjunctivitis1", "Epigastric1", 
    "Dyspnea1", "DryCough1", "Dizziness1", "Diarrhea1", "Chestpain1", 
    "CardiacArrhythmia1", "Body_Pain1", "Bleeding1", "Ataxia1", 
    "Anorexia1", "PCRCOVID19Test1", "PCRCOVID19Test2")), model = "rf", 
        calledFrom = "varImp"), class = "varImp.train"), structure(list(
        importance = structure(list(Overall = c(100, 36.4519408382731, 
        0.0121282468302786, 27.9982404793903, 19.4487163883379, 
        24.6079653972917, 14.1539998143239, 18.684018340339, 
        20.1182663550791, 17.4200861293186, 46.6309831468223, 
        52.2217679510578, 28.5910698857479, 16.845796014194, 
        31.6509235655573, 17.1000574614637, 27.8424176478161, 
        5.69845064904499, 21.3838903337718, 20.217605303817, 
        19.8702958841878, 22.3737582989512, 33.0788664305301, 
        20.6035947546629, 16.3220426343042, 23.4809287675538, 
        23.1749036748423, 57.122094059206, 12.2409421568247, 
        11.234114301956, 15.7946508155502, 8.80563729211453, 
        20.2205078755919, 20.3091908316546, 27.7497357152039, 
        3.8622908315769, 12.8894291926347, 5.96701805516155, 
        0.761922263853243, 1.41991036581607, 1.54560737492769, 
        0.825161722105208, 0.0172016746252156, 0.693982409239905, 
        0, 0.358366468201754, 1.74812586771487, 2.2746344067366, 
        0.745595100629448, 0.465199425668223, 0.408092232849501, 
        0.115358703965213, 0.0358338604150282, 2.88640197248697, 
        0, 0.288302498762889, 0.332551323637155, 0.0121282468302786, 
        0, 1.03515126482736, 1.1213600137207, 0.329413397366096, 
        2.0612368962315, 0, 0.610994615626186, 1.0215655608971, 
        3.90651448858199, 1.73374217783332, 1.47244358073369, 
        2.20534241559288, 0.173681720638885, 0, 0.631950099628902, 
        0.132328128708788, 2.92435478031454, 1.03537122788376, 
        4.74067414123091, 1.77981701502525, 13.1150432121738, 
        0.720556880972878, 1.20366662244445, 1.19169376389038, 
        1.86442992849398, 0.518200723424615, 2.278501378269, 
        1.23638371282217, 3.66947066761794, 2.03933409738165, 
        1.25289331603719, 1.01627904400807, 0.0324453169731015, 
        0, 2.29817177168672, 0, 1.53194610140319, 7.15322639329996, 
        0.759542631415349, 1.53353473284619, 4.77390474517756, 
        1.05656481042379, 0.699450154375729, 1.16224285818854, 
        3.65223350861514, 1.93274707207956, 1.57589588221639, 
        0.449432695377871, 1.36863730886437, 2.11275137384133, 
        3.29450357362525, 1.08676677214028, 2.18565092410049, 
        1.15456248328987, 0.492245547306216, 1.59592156033113, 
        0.0129367966189638, 0.514499765305734, 1.58591810753971, 
        1.84832826238423, 0.807564130566264)), class = "data.frame", row.names = c("age", 
        "Weight", "HookhConsumption", "BMI", "SystolicBP", "RR", 
        "DiastolicBP", "ALP", "ALT", "AST", "Albumin", "BS", 
        "CPK", "CRP", "Calcium", "Creatinine", "Ddimer", "Directbilirubin", 
        "ESR", "FBS", "Ferritin", "HB", "LDH", "Lymphocyte", 
        "Mg", "Neutrophyl", "PLT", "PR", "PhosphorP", "PotassiumK", 
        "SodiumNA", "Totalbilirubin", "Urea", "WBC", "EjectionFraction", 
        "TotalLungInvolvementRank", "TotalLungInvolvementPercent", 
        "sex2", "Type.of.heart.disease1", "Type.of.heart.disease2", 
        "Type.of.heart.disease9", "Unilateral.paralysis1", "Ulcers1", 
        "Obesity.BMI.above.351", "Peripheral.artery.disease1", 
        "organ.involment.from.diabetes1", "organ.involment.from.diabetes2", 
        "organ.involment.from.diabetes3", "UsingDrugHistory1", 
        "UsingAlcoholHistory1", "Transplantation1", "SeverityofKidneyDisease1", 
        "SeverityofKidneyDisease2", "SeverityofKidneyDisease3", 
        "SeverityChronicliverdisease1", "SeverityChronicliverdisease2", 
        "SeverityChronicliverdisease3", "SeverityChronicliverdisease4", 
        "SeverityChronicliverdisease9", "Schizophrenia1", "Rheumatologicaldiseases1", 
        "Pregnant1", "Neurologicaldiseases1", "LiverTransplantation1", 
        "KidneyTransplantation1", "Immunedeficiencydisease1", 
        "Hypothyroidism1", "Hypertention1", "Hyperlipidemia1", 
        "Historyofsmoking1", "HistoryofHookah1", "HeartTransplantation1", 
        "HIV1", "FattyLiver1", "Diabetes1", "Chronicliverdisease1", 
        "Chronickidneydisease1", "CardiovascularDisease1", "Cancers1", 
        "CVAStrokeCVDTIA1", "COPD1", "Asthma1", "WetCough1", 
        "WeightLoss1", "WeaknessandLethargy1", "Vomit1", "Trembling1", 
        "Sweating1", "Sputum1", "Sorethroat1", "SkinRush1", "Rush1", 
        "Rhinorrhea1", "PharynxExoda1", "Nausea1", "Muscle_Painmyalgia1", 
        "Lossofsenseoftaste1", "Lossofsenseofsmell1", "LossofConsciousness1", 
        "LimbEdema1", "Jointpain_Arthralgia1", "Hemoptysis1", 
        "Headace1", "Fever1", "Fatigue1", "EyeConjunctivitis1", 
        "Epigastric1", "Dyspnea1", "DryCough1", "Dizziness1", 
        "Diarrhea1", "Chestpain1", "CardiacArrhythmia1", "Body_Pain1", 
        "Bleeding1", "Ataxia1", "Anorexia1", "PCRCOVID19Test1", 
        "PCRCOVID19Test2")), model = "rf", calledFrom = "varImp"), class = "varImp.train"))


Comments (1)

烟火散人牵绊 2025-02-03 01:24:04

For this part:

how can I calculate the mean of each feature if it exists? for example, how can I calculate the mean of three overall "ESR"?

Because you have already generated the list, you can write a function that selects the row containing the feature name, apply this function to each element of the list, flatten the result, and then calculate the mean. If the feature does not exist in some element, that element can be excluded from the mean calculation by using na.rm = TRUE.

For example, this resembles your list:

mylist <- list(structure(list(Overall = c(100, 97.36, 60.18, 42.41, 35.26, 
32.14, 29.79, 27.66, 26.98, 25.68, 25.59, 24.42, 23.87, 22.36, 
22.01, 21.23, 20.21, 19.32, 18.99, 18.78)), class = "data.frame", row.names = c("Albumin", 
"age", "PR", "RR", "Weight", "SystolicBP", "Cancers1", "ESR", 
"Neutrophyl", "CPK", "EjectionFraction", "BMI", "Calcium", "WBC", 
"Urea", "LDH", "FBS", "Ddimer", "HB", "Lymphocyte")), structure(list(
    Overall = c(100, 57.8, 53.88, 53.84, 53.52, 52.31, 51.69, 
    50.15, 49.31, 47.05, 46.87, 46.54, 45.64, 44.87, 43.54, 43.03, 
    43, 42.83, 42.53, 41.43)), class = "data.frame", row.names = c("age", 
"FBS", "WBC", "PR", "Neutrophyl", "Weight", "HB", "LDH", "Urea", 
"Albumin", "Lymphocyte", "CPK", "SystolicBP", "Calcium", "ESR", 
"Ferritin", "CRP", "PLT", "Creatinine", "EjectionFraction")), 
    structure(list(Overall = c(100, 43.41, 24.88, 24.63, 23.31, 
    21.47, 21.06, 20.68, 17.94, 17.29, 16.49, 16.11, 15.72, 15.28, 
    14.94, 14.68, 14.5, 14.38, 13.05, 12.96)), class = "data.frame", row.names = c("age", 
    "Albumin", "Weight", "FBS", "BS", "PR", "LDH", "Neutrophyl", 
    "BMI", "EjectionFraction", "CPK", "WBC", "ALP", "RR", "Lymphocyte", 
    "Cancers1", "CRP", "ESR", "Ddimer", "Ferritin")))

Here is how to calculate the mean of ESR, which exists in all elements, and of CRP, which does not exist in one of the elements:

mylist |> lapply(function(dat) dat["ESR", "Overall"]) |> unlist() |> mean(na.rm = TRUE)
#[1] 28.52667

mylist |> lapply(function(dat) dat["CRP", "Overall"]) |> unlist() |> mean(na.rm = TRUE)
#[1] 28.75
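
To make the role of na.rm concrete, here is a minimal sketch on a made-up two-row data frame (toy and its values are illustrative, not taken from your data). It shows that asking for a row name that is not present yields NA rather than an error. It also shows one caveat: according to the help page for `[.data.frame`, row names are partially matched, so an abbreviated name can silently pick up a longer row name.

```r
# Toy data frame with feature names as row names (values are made up)
toy <- data.frame(Overall = c(10, 20), row.names = c("ESR", "CPK"))

present <- toy["ESR", "Overall"]  # row exists
absent  <- toy["CRP", "Overall"]  # row does not exist, so the result is NA
partial <- toy["ES", "Overall"]   # "ES" partially matches "ESR"

# na.rm = TRUE drops the NA before averaging
m <- mean(c(present, absent), na.rm = TRUE)
```

If partial matching is a concern with long, similar feature names, indexing with match(feature_name, rownames(dat)) is one way to require exact names.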

Because you have many features, you can write another function that applies this step to each feature. For example:

features <- c("ESR", "CRP", "CPK", "WBC", "LDH")
feature_mean <- function(feature_name) {
    out <- lapply(mylist, function(dat) dat[feature_name, "Overall"]) |>
        unlist() |> mean(na.rm = TRUE) |>
        setNames(paste0("mean_", feature_name))
    return(out)
}

features |> lapply(feature_mean) |> unlist()

#mean_ESR mean_CRP mean_CPK mean_WBC mean_LDH 
#28.52667 28.75000 29.57000 30.78333 30.81333 
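
If you do not want to list the features by hand (for example, after a thousand training runs), one possible variation is to collect every feature name that occurs anywhere in the list and average them all at once. This is only a sketch on made-up data: runs, all_features, imp_matrix and n_runs_present are hypothetical names, and runs stands in for a list shaped like mylist.

```r
# Toy stand-in for the list of importance data frames (values are made up)
runs <- list(
  data.frame(Overall = c(30, 10), row.names = c("ESR", "CPK")),
  data.frame(Overall = c(20, 40), row.names = c("CRP", "ESR")),
  data.frame(Overall = c(50, 60), row.names = c("ESR", "CRP"))
)

# Every feature name that appears in at least one run
all_features <- sort(unique(unlist(lapply(runs, rownames))))

# One column per run; match() requires exact names and gives NA when absent
imp_matrix <- sapply(runs, function(dat) {
  dat$Overall[match(all_features, rownames(dat))]
})
rownames(imp_matrix) <- all_features

# Mean over the runs in which each feature is present
feature_means <- rowMeans(imp_matrix, na.rm = TRUE)

# How many runs contributed to each mean
n_runs_present <- rowSums(!is.na(imp_matrix))
```

Keeping n_runs_present next to the means helps when judging a mean that is based on only a few runs.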

EDIT

The synthetic data used in the previous example, mylist, contains only one "Overall" data frame in each of its elements, so the feature extraction can be applied directly to the data using lapply. However, the actual data that you provided in the updated question, importance_rf, has more than one object in each of its elements, and the "Overall" data frame is the first of them. This difference is the cause of the error you showed in the comment. To apply the extraction, first pull out the "Overall" data frames using lapply(function(list) list[[1]]), and then apply the previous steps.
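
The difference can be reproduced on a small mock. The sketch below assumes each element is a list with three components, as the end of your dput output suggests (model = "rf", calledFrom = "varImp"); the name importance for the first component, the objects mock_run and mock_list, and all values are assumptions made for illustration.

```r
# Mock of one list element: the first component holds the importance data frame
mock_run <- list(
  importance = data.frame(Overall = c(100, 27.66), row.names = c("age", "ESR")),
  model      = "rf",
  calledFrom = "varImp"
)
mock_list <- list(mock_run, mock_run)

# Indexing the element as if it were a data frame fails, because a plain
# list has no rows and columns
direct <- tryCatch(mock_run["ESR", "Overall"], error = function(e) "error")

# Pull out the first component first, then index by row name
frames <- lapply(mock_list, function(x) x[[1]])
esr    <- sapply(frames, function(dat) dat["ESR", "Overall"])
```

Running str(importance_rf[[1]], max.level = 1) on your own object is a quick way to confirm which component holds the data frame.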

# Extract mean ESR 
importance_rf |> 
 lapply(function(list) list[[1]]) |> 
 lapply(function(dat) dat["ESR", "Overall"]) |> 
 unlist() |> 
 mean(na.rm = TRUE)
#[1] 23.98857

# Extract mean CRP
importance_rf |> 
 lapply(function(list) list[[1]]) |> 
 lapply(function(dat) dat["CRP", "Overall"]) |> 
 unlist() |> 
 mean(na.rm = TRUE)
#[1] 17.4323

A {base R} way

The previous steps can be applied to a vector of features as follows:

features <- c("ESR", "CRP", "CPK", "WBC", "LDH")

feature_mean <- function(feature_name){
     out <- importance_rf |> 
         lapply(function(list) list[[1]]) |>
         lapply(function(dat) dat[feature_name, "Overall"])|> 
         unlist() |> mean(na.rm = TRUE) |> 
         setNames(paste0("mean_",feature_name))
     return(out)
}

# Extract the mean values

features |> lapply(feature_mean) |> unlist()

#mean_ESR mean_CRP mean_CPK mean_WBC mean_LDH 
#23.98857 17.43230 26.19575 26.52498 30.44491 

A brief explanation of the code:

  • lapply(function(list) list[[1]]) extracts the first element of each element in the importance_rf list, which is the data frame that contains the feature data.
  • dat[feature_name, "Overall"] extracts the value of the targeted feature, feature_name, from each extracted data frame. Only one feature is extracted from each data frame in every step.
  • unlist() converts the extracted values from a list to a numeric vector.
  • setNames() names the resulting mean so that it is easy to identify which feature it belongs to.

The functions used here all belong to base R (the native pipe |> requires R 4.1 or later).
You don't need to install any external package to use them.
Another option is to combine base R functions with functions from the purrr package.

A {purrr} way

library(purrr)

importance_rf |> 
  map(pluck(1,1)) |> 
  map(function(dat) set_names(dat[features,], features)) |>
  as.data.frame() |> 
  rowMeans() |> 
  set_names(paste0("mean_", features))

#mean_ESR mean_CRP mean_CPK mean_WBC mean_LDH 
#23.98857 17.43230 26.19575 26.52498 30.44491

These steps are much shorter than the base R ones above, but what each step does might be less obvious.

Note that map is similar to lapply, and pluck(x, 1, 1) is equivalent to x[[1]][[1]]. As written here, however, pluck(1, 1) is evaluated first and returns 1, so map(pluck(1,1)) behaves like map(1) and extracts the first element of each list element.

A brief explanation of the code:

  • map(pluck(1,1)) extracts the data frames, like lapply(function(list) list[[1]]) above.
  • map(function(dat) set_names(dat[features,], features)) extracts the values of all the features in features as a named vector, similar to dat[feature_name, "Overall"] above.

There is a difference between the two approaches:

In the base R way above, one feature is extracted from all data frames and its mean is calculated, and then the next feature is processed the same way.

In the purrr way, all the targeted features are extracted from each data frame in the list, and the resulting vectors are combined with as.data.frame into a new data frame in which each row represents a feature and each column represents one list element. Then rowMeans calculates the mean of each feature across the columns.
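
One consequence of the combine-then-average layout is worth checking on your data: rowMeans returns NA for a feature that is missing from any element unless na.rm = TRUE is passed. A minimal base R sketch of the same layout, on made-up data (runs, per_run, combined, strict and lenient are hypothetical names):

```r
features <- c("ESR", "CRP")

# Toy stand-in for the extracted data frames (values are made up)
runs <- list(
  data.frame(Overall = c(30, 10), row.names = c("ESR", "CPK")),  # no CRP here
  data.frame(Overall = c(20, 40), row.names = c("CRP", "ESR"))
)

# One vector per run, in the order given by `features`
per_run <- lapply(runs, function(dat) dat[features, "Overall"])

# Combine: rows are features, columns are runs
combined <- do.call(cbind, per_run)
rownames(combined) <- features

strict  <- rowMeans(combined)                # NA for CRP
lenient <- rowMeans(combined, na.rm = TRUE)  # averages only the runs that have CRP
```

Which of the two is appropriate depends on whether a feature missing from a run should invalidate the mean or simply be skipped.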

Note that you can check the result of each step by running the pipeline only up to that |> pipe. For example, importance_rf alone will show all objects in each element, while importance_rf |> map(pluck(1,1)) will show only the data frame objects.

Update: including weighted means

Here is a simple example of how to calculate the weighted mean of each feature in your list. Suppose you have this list:

some.list <- list(L1 = c(a = 2, b = 4, c = 7), 
                  L2 = c(a = 5, b = 5, c = 2), 
                  L3 = c(a = 3, b = 3, c = 6))
some.list
$L1
a b c 
2 4 7 

$L2
a b c 
5 5 2 

$L3
a b c 
3 3 6 

And suppose you have the following weights for L1, L2, and L3 in the list:

weight <- c(w.L1 = 0.5, w.L2=0.6, w.L3 = 0.9)
weight
w.L1 w.L2 w.L3 
 0.5  0.6  0.9 

To calculate the weighted mean of a, for example, you need this calculation:

$$\bar{a}_w = \frac{w_1 a_1 + w_2 a_2 + w_3 a_3}{w_1 + w_2 + w_3}$$

where a_1, a_2, and a_3 are the values of a in L1, L2, and L3.

You can get this by multiplying each value of a in the list by its respective normalized weight and summing the products. In this case, the normalized weight for w1 is w1/(w1+w2+w3).

To do these steps in R:

norm.weight <- weight/sum(weight)
norm.weight
w.L1 w.L2 w.L3 
0.25 0.30 0.45 

# weighted means of a,b, and c
some.list |> map2(norm.weight, `*`) |> as.data.frame() |> rowSums()
   a    b    c 
3.35 3.85 5.05 
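
As a cross-check that does not need purrr, the same numbers can be obtained with weighted.mean() from base R, which divides by the sum of the weights itself. This is a sketch on the same toy list; values and wmeans are hypothetical names.

```r
some.list <- list(L1 = c(a = 2, b = 4, c = 7),
                  L2 = c(a = 5, b = 5, c = 2),
                  L3 = c(a = 3, b = 3, c = 6))
weight <- c(0.5, 0.6, 0.9)

# Rows are a, b, c; columns are L1, L2, L3
values <- do.call(cbind, some.list)

# weighted.mean() normalizes by sum(weight), so no manual normalization is needed
wmeans <- apply(values, 1, weighted.mean, w = weight)
```

weighted.mean() also accepts na.rm = TRUE, which drops missing values together with their weights, so a feature absent from one element is averaged over the remaining elements only.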

Applying these mock weight values to your importance_rf list and the features in the example, we get:

importance_rf |> 
    map(pluck(1,1)) |> 
    map(function(dat) set_names(dat[features,], features)) |>
    map2(norm.weight, `*`) |> 
    as.data.frame() |> 
    rowSums()
    
     ESR      CRP      CPK      WBC      LDH 
23.68084 17.36211 26.72970 25.59180 31.29827 