当前位置：文江博客话题详情

R：如何将线性回归结果列表转换为数据框？

发布于 2025-01-15 02:58:56 字数 3598 浏览 1 评论 0原文

我正在对大约 150 种不同的结果进行多元回归。因为通过单独的表或手动收集结果是很麻烦的，所以显然，我尝试从结果中生成一个数据帧。到目前为止，我的步骤：

我为回归创建了一个函数：

f1 <- function(X){summary(lm(X~HFres +age + sex + season + nonalcTE, data=dslin))}

我应用了 apply() 来制作一个列表（在尝试使其发挥作用时，我只使用了 150 个结果中的几个）

m1 <- apply(dslin[,c(21:49)], MARGIN=2, FUN=f1)

然后我将对象更改为数据框：

m2 <- m1 %>% {tibble(variables) = name(.),coefficient = map(., "coefficients"))} %>% unnest_wider(coefficient)

这是结果：

> m2
>A tibble: 29 x 9
>   variables   `(Intercept)`[,1]   [,2]  [,3]      [,4] HFres[,1]    [,2]    [,3]  [,4]
>   <chr>                   <dbl>  <dbl> <dbl>     <dbl>     <dbl>   <dbl>   <dbl> <dbl>
> 1 C_101_IL8                3.59 0.106   34.0 1.28e-224 0.0000129 0.00367 0.00352 0.997
> 2 C_102_VEGFA              9.28 0.0844 110.  0         0.00425   0.00293 1.45    0.147
> 3 C_103_AM                 4.92 0.0820  60.0 0         0.00261   0.00285 0.916   0.360
> 4 C_105_CD40L              7.53 0.164   45.9 0         0.00549   0.00570 0.964   0.335
> 5 C_106_GDF15              6.97 0.0864  80.7 0         0.00196   0.00300 0.653   0.514
> 6 C_107_PlGF               6.25 0.0665  94.0 0         0.00219   0.00231 0.947   0.344
> 7 C_108_SELE               4.89 0.117   41.8 1.14e-321 0.000978  0.00406 0.241   0.810
> 8 C_109_EGF                6.59 0.157   41.9 1.8 e-322 0.00714   0.00546 1.31    0.191
> 9 C_110_OPG                8.21 0.0673 122.  0         0.000320  0.00234 0.137   0.891
>10 C_111_SRC                7.62 0.0511 149.  0         0.000660  0.00177 0.372   0.710
>... with 19 more rows, and 6 more variables: age <dbl[,4]>, sexFemale <dbl[,4]>,
>  seasonfall <dbl[,4]>, seasonspring <dbl[,4]>, seasonsummer <dbl[,4]>,
>  nonalcTE <dbl[,4]>

有点糟糕看到这里，但最初在 m1 中我有两列，一列包含变量，一列包含列表。然后取消嵌套后，我有几列，每列仍然有 4 列。

当我将其导出到 Excel（使用 rio 包）时，仅显示 [,1] 列，因为列 '(Intercept)' ，HF res，ecc。仍然是嵌套的。

我尝试再次应用 unnest_wider() 命令

m2 %>% unnest_wider(list=c('(Intercept)', 'HFres', 'age', 'sexFemale', ' seasonfall', 'seasonspring', 'seasonsummer')

这不起作用，因为它不接受我想要取消嵌套列列表而不是的变量，

然后我尝试仅使用一个以

m2 %>% unnest_wider(HFres) 开头

这也给了我错误。

所以，我剩下的问题是我仍然需要取消嵌套列。 m2 以便在导出它们时使它们全部可见。

或者，只有 [,1] 和 [,4] 子列就足够了。每列的如果我知道我可以像这样访问一个子列： m2[["age"]][,1] 也许我可以从 m2 中提取所有列来创建一个新的数据帧我想要吗？

谢谢您的帮助！

更新：reprex（我希望这是对 reprex 的正确理解）

create dataframe

age <- c(34, 56, 24, 78, 56, 67, 45, 93, 62, 16) 体重指数 <- c(24, 25, 27, 23, 2, 27, 28, 24, 27, 21) edu <- c(4,2,5,1,3,2,4,5,2,3) 吸烟 <- c(1,3,2,2,3,2,1,3,2,1) HF <- c(3,4,2,4,5,3,2,3,5,2) P1 <- c(5,4,7,9,5,6,7,3,4,2) P2 <- c(7,2,4,6,5,3,2,5,6,3) P3 <- c(6,4,2,3,5,7,3,2,5,6)

df <- data.frame(age, bmi, educ,吸烟、HF、P1、P2、P3)

功能 f1 <- function(X){summary(lm(X~HF +age + bmi + educ + Smoke, data=df))}

将函数应用于列 m1 <- apply(df[,c(6:8)], MARGIN=2, FUN=f1)

m2 <- m1 %>% {tibble(变量 = 名称) (.),coefficient = map(., "coefficients"))} %>% unnest_wider(coefficient)

我基本上需要 coefficient (beta) 是每列的 [,1] ，p 值是 [,4]

原文

I am running a multivariate regression on ~150 different outcomes. Because gathering the results by individual tables or by hand is tidious, obviously, I have tried to produce a datafram out of the results. So far my steps:

I made a function for the regression:

f1 <- function(X){summary(lm(X~HFres + age + sex + season + nonalcTE, data=dslin))}

I applied apply() to make a list (I only used a few of the 150 outcomes while trying to make it work)

m1 <- apply(dslin[,c(21:49)], MARGIN=2, FUN=f1)

Then I change the object into a dataframe:

m2 <- m1 %>% {tibble(variables = names(.),coefficient = map(., "coefficients"))} %>% unnest_wider(coefficient)

This is the result:

> m2
>A tibble: 29 x 9
>   variables   `(Intercept)`[,1]   [,2]  [,3]      [,4] HFres[,1]    [,2]    [,3]  [,4]
>   <chr>                   <dbl>  <dbl> <dbl>     <dbl>     <dbl>   <dbl>   <dbl> <dbl>
> 1 C_101_IL8                3.59 0.106   34.0 1.28e-224 0.0000129 0.00367 0.00352 0.997
> 2 C_102_VEGFA              9.28 0.0844 110.  0         0.00425   0.00293 1.45    0.147
> 3 C_103_AM                 4.92 0.0820  60.0 0         0.00261   0.00285 0.916   0.360
> 4 C_105_CD40L              7.53 0.164   45.9 0         0.00549   0.00570 0.964   0.335
> 5 C_106_GDF15              6.97 0.0864  80.7 0         0.00196   0.00300 0.653   0.514
> 6 C_107_PlGF               6.25 0.0665  94.0 0         0.00219   0.00231 0.947   0.344
> 7 C_108_SELE               4.89 0.117   41.8 1.14e-321 0.000978  0.00406 0.241   0.810
> 8 C_109_EGF                6.59 0.157   41.9 1.8 e-322 0.00714   0.00546 1.31    0.191
> 9 C_110_OPG                8.21 0.0673 122.  0         0.000320  0.00234 0.137   0.891
>10 C_111_SRC                7.62 0.0511 149.  0         0.000660  0.00177 0.372   0.710
>... with 19 more rows, and 6 more variables: age <dbl[,4]>, sexFemale <dbl[,4]>,
>  seasonfall <dbl[,4]>, seasonspring <dbl[,4]>, seasonsummer <dbl[,4]>,
>  nonalcTE <dbl[,4]>

It's a bit bad to see here but initially in m1 I had two columns, one with the variables and one with a list. Then after unnesting I have several columns which still each have 4 columns.

When I export this to excell (with the rio package) only the [,1] columns show up because the columns '(Intercept)', HF res, ecc. are still nested.

I have tried applying the unnest_wider() command again

m2 %>% unnest_wider(list=c('(Intercept)', 'HFres', 'age', 'sexFemale', 'seasonfall', 'seasonspring', 'seasonsummer')

This didn't work, because it didn't accept that I want to unnest a list of columns instead of a dataframe.

I then tried it for only one of the variables to start with

m2 %>% unnest_wider(HFres)

This also gave me errors.

So, my remaining problem is I still need to unnest the columns of m2 in order to make them all visible when I export them.

Alternatively, It would be enough for me to have only the [,1] and [,4] subcolumn of each column if that is easier to extract them. I know I can e.g. access one subcolumn like this: m2[["age"]][,1] and maybe I could make a new dataframe from m2 extracting all the columns I want?

Thank you for your help!

Update: reprex ( I hope this is a correct understanding of what a reprex is)

create dataframe

age <- c(34, 56, 24, 78, 56, 67, 45, 93, 62, 16)
bmi <- c(24, 25, 27, 23, 2, 27, 28, 24, 27, 21)
educ <- c(4,2,5,1,3,2,4,5,2,3)
smoking <- c(1,3,2,2,3,2,1,3,2,1)
HF <- c(3,4,2,4,5,3,2,3,5,2)
P1 <- c(5,4,7,9,5,6,7,3,4,2)
P2 <- c(7,2,4,6,5,3,2,5,6,3)
P3 <- c(6,4,2,3,5,7,3,2,5,6)

df <- data.frame(age, bmi, educ, smoking, HF, P1, P2, P3)

function
f1 <- function(X){summary(lm(X~HF + age + bmi + educ + smoking, data=df))}

apply function to columns
m1 <- apply(df[,c(6:8)], MARGIN=2, FUN=f1)

m2 <- m1 %>% {tibble(variables = names(.),coefficient = map(., "coefficients"))} %>% unnest_wider(coefficient)

I basically need the coefficient (beta) which is the [,1] of each column and the p-value which is the [,4]

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

記憶穿過時間隧道 2025-01-22 02:58:56

broom 包正是用于此目的 - 将模型结果转换为整洁的数据帧。下面是一个使用 broom::tidy() 获取每个 dv 的系数表，并使用 purrr::map_dfr() 迭代 dvs 的示例，行绑定系数表，并为每个模型添加一个带有 dv 的列：

library(broom)
library(purrr)

f1 <- function(X) {
  tidy(lm(
    as.formula(paste(X, "~ mpg * cyl")),
    data = mtcars
  ))
}

model_results <- map_dfr(
  set_names(names(mtcars)[3:11]), 
  f1,
  .id = "dv"
)

model_results

输出：

# A tibble: 36 x 6
   dv    term         estimate std.error statistic  p.value
   <chr> <chr>           <dbl>     <dbl>     <dbl>    <dbl>
 1 disp  (Intercept) -242.      154.        -1.57  0.128   
 2 disp  mpg           10.3       6.47       1.59  0.123   
 3 disp  cyl          103.       22.9        4.52  0.000104
 4 disp  mpg:cyl       -3.24      1.18      -2.75  0.0104  
 5 hp    (Intercept)  -86.5     123.        -0.704 0.487   
 6 hp    mpg            4.59      5.16       0.889 0.381   
 7 hp    cyl           50.3      18.2        2.75  0.0102  
 8 hp    mpg:cyl       -1.47      0.940     -1.57  0.128   
 9 drat  (Intercept)    3.34      1.28       2.61  0.0145  
10 drat  mpg            0.0541    0.0538     1.01  0.323   
# ... with 26 more rows

如果您希望 dv 在行中，系数在列中，您可以tidyr::pivot_wider()：

library(tidyr)

model_coefs <- pivot_wider(
  model_results,
  id_cols = dv,
  names_from = term,
  values_from = estimate
)

model_coefs

输出：

# A tibble: 9 x 5
  dv    `(Intercept)`      mpg      cyl `mpg:cyl`
  <chr>         <dbl>    <dbl>    <dbl>     <dbl>
1 disp       -242.    10.3     103.     -3.24    
2 hp          -86.5    4.59     50.3    -1.47    
3 drat          3.34   0.0541   -0.0354 -0.00533 
4 wt            2.98  -0.00947   0.478  -0.0219  
5 qsec         25.0   -0.0938   -0.862   0.000318
6 vs            2.38  -0.0194   -0.292   0.00223 
7 am           -0.908  0.0702    0.0721 -0.00470 
8 gear          4.22   0.0115   -0.181   0.00311 
9 carb          3.32  -0.0830    0.249  -0.00333

The broom package is intended for exactly this — turning model results into tidy dataframes. Here’s an example using broom::tidy() to get a table of coefficients for each dv, and purrr::map_dfr() to iterate over dvs, row-bind the coefficient tables, and add a column with the dv for each model:

library(broom)
library(purrr)

f1 <- function(X) {
  tidy(lm(
    as.formula(paste(X, "~ mpg * cyl")),
    data = mtcars
  ))
}

model_results <- map_dfr(
  set_names(names(mtcars)[3:11]), 
  f1,
  .id = "dv"
)

model_results

Output:

# A tibble: 36 x 6
   dv    term         estimate std.error statistic  p.value
   <chr> <chr>           <dbl>     <dbl>     <dbl>    <dbl>
 1 disp  (Intercept) -242.      154.        -1.57  0.128   
 2 disp  mpg           10.3       6.47       1.59  0.123   
 3 disp  cyl          103.       22.9        4.52  0.000104
 4 disp  mpg:cyl       -3.24      1.18      -2.75  0.0104  
 5 hp    (Intercept)  -86.5     123.        -0.704 0.487   
 6 hp    mpg            4.59      5.16       0.889 0.381   
 7 hp    cyl           50.3      18.2        2.75  0.0102  
 8 hp    mpg:cyl       -1.47      0.940     -1.57  0.128   
 9 drat  (Intercept)    3.34      1.28       2.61  0.0145  
10 drat  mpg            0.0541    0.0538     1.01  0.323   
# ... with 26 more rows

If you want dvs in rows and coefficients in columns, you can tidyr::pivot_wider():

library(tidyr)

model_coefs <- pivot_wider(
  model_results,
  id_cols = dv,
  names_from = term,
  values_from = estimate
)

model_coefs

Output:

# A tibble: 9 x 5
  dv    `(Intercept)`      mpg      cyl `mpg:cyl`
  <chr>         <dbl>    <dbl>    <dbl>     <dbl>
1 disp       -242.    10.3     103.     -3.24    
2 hp          -86.5    4.59     50.3    -1.47    
3 drat          3.34   0.0541   -0.0354 -0.00533 
4 wt            2.98  -0.00947   0.478  -0.0219  
5 qsec         25.0   -0.0938   -0.862   0.000318
6 vs            2.38  -0.0194   -0.292   0.00223 
7 am           -0.908  0.0702    0.0721 -0.00470 
8 gear          4.22   0.0115   -0.181   0.00311 
9 carb          3.32  -0.0830    0.249  -0.00333

回复收藏 0 原文

~没有更多了~