R: How do I convert a list of linear regression results into a data frame?
I am running a multivariate regression on ~150 different outcomes. Because gathering the results from individual tables or by hand is tedious, I have tried to produce a data frame from the results. My steps so far:
I made a function for the regression:
f1 <- function(X){summary(lm(X~HFres + age + sex + season + nonalcTE, data=dslin))}
I used apply() to make a list (I only used a few of the 150 outcomes while trying to make it work):
m1 <- apply(dslin[,c(21:49)], MARGIN=2, FUN=f1)
Then I changed the object into a data frame:
m2 <- m1 %>% {tibble(variables = names(.),coefficient = map(., "coefficients"))} %>% unnest_wider(coefficient)
This is the result:
> m2
>A tibble: 29 x 9
> variables `(Intercept)`[,1] [,2] [,3] [,4] HFres[,1] [,2] [,3] [,4]
> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
> 1 C_101_IL8 3.59 0.106 34.0 1.28e-224 0.0000129 0.00367 0.00352 0.997
> 2 C_102_VEGFA 9.28 0.0844 110. 0 0.00425 0.00293 1.45 0.147
> 3 C_103_AM 4.92 0.0820 60.0 0 0.00261 0.00285 0.916 0.360
> 4 C_105_CD40L 7.53 0.164 45.9 0 0.00549 0.00570 0.964 0.335
> 5 C_106_GDF15 6.97 0.0864 80.7 0 0.00196 0.00300 0.653 0.514
> 6 C_107_PlGF 6.25 0.0665 94.0 0 0.00219 0.00231 0.947 0.344
> 7 C_108_SELE 4.89 0.117 41.8 1.14e-321 0.000978 0.00406 0.241 0.810
> 8 C_109_EGF 6.59 0.157 41.9 1.8 e-322 0.00714 0.00546 1.31 0.191
> 9 C_110_OPG 8.21 0.0673 122. 0 0.000320 0.00234 0.137 0.891
>10 C_111_SRC 7.62 0.0511 149. 0 0.000660 0.00177 0.372 0.710
>... with 19 more rows, and 6 more variables: age <dbl[,4]>, sexFemale <dbl[,4]>,
> seasonfall <dbl[,4]>, seasonspring <dbl[,4]>, seasonsummer <dbl[,4]>,
> nonalcTE <dbl[,4]>
It's a bit hard to see here, but initially in m1 I had two columns, one with the variables and one with a list. After unnesting, I have several columns, each of which still contains 4 sub-columns.
When I export this to Excel (with the rio package), only the [,1] columns show up, because the columns '(Intercept)', HFres, etc. are still nested.
I tried applying the unnest_wider() command again:
m2 %>% unnest_wider(list=c('(Intercept)', 'HFres', 'age', 'sexFemale', 'seasonfall', 'seasonspring', 'seasonsummer')
This didn't work, because it didn't accept that I want to unnest a list of columns instead of a data frame.
I then tried it for only one of the variables to start with:
m2 %>% unnest_wider(HFres)
This also gave me errors.
So, my remaining problem is that I still need to unnest the columns of m2 so that they are all visible when I export them.
Alternatively, it would be enough for me to have only the [,1] and [,4] sub-columns of each column, if those are easier to extract. I know I can access a single sub-column like this: m2[["age"]][,1], so maybe I could make a new data frame from m2 by extracting all the columns I want?
Thank you for your help!
Update: reprex (I hope this is a correct understanding of what a reprex is)
# create dataframe
age <- c(34, 56, 24, 78, 56, 67, 45, 93, 62, 16)
bmi <- c(24, 25, 27, 23, 2, 27, 28, 24, 27, 21)
educ <- c(4,2,5,1,3,2,4,5,2,3)
smoking <- c(1,3,2,2,3,2,1,3,2,1)
HF <- c(3,4,2,4,5,3,2,3,5,2)
P1 <- c(5,4,7,9,5,6,7,3,4,2)
P2 <- c(7,2,4,6,5,3,2,5,6,3)
P3 <- c(6,4,2,3,5,7,3,2,5,6)
df <- data.frame(age, bmi, educ, smoking, HF, P1, P2, P3)
# function
f1 <- function(X){summary(lm(X~HF + age + bmi + educ + smoking, data=df))}
# apply function to columns
m1 <- apply(df[,c(6:8)], MARGIN=2, FUN=f1)
m2 <- m1 %>% {tibble(variables = names(.),coefficient = map(., "coefficients"))} %>% unnest_wider(coefficient)
I basically need the coefficient (beta), which is the [,1] of each column, and the p-value, which is the [,4].
Comments (1)
The broom package is intended for exactly this: turning model results into tidy data frames. The approach is to use broom::tidy() to get a table of coefficients for each dv, and purrr::map_dfr() to iterate over the dvs, row-bind the coefficient tables, and add a column with the dv for each model.
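A minimal sketch of this approach, using the reprex data from the question. The helper name fit_one and the vector dvs are illustrative and not taken from the original answer; the formula is built with reformulate() so that each outcome is referenced by name rather than passed as a bare vector.

```r
library(broom)
library(purrr)

# Toy data from the question's reprex
df <- data.frame(
  age     = c(34, 56, 24, 78, 56, 67, 45, 93, 62, 16),
  bmi     = c(24, 25, 27, 23, 2, 27, 28, 24, 27, 21),
  educ    = c(4, 2, 5, 1, 3, 2, 4, 5, 2, 3),
  smoking = c(1, 3, 2, 2, 3, 2, 1, 3, 2, 1),
  HF      = c(3, 4, 2, 4, 5, 3, 2, 3, 5, 2),
  P1      = c(5, 4, 7, 9, 5, 6, 7, 3, 4, 2),
  P2      = c(7, 2, 4, 6, 5, 3, 2, 5, 6, 3),
  P3      = c(6, 4, 2, 3, 5, 7, 3, 2, 5, 6)
)

# Names of the outcome (dependent) variables; illustrative
dvs <- c("P1", "P2", "P3")

# Fit one model for the outcome named `dv` and return its coefficient table
fit_one <- function(dv) {
  f <- reformulate(c("HF", "age", "bmi", "educ", "smoking"), response = dv)
  tidy(lm(f, data = df))
}

# Iterate over the outcomes, row-bind the tables, and record the outcome in `dv`
coefs <- map_dfr(set_names(dvs), fit_one, .id = "dv")
coefs
```

Each row of coefs is one term of one model, with the estimate (beta) in the estimate column and the p-value in p.value, so no matrix columns remain to be unnested.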
If you want dvs in rows and coefficients in columns, you can reshape the result with tidyr::pivot_wider().
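A sketch of that reshaping step, again assuming the question's reprex data and the illustrative names fit_one and dvs. Only the estimate and the p-value are kept, since those are the two quantities the question asks for; this selection is an assumption about the desired layout.

```r
library(broom)
library(purrr)
library(tidyr)

# Toy data from the question's reprex
df <- data.frame(
  age     = c(34, 56, 24, 78, 56, 67, 45, 93, 62, 16),
  bmi     = c(24, 25, 27, 23, 2, 27, 28, 24, 27, 21),
  educ    = c(4, 2, 5, 1, 3, 2, 4, 5, 2, 3),
  smoking = c(1, 3, 2, 2, 3, 2, 1, 3, 2, 1),
  HF      = c(3, 4, 2, 4, 5, 3, 2, 3, 5, 2),
  P1      = c(5, 4, 7, 9, 5, 6, 7, 3, 4, 2),
  P2      = c(7, 2, 4, 6, 5, 3, 2, 5, 6, 3),
  P3      = c(6, 4, 2, 3, 5, 7, 3, 2, 5, 6)
)

dvs <- c("P1", "P2", "P3")  # illustrative outcome names

# Fit one model per outcome and return a tidy coefficient table
fit_one <- function(dv) {
  f <- reformulate(c("HF", "age", "bmi", "educ", "smoking"), response = dv)
  tidy(lm(f, data = df))
}

coefs <- map_dfr(set_names(dvs), fit_one, .id = "dv")

# One row per outcome; one estimate_* and one p.value_* column per term.
# id_cols = dv drops the unused std.error and statistic columns.
wide <- pivot_wider(
  coefs,
  id_cols     = dv,
  names_from  = term,
  values_from = c(estimate, p.value)
)
wide
```

Because wide contains only ordinary numeric columns, it should export cleanly, for example with rio::export(wide, "results.xlsx"), where the file name is a placeholder.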