How to perform linear regression on multiple columns and get a data frame output: regression equation and R-squared value?

Posted 2025-01-09 14:47:31


My data frame looks like this:

df <- structure(list(Date_Time_GMT_3 = structure(c(1625025600, 1625026500, 1625027400, 1625028300, 1625029200, 1625030100),
                                                 class = c("POSIXct", "POSIXt"), tzone = "EST"),
                     X20676887_X2LH_S = c(26.879, 26.781, 26.683, 26.585, 26.488, 26.39),
                     X20819831_11LH_S = c(26.39, 26.292, 26.195, 26.195, 26.097, 26),
                     X20822214_X4LH_S = c(26.39, 26.292, 26.292, 26.195, 26.097, 26),
                     LH27_20822244_U_Stationary = c(23.388, 23.292, 23.292, 23.196, 23.196, 23.196)),
                row.names = 2749:2754, class = "data.frame")

I'm trying to get the linear regression equation and R-squared value for every column, with the column whose name contains "Stationary" always on the x-axis.

So far I can fit a linear regression of one column against the "Stationary" column:

model <- lm(LH27_20822244_U_Stationary ~ X20822214_X4LH_S, data = df)

When I then run

summary(model)

it gives me some of the values I would like in a data frame (i.e. R-squared, Estimate, Std. Error, Pr(>|t|)), but there are two things I need help with:

1. I still need the regression equation for each column that doesn't have "Stationary" in its name.
2. I need these values for every column that doesn't have "Stationary" in its name, collected into a single data frame that looks like this:

  Logger_ID        Reg_equation R_Squared Estimate_Std. Std_Error Pr_t..
  <chr>            <chr>            <dbl>         <dbl>     <dbl>  <dbl>
1 X20676887_X2LH_S NA                  NA            NA        NA     NA
2 X20819831_11LH_S NA                  NA            NA        NA     NA
3 X20822214_X4LH_S NA                  NA            NA        NA     NA
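A minimal base-R sketch of that table, assuming "Stationary on the x-axis" means the Stationary column is the predictor, so each model is `logger ~ Stationary` (the `lm()` call above uses the reverse). It reads the slope row of `coef(summary(model))` for the estimate, standard error and p-value, takes R-squared from `summary(model)$r.squared`, builds the equation string with `sprintf()`, and row-binds one data frame per logger. The helper `fit_one` and the output column names are illustrative, not from the question.

```r
# Sketch (assumption): Stationary column = predictor (x), each logger = response (y)
df <- structure(list(
  Date_Time_GMT_3 = structure(c(1625025600, 1625026500, 1625027400,
                                1625028300, 1625029200, 1625030100),
                              class = c("POSIXct", "POSIXt"), tzone = "EST"),
  X20676887_X2LH_S = c(26.879, 26.781, 26.683, 26.585, 26.488, 26.39),
  X20819831_11LH_S = c(26.39, 26.292, 26.195, 26.195, 26.097, 26),
  X20822214_X4LH_S = c(26.39, 26.292, 26.292, 26.195, 26.097, 26),
  LH27_20822244_U_Stationary = c(23.388, 23.292, 23.292, 23.196, 23.196, 23.196)),
  row.names = 2749:2754, class = "data.frame")

# x: the column whose name contains "Stationary"; y: every other logger column
x_col  <- grep("Stationary", names(df), value = TRUE)
y_cols <- setdiff(names(df), c("Date_Time_GMT_3", x_col))

# Hypothetical helper: fit y ~ x for one logger and return a one-row data frame
fit_one <- function(y_col) {
  model <- lm(reformulate(x_col, response = y_col), data = df)
  s  <- summary(model)
  cf <- coef(s)  # rows: (Intercept), x_col; cols: Estimate, Std. Error, t value, Pr(>|t|)
  data.frame(
    Logger_ID    = y_col,
    Reg_equation = sprintf("y = %.4f %+.4f * x",
                           cf["(Intercept)", "Estimate"], cf[x_col, "Estimate"]),
    R_Squared    = s$r.squared,
    Estimate     = cf[x_col, "Estimate"],    # slope
    Std_Error    = cf[x_col, "Std. Error"],  # slope standard error
    Pr_t         = cf[x_col, "Pr(>|t|)"]     # slope p-value
  )
}

result <- do.call(rbind, lapply(y_cols, fit_one))
result
```

Because the logger columns are found by excluding the Stationary and timestamp columns rather than by listing them, the same code should pick up additional loggers without changes.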


稚然 2025-01-16 14:47:31

Something like this:

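library(tidyverse)
library(broom)
df %>% 
  # stack the logger columns (names start with "X") into name/value pairs
  pivot_longer(
    cols = starts_with("X")
  ) %>% 
  mutate(name = factor(name)) %>% 
  group_by(name) %>% 
  # split into one data frame per logger
  group_split() %>% 
  # fit one model per logger and row-bind the results
  map_dfr(.f = function(df){
    lm(LH27_20822244_U_Stationary ~ value, data = df) %>% 
      glance() %>%    # model-level statistics, one row per model
      # tidy() %>%    # use instead of glance() for one row per coefficient
      add_column(name = unique(df$name), .before=1)
  })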

Output when tidy() is used instead of glance() (one row per coefficient):

  name             term        estimate std.error statistic p.value
  <fct>            <chr>          <dbl>     <dbl>     <dbl>   <dbl>
1 X20676887_X2LH_S (Intercept)   12.8      2.28        5.62 0.00494
2 X20676887_X2LH_S value          0.393    0.0855      4.59 0.0101 
3 X20819831_11LH_S (Intercept)   10.4      3.72        2.79 0.0495 
4 X20819831_11LH_S value          0.492    0.142       3.47 0.0256 
5 X20822214_X4LH_S (Intercept)   10.5      3.30        3.20 0.0329 
6 X20822214_X4LH_S value          0.485    0.126       3.86 0.0182

Output with glance() (one row per model):

  name          r.squared adj.r.squared  sigma statistic p.value    df logLik   AIC   BIC deviance df.residual  nobs
  <fct>             <dbl>         <dbl>  <dbl>     <dbl>   <dbl> <dbl>  <dbl> <dbl> <dbl>    <dbl>       <int> <int>
1 X20676887_X2~     0.841         0.801 0.0350      21.1  0.0101     1   12.8 -19.6 -20.3  0.00490           4     6
2 X20819831_11~     0.751         0.688 0.0438      12.0  0.0256     1   11.5 -17.0 -17.6  0.00766           4     6
3 X20822214_X4~     0.788         0.735 0.0403      14.9  0.0182     1   12.0 -17.9 -18.6  0.00651           4     6
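The tidy() and glance() results above come out as two separate tables, and neither has the Reg_equation column from the question. Below is a sketch that fits each model once and returns one row per logger in the requested shape, using `dplyr::group_modify()` in place of `group_split()` + `map_dfr()`. It assumes the Stationary column should be the predictor (x-axis), so the formula is `value ~ LH27_20822244_U_Stationary`, whereas the code above uses `LH27_20822244_U_Stationary ~ value`. For a simple regression, swapping the two sides leaves R-squared and the slope's p-value unchanged but changes the intercept, slope and standard error. Output column names are illustrative.

```r
library(tidyverse)  # dplyr, tidyr, tibble
library(broom)      # tidy(), glance()

df <- structure(list(
  Date_Time_GMT_3 = structure(c(1625025600, 1625026500, 1625027400,
                                1625028300, 1625029200, 1625030100),
                              class = c("POSIXct", "POSIXt"), tzone = "EST"),
  X20676887_X2LH_S = c(26.879, 26.781, 26.683, 26.585, 26.488, 26.39),
  X20819831_11LH_S = c(26.39, 26.292, 26.195, 26.195, 26.097, 26),
  X20822214_X4LH_S = c(26.39, 26.292, 26.292, 26.195, 26.097, 26),
  LH27_20822244_U_Stationary = c(23.388, 23.292, 23.292, 23.196, 23.196, 23.196)),
  row.names = 2749:2754, class = "data.frame")

result <- df %>%
  # long format: one row per (timestamp, logger); Stationary stays a column
  pivot_longer(cols = starts_with("X"), names_to = "Logger_ID") %>%
  group_by(Logger_ID) %>%
  # .x holds one logger's rows (the grouping column is added back afterwards)
  group_modify(~ {
    model <- lm(value ~ LH27_20822244_U_Stationary, data = .x)  # assumption: Stationary = x
    coefs <- tidy(model)
    slope <- coefs[coefs$term == "LH27_20822244_U_Stationary", ]
    tibble(
      Reg_equation = sprintf("y = %.4f %+.4f * x",
                             coefs$estimate[coefs$term == "(Intercept)"],
                             slope$estimate),
      R_Squared = glance(model)$r.squared,
      Estimate  = slope$estimate,   # slope
      Std_Error = slope$std.error,
      Pr_t      = slope$p.value
    )
  }) %>%
  ungroup()

result
```

To keep the answer's original direction instead, swap the formula back to `LH27_20822244_U_Stationary ~ value` and select the `value` term as the slope.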
