How do I run a linear regression on multiple columns and get a data frame output with the regression equation and R-squared value?

Posted 2025-01-09 14:47:31

My data frame looks like this:

df = structure(list(Date_Time_GMT_3 = structure(c(1625025600, 1625026500,1625027400, 1625028300, 1625029200, 1625030100), 
                                                class = c("POSIXct", "POSIXt"), tzone = "EST"), 
                    X20676887_X2LH_S = c(26.879, 26.781,26.683, 26.585, 26.488, 26.39), 
                    X20819831_11LH_S = c(26.39, 26.292, 26.195, 26.195, 26.097, 26), 
                    X20822214_X4LH_S = c(26.39, 26.292,26.292, 26.195, 26.097, 26), 
                    LH27_20822244_U_Stationary = c(23.388, 23.292, 23.292, 23.196, 23.196, 23.196)), 
               row.names = 2749:2754, class = "data.frame")

I'm trying to get the linear regression equation and R-squared value for every column, where the column with the string "Stationary" in its name is always on the x-axis.

So far I can run the regression for one column against the "Stationary" column:


# Refer to columns by name and pass the data frame via `data`
model = lm(LH27_20822244_U_Stationary ~ X20822214_X4LH_S, data = df)

When I then run

summary(model)

it gives me some of the values I would like in a data frame (i.e. R squared, Estimate, Std. Error, Pr(>|t|)), but there are two things I need help with:

  1. I still need the regression equation for each column that doesn't have "Stationary" in its name.
  2. I need these values for each of the columns that don't have "Stationary" in their name, and I need the result as a data frame that looks like this:
  Logger_ID        Reg_equation R_Squared Estimate_Std. Std_Error Pr_t..
  <chr>            <int>        <int>     <int>         <int>     <int>
1 X20676887_X2LH_S NA           NA        NA            NA        NA
2 X20819831_11LH_S NA           NA        NA            NA        NA
3 X20822214_X4LH_S NA           NA        NA            NA        NA


Comments (1)

稚然 2025-01-16 14:47:31

Something like this:

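library(tidyverse)
library(broom)

# `df` is the data frame from the question
df %>% 
  # long format: logger column names go to `name`, readings to `value`
  pivot_longer(
    cols = starts_with("X")
  ) %>% 
  mutate(name = factor(name)) %>% 
  group_by(name) %>% 
  group_split() %>% 
  # fit one model per logger column and row-bind the summaries
  map_dfr(.f = function(d){
    lm(LH27_20822244_U_Stationary ~ value, data = d) %>% 
      glance() %>% 
      # tidy() %>%   # use instead of glance() for coefficient-level output
      add_column(name = unique(d$name), .before=1)
  })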

Using tidy() in place of glance():

  name             term        estimate std.error statistic p.value
  <fct>            <chr>          <dbl>     <dbl>     <dbl>   <dbl>
1 X20676887_X2LH_S (Intercept)   12.8      2.28        5.62 0.00494
2 X20676887_X2LH_S value          0.393    0.0855      4.59 0.0101 
3 X20819831_11LH_S (Intercept)   10.4      3.72        2.79 0.0495 
4 X20819831_11LH_S value          0.492    0.142       3.47 0.0256 
5 X20822214_X4LH_S (Intercept)   10.5      3.30        3.20 0.0329 
6 X20822214_X4LH_S value          0.485    0.126       3.86 0.0182 
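One point worth checking: the estimates above come from models with the Stationary column as the response (y-axis), whereas the question describes it as being on the x-axis. The following is a minimal base-R sketch, assuming only the data from the question (one logger column shown, timestamp omitted), that fits both directions. The names `fit_y` and `fit_x` are illustrative. In simple linear regression the R-squared value is the same either way, but the intercept and slope are not, so the direction matters for the regression equation.

```r
# Data from the question (timestamp column omitted; it is not used in the fit)
df <- data.frame(
  X20676887_X2LH_S = c(26.879, 26.781, 26.683, 26.585, 26.488, 26.39),
  LH27_20822244_U_Stationary = c(23.388, 23.292, 23.292, 23.196, 23.196, 23.196)
)

# Direction used in the answer: Stationary is the response (y-axis)
fit_y <- lm(LH27_20822244_U_Stationary ~ X20676887_X2LH_S, data = df)

# Direction described in the question: Stationary is the predictor (x-axis)
fit_x <- lm(X20676887_X2LH_S ~ LH27_20822244_U_Stationary, data = df)

# R-squared does not depend on the direction...
r2_y <- summary(fit_y)$r.squared
r2_x <- summary(fit_x)$r.squared

# ...but the slopes differ (their product equals R-squared)
slope_y <- unname(coef(fit_y)[2])
slope_x <- unname(coef(fit_x)[2])
```

If the Stationary column really should be the predictor, swapping the two sides of the formula in the pipeline is the only change needed.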

Using glance()

  name          r.squared adj.r.squared  sigma statistic p.value    df logLik   AIC   BIC deviance df.residual  nobs
  <fct>             <dbl>         <dbl>  <dbl>     <dbl>   <dbl> <dbl>  <dbl> <dbl> <dbl>    <dbl>       <int> <int>
1 X20676887_X2~     0.841         0.801 0.0350      21.1  0.0101     1   12.8 -19.6 -20.3  0.00490           4     6
2 X20819831_11~     0.751         0.688 0.0438      12.0  0.0256     1   11.5 -17.0 -17.6  0.00766           4     6
3 X20822214_X4~     0.788         0.735 0.0403      14.9  0.0182     1   12.0 -17.9 -18.6  0.00651           4     6
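Neither output has the exact layout requested in the question, and neither includes the regression equation as text. A base-R sketch of one way to assemble that table is shown here. It assumes the same model direction as the answer (Stationary as the response), uses the column names from the question, and the helper name `fit_one` and the equation format are illustrative choices only. In the equation string, `y` stands for the Stationary column and `x` for the logger column.

```r
# Data from the question (timestamp column omitted; it is not used in the fit)
df <- data.frame(
  X20676887_X2LH_S = c(26.879, 26.781, 26.683, 26.585, 26.488, 26.39),
  X20819831_11LH_S = c(26.39, 26.292, 26.195, 26.195, 26.097, 26),
  X20822214_X4LH_S = c(26.39, 26.292, 26.292, 26.195, 26.097, 26),
  LH27_20822244_U_Stationary = c(23.388, 23.292, 23.292, 23.196, 23.196, 23.196)
)

# Pick the columns by name pattern, mirroring starts_with("X") in the answer
stationary <- grep("Stationary", names(df), value = TRUE)
loggers    <- grep("^X", names(df), value = TRUE)

# Hypothetical helper: fit one model and return a one-row data frame
fit_one <- function(logger) {
  fit <- lm(reformulate(logger, response = stationary), data = df)
  s   <- summary(fit)
  co  <- coef(s)  # rows: (Intercept), slope; columns include "Estimate", "Std. Error", "Pr(>|t|)"
  a   <- co[1, "Estimate"]
  b   <- co[2, "Estimate"]
  data.frame(
    Logger_ID    = logger,
    Reg_equation = sprintf("y = %.3f %s %.3f * x", a, if (b < 0) "-" else "+", abs(b)),
    R_Squared    = s$r.squared,
    Estimate     = b,
    Std_Error    = co[2, "Std. Error"],
    Pr_t         = co[2, "Pr(>|t|)"]
  )
}

# One row per logger column
result <- do.call(rbind, lapply(loggers, fit_one))
print(result)
```

The `Estimate`, `Std_Error` and `Pr_t` columns here refer to the slope term; the intercept appears only inside `Reg_equation`.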