Segmented regression to find breakpoints on many different groups - summarise

Posted on 2025-02-14 00:57:08


I have a GLMM that I am trying to break up to assess where the best breakpoint lies. The data comprise many different sites, each with multiple years of observations. I expect just one breakpoint, where the population changes from declining to increasing or stable. Here are some made-up data:

df2 <-data.frame(site=c('bh',   'bh',   'bh',   'bh',   'bh',   'h1',   'h1',   'h1',   'h1',   'h1',   'h1',   'h1',   'h1',   'ha',   'ha',   'ha',   'ha',   'ha',   'ha',   'ha',   'ho',   'ho',   'ho',   'ho',   'ho',   'ho',   'ho',   'ho',   'sb',   'sb',   'sb',   'sb',   'sb',   'sb',   'sb',   'wm',   'wm',   'wm',   'wm',   'wm'), 
                 ysw=c(0,   2,  3,  5,  12, 0,  1,  2,  3,  4,  5,  6,  13, 0,  1,  2,  3,  4,  5,  8,  0,  1,  2,  3,  4,  5,  6,  13, 1,  2,  3,  4,  5,  6,  13, 1,  2,  3,  5,  12), 
                 lam=c(-1.94246067889122,   0.674238379117774,  0.123986076653627,  1.42751957163926,   -1.30153581808348,  -0.424812155072339, -0.775553269663735, 0.845160077651946,  -0.366964611913739, -0.175440305426086, -0.300162274132754, 0.602168551378997,  0.35237549500052,   0.0568625203766551, 0.0310733148322787, -0.300162274132754, -0.0866196954685079,    -0.213169737066003, -0.0409152237837699,    0.0417873189717518, -0.111072044919427, -0.9810987245661,   0.301247088636211,  0.243286146083446,  0.155205859770556,  -1.46428403001449,  1.00004342727686,   0.699056854547668,  -0.351206453188803, 0.0972573096934199, 0.431524584187451,  -0.526810501182215, -0.424812155072339, 0.602168551378997,  0.628491104967123,  0.778223626766096,  -0.300162274132754, -0.475820321699244, 0.477265995424853,  0.301247088636211
                 ))

I want to run a segmented regression for each site separately and then build a data frame of the estimated breakpoints, so that I can work out, on average, the value of 'ysw' at which most sites recover from a decline.
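For reference, the one-breakpoint model meant here can be written per site roughly as below. The symbols are illustrative notation only (s indexes sites, i indexes observations, and ψ_s is the site-specific breakpoint); they are not taken from any package documentation.

```latex
\mathrm{lam}_{s,i} = \beta_{0,s} + \beta_{1,s}\,\mathrm{ysw}_{s,i}
  + \beta_{2,s}\,(\mathrm{ysw}_{s,i} - \psi_s)_{+} + \varepsilon_{s,i},
\qquad (x)_{+} = \max(x, 0)
```

Under this parametrisation the slope is β_{1,s} before ψ_s and β_{1,s} + β_{2,s} after it, so a decline followed by recovery corresponds to β_{1,s} < 0 and β_{1,s} + β_{2,s} ≥ 0.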

I have looked at these two questions: "Performing lm() and segmented() on multiple columns in R" and "How to use segmented package when working with data frames with dplyr package to perform piecewise linear regression?"

I now need a way to put the output (at least the estimated breakpoint for every site) into a data frame. Any help would be much appreciated.

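# Attempt 1: one lm() fit and one segmented() fit per site
library(dplyr)      # %>%, nest_by(), mutate()
library(segmented)  # segmented()

out <- df2 %>%
  nest_by(site) %>%
  mutate(my.lm = list(lm(lam ~ ysw, data = data)),
         # fall back to NA for sites where segmented() throws an error
         my.seg = list(tryCatch(segmented(my.lm, seg.Z = ~ ysw),
                                error = function(e) list(NA))))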
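One possible way to get the breakpoints into a data frame is sketched below. It is only a sketch under some assumptions: the `segmented` package is installed, exactly one breakpoint per site is wanted, and the estimate is read from the `psi` matrix of the fitted object (column `"Est."`). The helper names `fit_breakpoint` and `site_breakpoints` and the `toy` data are hypothetical and used purely for illustration. The fit is done inside an ordinary function on a split data frame rather than inside `mutate()`, so that `lm()` and `segmented()` see the same local data object.

```r
library(segmented)  # assumption: the segmented package is installed

# Hypothetical helper: fit a one-breakpoint model to one site's data and
# return the estimated breakpoint, or NA if no breakpoint could be estimated.
fit_breakpoint <- function(d) {
  fit <- lm(lam ~ ysw, data = d)
  seg <- tryCatch(segmented(fit, seg.Z = ~ysw), error = function(e) NULL)
  # Covers both an error (NULL) and a result that is not a segmented fit
  if (!inherits(seg, "segmented")) return(NA_real_)
  unname(seg$psi[1, "Est."])
}

# Hypothetical helper: split by site, fit each piece, return one row per site.
site_breakpoints <- function(df) {
  psi <- vapply(split(df, df$site), fit_breakpoint, numeric(1))
  data.frame(site = names(psi), breakpoint = unname(psi), row.names = NULL)
}

# Toy data for illustration only (not the question's df2): two sites whose
# slope changes near ysw = 4 and ysw = 6, plus a little noise.
set.seed(1)
toy <- data.frame(site = rep(c("a", "b"), each = 12),
                  ysw  = rep(0:11, times = 2))
knot <- ifelse(toy$site == "a", 4, 6)
toy$lam <- -0.5 * toy$ysw + 0.7 * pmax(toy$ysw - knot, 0) + rnorm(24, sd = 0.1)

breaks <- site_breakpoints(toy)
print(breaks)

# Average breakpoint across sites, ignoring sites where the fit failed
mean(breaks$breakpoint, na.rm = TRUE)
```

With the data from the question the call would be `site_breakpoints(df2)`. Sites with only a handful of observations may not yield a breakpoint, in which case the sketch records `NA` for that site instead of stopping.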
