John Tukey "median-median" (or "resistant line") statistical test for R and linear regression

Posted on 2024-09-09 04:26:02

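I am looking for John Tukey's algorithm that computes a "resistant line" or "median-median line", to use for linear regression in R.

A student on a mailing list explained the algorithm in these terms:

"It's calculated by dividing the data into three groups, finding the x-median and y-median values (called the summary point) for each group, and then using those three summary points to determine the line. The outer two summary points determine the slope, and an average of all of them determines the intercept."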
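Read literally, that description can be written out as follows. The notation is an assumption made here for illustration: (x_L, y_L), (x_M, y_M) and (x_R, y_R) are the summary points of the left, middle and right groups, b is the slope and a is the intercept.

```latex
b = \frac{y_R - y_L}{x_R - x_L}, \qquad
a = \frac{1}{3}\left[(y_L - b\,x_L) + (y_M - b\,x_M) + (y_R - b\,x_R)\right]
```

Since the line through the two outer summary points already has intercept y_L - b x_L = y_R - b x_R, taking this average is the same as shifting that line one third of the way towards the middle summary point.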

For the curious, an article on John Tukey's median: http://www.johndcook.com/blog/2009/06/23/tukey-median-ninther/

Do you know where I could find this algorithm or an R function for it? Thanks a lot!


蓝颜夕 2024-09-16 04:26:02

There is a published description of how to calculate the median-median line. An R implementation of it is

median_median_line <- function(x, y, data)
{
  if(!missing(data))
  {
    x <- eval(substitute(x), data) 
    y <- eval(substitute(y), data) 
  }
  
  stopifnot(length(x) == length(y))

  #Step 1: group sizes; one leftover point goes to the middle group,
  #two leftover points go to the outer groups
  one_third_length <- floor(length(x) / 3)
  groups <- rep(1:3, times = switch((length(x) %% 3) + 1,
     rep(one_third_length, 3),
     c(one_third_length, one_third_length + 1, one_third_length),
     c(one_third_length + 1, one_third_length, one_third_length + 1)
  ))

  #Step 2: order the points by x, keeping each y paired with its x
  ord <- order(x)
  x <- x[ord]
  y <- y[ord]
  
  #Step 3: summary points (median x and median y of each group)
  median_x <- tapply(x, groups, median)
  median_y <- tapply(y, groups, median)

  #Step 4: line through the two outer summary points
  slope <- (median_y[3] - median_y[1]) / (median_x[3] - median_x[1])
  intercept <- median_y[1] - slope * median_x[1]

  #Step 5: shift the line one third of the way towards the middle summary point
  middle_prediction <- intercept + slope * median_x[2]
  intercept <- intercept + (median_y[2] - middle_prediction) / 3
  c(intercept = unname(intercept), slope = unname(slope))
}

To test it, here's an example:

dfr <- data.frame(
  time = c(.16, .24, .25, .30, .30, .32, .36, .36, .50, .50, .57, .61, .61, .68, .72, .72, .83, .88, .89),
  distance = c(12.1, 29.8, 32.7, 42.8, 44.2, 55.8, 63.5, 65.1, 124.6, 129.7, 150.2, 182.2, 189.4, 220.4, 250.4, 261.0, 334.5, 375.5, 399.1))
  
median_median_line(time, distance, dfr) 
#intercept     slope 
#   -113.6     520.0
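As a cross-check, the same numbers can be reproduced step by step without the function. This is only a sketch: the names summary_x and summary_y are chosen here for illustration, and the group sizes (6, 7, 6) are written out by hand for these 19 points.

```r
# Example data from the test above
time <- c(.16, .24, .25, .30, .30, .32, .36, .36, .50, .50, .57, .61, .61,
          .68, .72, .72, .83, .88, .89)
distance <- c(12.1, 29.8, 32.7, 42.8, 44.2, 55.8, 63.5, 65.1, 124.6, 129.7,
              150.2, 182.2, 189.4, 220.4, 250.4, 261.0, 334.5, 375.5, 399.1)

# Order the pairs by time so that each distance stays with its time
ord <- order(time)
x <- time[ord]
y <- distance[ord]

# 19 points leave one extra point, which goes to the middle group
groups <- rep(1:3, times = c(6, 7, 6))

# Summary points: per-group medians of x and of y
summary_x <- tapply(x, groups, median)
summary_y <- tapply(y, groups, median)

# Slope from the two outer summary points
slope <- unname((summary_y[3] - summary_y[1]) / (summary_x[3] - summary_x[1]))

# Intercept as the mean of the three per-point intercepts
intercept <- mean(summary_y - slope * summary_x)

c(intercept = intercept, slope = slope)
```

The summary points come out as (0.275, 37.75), (0.5, 129.7) and (0.775, 297.75), which gives a slope of 520 and an intercept of about -113.6, in agreement with the function's result.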

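Note the slightly odd way of specifying the groups. The instructions are quite picky about how you define group sizes, so the more obvious method of cut(x, quantile(x, seq.int(0, 1, 1/3))) doesn't work.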
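To see the difference, the group sizes produced by cut() can be compared with the sizes the counting rule asks for. This is a small sketch on the time values from the example; cut_groups and required_sizes are names made up for it.

```r
# x values from the example (the time column)
x <- c(.16, .24, .25, .30, .30, .32, .36, .36, .50, .50, .57, .61, .61,
       .68, .72, .72, .83, .88, .89)

# The "obvious" grouping: cut x at its tertiles
cut_groups <- cut(x, quantile(x, seq.int(0, 1, 1/3)))

# Group sizes from cut(); useNA also counts points that fall in no interval
table(cut_groups, useNA = "ifany")

# Group sizes the median-median rules ask for with 19 points
required_sizes <- c(6, 7, 6)
```

Two things get in the way. The intervals are open on the left by default, so the smallest x falls outside every interval and becomes NA unless include.lowest = TRUE is set. Tied x values also always land in the same interval, so the sizes depend on where the ties sit rather than on the counting rule.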

