John Tukey "median median" (or "resistant line") statistical test for R and linear regression

Posted on 2024-09-09 04:26:02


I'm looking for John Tukey's algorithm that computes a "resistant line" or "median-median line" for a linear regression in R.

A student on a mailing list explained the algorithm as follows:

"The way it's calculated is to divide the data into three groups, find the x-median and y-median values (called the summary point) for each group, and then use those three summary points to determine the line. The outer two summary points determine the slope, and an average of all of them determines the intercept."
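Written out as formulas, the quoted procedure amounts to the following. The notation is an assumption for illustration, not from the original post: (x_L, y_L), (x_M, y_M) and (x_R, y_R) are the summary points of the left, middle and right groups, b is the slope and a the intercept of the fitted line y = a + b x.

```latex
b = \frac{y_R - y_L}{x_R - x_L},
\qquad
a = \frac{1}{3}\Big[(y_L - b\,x_L) + (y_M - b\,x_M) + (y_R - b\,x_R)\Big]
```

Because b is defined from the two outer points, the first and third terms in the bracket are equal, so averaging the three intercepts is the same as starting from the line through the outer points and shifting it one third of the way toward the middle summary point.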

For the curious, here is an article about John Tukey's median of medians (the "ninther"): http://www.johndcook.com/blog/2009/06/23/tukey-median-ninther/

Do you know where I could find this algorithm or an R function that implements it, and in which package?
Thanks a lot!


Answers (3)

蓝颜夕 2024-09-16 04:26:02


There's a description of how to calculate the median-median line here. An R implementation of it:

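median_median_line <- function(x, y, data)
{
  if(!missing(data))
  {
    x <- eval(substitute(x), data) 
    y <- eval(substitute(y), data) 
  }
  
  stopifnot(length(x) == length(y))

  #Step 1: group sizes are (k, k, k), (k, k + 1, k) or (k + 1, k, k + 1)
  #when length(x) %% 3 is 0, 1 or 2 respectively
  one_third_length <- floor(length(x) / 3)
  groups <- rep(1:3, times = switch((length(x) %% 3) + 1,
     rep(one_third_length, 3),
     c(one_third_length, one_third_length + 1, one_third_length),
     c(one_third_length + 1, one_third_length, one_third_length + 1)
  ))

  #Step 2: sort the points by x, keeping each y paired with its x
  o <- order(x)
  x <- x[o]
  y <- y[o]
  
  #Step 3: summary point (median x, median y) of each group
  median_x <- tapply(x, groups, median)
  median_y <- tapply(y, groups, median)

  #Step 4: slope (and provisional intercept) from the two outer summary points
  slope <- (median_y[3] - median_y[1]) / (median_x[3] - median_x[1])
  intercept <- median_y[1] - slope * median_x[1]

  #Step 5: move the line one third of the way toward the middle summary point,
  #which equals averaging the three summary-point intercepts
  middle_prediction <- intercept + slope * median_x[2]
  intercept <- intercept + (median_y[2] - middle_prediction) / 3
  c(intercept = unname(intercept), slope = unname(slope))
}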

To test it, here's an example:

dfr <- data.frame(
  time = c(.16, .24, .25, .30, .30, .32, .36, .36, .50, .50, .57, .61, .61, .68, .72, .72, .83, .88, .89),
  distance = c(12.1, 29.8, 32.7, 42.8, 44.2, 55.8, 63.5, 65.1, 124.6, 129.7, 150.2, 182.2, 189.4, 220.4, 250.4, 261.0, 334.5, 375.5, 399.1))
  
median_median_line(time, distance, dfr) 
#intercept     slope 
#   -113.6     520.0
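As a cross-check of Step 5, the sketch below recomputes the example by hand: it takes the three summary points directly and sets the intercept to the plain average of the three summary-point intercepts, as in the explanation quoted in the question. It hard-codes the group sizes 6, 7, 6 that apply to these 19 points and assumes the data are already ordered by time (they are); it is an illustration, not a replacement for the function.

```r
# Example data from the answer (already ordered by time)
time <- c(.16, .24, .25, .30, .30, .32, .36, .36, .50, .50,
          .57, .61, .61, .68, .72, .72, .83, .88, .89)
distance <- c(12.1, 29.8, 32.7, 42.8, 44.2, 55.8, 63.5, 65.1, 124.6, 129.7,
              150.2, 182.2, 189.4, 220.4, 250.4, 261.0, 334.5, 375.5, 399.1)

# n = 19 = 3 * 6 + 1, so the middle group gets the extra point: sizes 6, 7, 6
groups <- rep(1:3, times = c(6, 7, 6))

# Summary points: median x and median y within each group
mx <- tapply(time, groups, median)
my <- tapply(distance, groups, median)

# Slope from the two outer summary points
slope <- as.numeric((my[3] - my[1]) / (mx[3] - mx[1]))

# Intercept as the plain average of the three summary-point intercepts
intercept <- mean(my - slope * mx)

# Gives intercept -113.6 and slope 520, matching median_median_line()
round(c(intercept = intercept, slope = slope), 4)
```

Here the summary points are (0.275, 37.75), (0.50, 129.7) and (0.775, 297.75), so the slope is 260 / 0.5 = 520 and the intercept is (-105.25 - 130.3 - 105.25) / 3 = -113.6.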

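Note the slightly odd way of specifying the groups. The instructions are quite picky about group sizes: with n = 3k points each group gets k points, with n = 3k + 1 the extra point goes to the middle group, and with n = 3k + 2 each outer group gets one extra point. The more obvious cut(x, quantile(x, seq.int(0, 1, 1/3))) doesn't reliably reproduce these sizes, and with its default include.lowest = FALSE it also leaves the smallest x without a group.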
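To make the group-size rule concrete, here is a small sketch. The names tukey_group_sizes() and tukey_groups() are hypothetical helpers for illustration, not functions from any package. The last line shows that, for 19 evenly spaced values, cutting at the tertiles leaves one point unassigned.

```r
# Hypothetical helper (not from any package): Tukey's three-group sizes
# for n points sorted by x.
#   n = 3k     -> k, k, k
#   n = 3k + 1 -> k, k + 1, k      (extra point goes to the middle group)
#   n = 3k + 2 -> k + 1, k, k + 1  (one extra point in each outer group)
tukey_group_sizes <- function(n) {
  stopifnot(n >= 3)
  k <- n %/% 3
  switch(n %% 3 + 1,
         c(k, k, k),
         c(k, k + 1, k),
         c(k + 1, k, k + 1))
}

# Hypothetical helper: labels 1, 1, ..., 2, 2, ..., 3, 3, ... for sorted data.
# A vector of sizes is passed to `times`; a scalar k would instead give
# 1, 2, 3, 1, 2, 3, ...
tukey_groups <- function(n) rep(1:3, times = tukey_group_sizes(n))

tukey_group_sizes(18)  # 6 6 6
tukey_group_sizes(19)  # 6 7 6
tukey_group_sizes(20)  # 7 6 7

# cut() intervals are right-closed and include.lowest = FALSE by default,
# so the minimum falls outside every interval and becomes NA
x <- 1:19
sum(is.na(cut(x, quantile(x, seq.int(0, 1, 1/3)))))  # 1
```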

我早已燃尽 2024-09-16 04:26:02


I'm a little late to the party, but have you tried line() from the stats package?

From the help file:

Value

An object of class "tukeyline".

References

Tukey, J. W. (1977). Exploratory Data Analysis. Reading, Massachusetts: Addison-Wesley.
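A minimal sketch of calling line() on the same data as the first answer. It uses only stats::line() and the standard coef() and residuals() accessors. The coefficients are not guaranteed to match median_median_line() exactly, since the two may define the thirds differently, and the last answer reports that the underlying C code computed its quantiles not quite correctly for some sample sizes, so results may also depend on your R version.

```r
# Same example data as in the first answer
dfr <- data.frame(
  time = c(.16, .24, .25, .30, .30, .32, .36, .36, .50, .50,
           .57, .61, .61, .68, .72, .72, .83, .88, .89),
  distance = c(12.1, 29.8, 32.7, 42.8, 44.2, 55.8, 63.5, 65.1, 124.6, 129.7,
               150.2, 182.2, 189.4, 220.4, 250.4, 261.0, 334.5, 375.5, 399.1))

# Tukey's resistant line from base R's stats package
fit <- stats::line(dfr$time, dfr$distance)

class(fit)           # "tukeyline"
coef(fit)            # intercept and slope
head(residuals(fit)) # residuals from the resistant fit
```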

郁金香雨 2024-09-16 04:26:02


As a member of the R Core team, I have now dug into the source code and also studied its history.

Conclusion: the C source code, added in 1996/1997 when R was still in alpha (around version 0.14alpha), already computed the quantiles slightly incorrectly for some sample sizes.

More about this will follow on the R mailing lists (not posted yet).
