Recreating a Minitab normal probability plot

Posted on 2024-09-27 17:02:52

I am trying to recreate the following plot with R. Minitab describes this as a normal probability plot.

[Image: Minitab normal probability plot]

probplot gets you most of the way there. Unfortunately, I cannot figure out how to add the confidence interval bands to this plot.

Similarly, ggplot's stat_qq() seems to present similar information with a transformed x axis. geom_smooth() seems the likely candidate for adding the bands, but I haven't figured that out.

Finally, the Getting Genetics Done guys describe something similar here.

Sample data to recreate the plot above:

x <- c(40.2, 43.1, 45.5, 44.5, 39.5, 38.5, 40.2, 41.0, 41.6, 43.1, 44.9, 42.8)

If anyone has a solution with base graphics or ggplot, I'd appreciate it!

EDIT

After looking at the details of probplot, I've determined this is how it generates the fit line on the graph:

> xl <- quantile(x, c(0.25, 0.75))
> yl <- qnorm(c(0.25, 0.75))
> slope <- diff(yl)/diff(xl)
> int <- yl[1] - slope * xl[1]
> slope
   75% 
0.4151 
> int
   75% 
-17.36 

Indeed, these values agree well with what the probplot object contains:

> check <- probplot(x)
> str(check)
List of 3
 $ qdist:function (p)  
 $ int  : Named num -17.4
  ..- attr(*, "names")= chr "75%"
 $ slope: Named num 0.415
  ..- attr(*, "names")= chr "75%"
 - attr(*, "class")= chr "probplot"
> 
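As a side note on what this line represents: on the normal-quantile scale, a line z = int + slope * x corresponds to a normal distribution with mean -int/slope and standard deviation 1/slope. The sketch below, in base R only, recomputes the quartile line for the sample data and reads off those two values. The names implied_mean and implied_sd are introduced here for illustration and are not part of probplot.

```r
x <- c(40.2, 43.1, 45.5, 44.5, 39.5, 38.5, 40.2, 41.0, 41.6, 43.1, 44.9, 42.8)

# Line through the sample quartiles (x) and the normal quartiles (z)
xl <- unname(quantile(x, c(0.25, 0.75)))
yl <- qnorm(c(0.25, 0.75))
slope <- diff(yl) / diff(xl)
int <- yl[1] - slope * xl[1]

# The same line expressed as the parameters of a normal distribution
implied_mean <- -int / slope  # equals the midpoint of the two sample quartiles
implied_sd <- 1 / slope       # equals IQR / (qnorm(0.75) - qnorm(0.25))

round(c(slope = slope, int = int, mean = implied_mean, sd = implied_sd), 4)
```

So the quartile line is a robust fit based on the interquartile range; it will generally differ somewhat from a line built from the sample mean and standard deviation.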

However, incorporating this information into ggplot2 or base graphics does not yield the same results.

probplot(x)

[Image: plot produced by probplot(x)]

Versus:

ggplot(data = df, aes(x = x, y = y)) + geom_point() + geom_abline(intercept = int, slope = slope)

[Image: plot produced by the ggplot call above]

I get similar results using R's base graphics:

plot(df$x, df$y)
abline(int, slope, col = "red")

Lastly, I've learned that the last two rows of the legend refer to the Anderson-Darling test for normality, which can be reproduced with the nortest package.

> ad.test(x)

    Anderson-Darling normality test

data:  x 
A = 0.2303, p-value = 0.7502
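For readers without nortest installed, the statistic itself can be sketched in base R. This uses the textbook Anderson-Darling formula with the data standardized by the sample mean and sd(); that standardization is an assumption about how the statistic is defined here, and the p-value calculation is not reproduced.

```r
x <- c(40.2, 43.1, 45.5, 44.5, 39.5, 38.5, 40.2, 41.0, 41.6, 43.1, 44.9, 42.8)

n <- length(x)
# Standardize the ordered sample and evaluate the normal CDF
p <- pnorm((sort(x) - mean(x)) / sd(x))
i <- seq_len(n)

# A^2 = -n - (1/n) * sum((2i - 1) * (log(p_i) + log(1 - p_(n+1-i))))
A2 <- -n - mean((2 * i - 1) * (log(p) + log(1 - rev(p))))
round(A2, 4)
```

The result can be compared against the A value printed by ad.test(x) above.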


Comments (5)

爱给你人给你 2024-10-04 17:02:52

Try the qqPlot function in the QTLRel package.

require("QTLRel")
qqPlot(rnorm(100))


許願樹丅啲祈禱 2024-10-04 17:02:52

Perhaps this is something you can build on. By default, stat_smooth() uses level = 0.95.

library(ggplot2)

# Sorted data against their plotting positions
df <- data.frame(x = sort(x), y = ppoints(x))

ggplot(df, aes(x, y)) +
  geom_point() +
  stat_smooth() +
  scale_y_continuous(limits = c(0, 1), breaks = seq(from = 0.05, to = 1, by = 0.05),
                     labels = scales::percent)  # older ggplot2 versions used formatter = "percent"

零時差 2024-10-04 17:02:52

You are using the wrong y values: they should be normal quantiles (labeled with probabilities). The following puts the line in the right spot:

library(ggplot2)

# Sorted data against the normal quantiles of their plotting positions
df <- data.frame(x = sort(x), y = qnorm(ppoints(length(x))))
probs <- c(0.01, 0.05, seq(0.1, 0.9, by = 0.1), 0.95, 0.99)
qprobs <- qnorm(probs)

# Line through the sample quartiles and the normal quartiles
xl <- quantile(x, c(0.25, 0.75))
yl <- qnorm(c(0.25, 0.75))
slope <- diff(yl) / diff(xl)
int <- yl[1] - slope * xl[1]

ggplot(data = df, aes(x = x, y = y)) +
  geom_point() +
  geom_abline(intercept = int, slope = slope) +
  scale_y_continuous(limits = range(qprobs), breaks = qprobs, labels = 100 * probs) +
  labs(y = "Percent", x = "Data")

To add the confidence bounds as in Minitab, you can do the following:

library(MASS)  # provides fitdistr()

fd <- fitdistr(x, "normal")  # maximum-likelihood fit of a univariate normal
xp_hat <- fd$estimate[1] + qprobs * fd$estimate[2]  # estimated percentiles of the fitted normal
# variance of the estimated percentiles
v_xp_hat <- fd$sd[1]^2 + qprobs^2 * fd$sd[2]^2 + 2 * qprobs * fd$vcov[1, 2]
xpl <- xp_hat + qnorm(0.025) * sqrt(v_xp_hat)  # lower bound
xpu <- xp_hat + qnorm(0.975) * sqrt(v_xp_hat)  # upper bound

df.bound <- data.frame(xp = xp_hat, xpl = xpl, xpu = xpu, nquant = qprobs)

Then add the following three lines to your ggplot from above (they replace the slope-and-intercept line with the estimated percentiles and draw the two bounds):

geom_line(data = df.bound, aes(x = xp, y = nquant)) +
geom_line(data = df.bound, aes(x = xpl, y = nquant)) +
geom_line(data = df.bound, aes(x = xpu, y = nquant))

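Since the question also asked about base graphics, here is a sketch of the same idea without MASS or ggplot2. It assumes the closed-form normal maximum-likelihood results, Var(mu_hat) = sigma^2/n and Var(sigma_hat) = sigma^2/(2n) with zero covariance, which is what the fitdistr terms above reduce to for a normal fit. The names se_xp, lower and upper are introduced here, and no claim is made that the bands match Minitab's exactly.

```r
x <- c(40.2, 43.1, 45.5, 44.5, 39.5, 38.5, 40.2, 41.0, 41.6, 43.1, 44.9, 42.8)
n <- length(x)

# Normal maximum-likelihood estimates (the sd uses divisor n, not n - 1)
mu_hat <- mean(x)
sigma_hat <- sqrt(mean((x - mu_hat)^2))

probs <- c(0.01, 0.05, seq(0.1, 0.9, by = 0.1), 0.95, 0.99)
z <- qnorm(probs)

# Estimated percentiles mu + z * sigma and their asymptotic standard errors
xp_hat <- mu_hat + z * sigma_hat
se_xp <- sigma_hat * sqrt((1 + z^2 / 2) / n)
lower <- xp_hat + qnorm(0.025) * se_xp
upper <- xp_hat + qnorm(0.975) * se_xp

# Probability plot: data on x, normal quantiles on y, labeled in percent
plot(sort(x), qnorm(ppoints(n)), yaxt = "n",
     xlim = range(lower, upper, x), ylim = range(z),
     xlab = "Data", ylab = "Percent")
axis(2, at = z, labels = 100 * probs)
lines(xp_hat, z)             # fitted line
lines(lower, z, lty = 2)     # lower 95% bound
lines(upper, z, lty = 2)     # upper 95% bound
```

The bands are narrowest at the median, where the z^2 term vanishes, and widen toward the tails.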
陌上芳菲 2024-10-04 17:02:52

I know it's an old question, but for others who are still looking for a solution, have a look at ggqqplot from the ggpubr package.

library(ggpubr)
ggqqplot(data$sample)


顾忌 2024-10-04 17:02:52

[This relates to the answer from Julie B above]

https://stackoverflow.com/a/9215532/5885615

This is an old topic, but someone may still want to do this (I did recently).
I found one issue that gives slightly different results between R and Minitab: the QQ plots are similar, but the end points are shifted further outward. After digging into the code, I found the difference.

The function ppoints is used to spread the sample's plotting positions over the probability range:

df<-data.frame(x=sort(x),y=qnorm(ppoints(length(x)))) 

In R it has the following source code:

function (n, a = if (n <= 10) 3/8 else 1/2)     # function"ppoints"
{
  if (length(n) > 1L) 
    n <- length(n)
  if (n > 0) 
    (1L:n - a)/(n + 1 - 2 * a)
  else numeric()
}

Here the parameter a depends on n: it is 3/8 when n <= 10 and 1/2 otherwise.

Minitab uses a = 0.3 for all "n".

The most visible effect is on the end points of the sample.
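A quick way to see the difference is to pass a explicitly. The sketch below compares R's default plotting positions with those for a = 0.3 on the question's sample (n = 12, so R's default is a = 1/2); the variable names p_default and p_a03 are introduced here for illustration.

```r
x <- c(40.2, 43.1, 45.5, 44.5, 39.5, 38.5, 40.2, 41.0, 41.6, 43.1, 44.9, 42.8)
n <- length(x)

p_default <- ppoints(n)         # a = 1/2 because n > 10: (i - 0.5) / n
p_a03 <- ppoints(n, a = 0.3)    # (i - 0.3) / (n + 0.4)

# Side-by-side comparison; the largest gaps are at the two ends
round(cbind(p_default, p_a03, diff = p_a03 - p_default), 4)

# Data frame for plotting with a = 0.3 instead of R's default
df <- data.frame(x = sort(x), y = qnorm(ppoints(n, a = 0.3)))
```

For this sample the first default position is 0.5/12 and the first a = 0.3 position is 0.7/12.4, so R's default places the extreme points further into the tails.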
