R cdplot() - does the right axis show probability or density?

Posted on 2025-01-27 22:44:53


Reproducible data:

## NASA space shuttle o-ring failures
fail <- factor(c(2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 2, 1, 2, 1, 1, 1,
                 1, 2, 1, 1, 1, 1, 1),
               levels = 1:2, labels = c("no", "yes"))
temperature <- c(53, 57, 58, 63, 66, 67, 67, 67, 68, 69, 70, 70,
                 70, 70, 72, 73, 75, 75, 76, 76, 78, 79, 81)

## CD plot
cdplot(fail ~ temperature)

The documentation for cdplot says:

cdplot computes the conditional densities of x given the levels of y weighted by the marginal distribution of y. The densities are derived cumulatively over the levels of y. The conditional probabilities are not derived by discretization (as in the spinogram), but using a smoothing approach via density. The conditional density functions (cumulative over the levels of y) are returned invisibly.

So on the plot, where x = 63, y = 0.4 (approximately). Is this a probability or a probability density? The documentation leaves me confused about what is calculated, what is returned and what is plotted.

ま昔日黯然 2025-02-03 22:44:53

The plot shows a probability, not a density: the estimated probability of each outcome at a given temperature.

What the docs are saying is that a kernel density estimate is computed for all the temperature measurements, and a separate density estimate is computed for the temperatures where fail is "no". If we divide the density of the "no" temperatures by the density of all temperatures, then weight the ratio by the proportion of "no" observations, we get an estimate of the probability of drawing a "no" at a given temperature.
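Written out, this is just Bayes' rule applied to densities. The notation is mine, not the documentation's: f(t) is the density of all temperatures, f(t | no) is the density of the temperatures with no failure, and P(no) is the overall proportion of "no" observations.

```latex
P(\text{no} \mid t) \;=\; \frac{f(t \mid \text{no})\, P(\text{no})}{f(t)},
\qquad
P(\text{yes} \mid t) \;=\; 1 - P(\text{no} \mid t)
```

The two densities on the right-hand side are not probabilities, but their weighted ratio is, which is why the result can be read on a 0 to 1 axis.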

To show this is the case, let's see the cdplot:

cdplot(fail ~ temperature)

Now let's calculate the probabilities manually from the two density estimates and plot them. We should get a near-identical shape to the cdplot curve.

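## density of all temperatures, evaluated over the observed range
all <- density(temperature, from = min(temperature), to = max(temperature))

## density of the temperatures with no failure, on the same grid
no  <- density(temperature[fail == "no"], from = min(temperature),
               to = max(temperature))

## estimated P(no | temperature): density ratio times the share of "no"
probs <- no$y/all$y * proportions(table(fail))[1]

## 1 - probs is the estimated P(yes | temperature)
plot(all$x, 1 - probs, type = "l", ylim = c(0, 1))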
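
If all you want is the number behind the curve, you can also ask cdplot for it directly instead of reading it off the axis. This is a minimal sketch, assuming the list that cdplot returns holds one function per level except the last (so here a single function, for "no"), which is how the example in the documentation uses it. The names cd, p_no and p_yes are my own.

```r
## Data repeated from the question so this block runs on its own
fail <- factor(c(2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 2, 1, 2, 1, 1, 1,
                 1, 2, 1, 1, 1, 1, 1),
               levels = 1:2, labels = c("no", "yes"))
temperature <- c(53, 57, 58, 63, 66, 67, 67, 67, 68, 69, 70, 70,
                 70, 70, 72, 73, 75, 75, 76, 76, 78, 79, 81)

## plot = FALSE skips the drawing and only returns the fitted functions
cd <- cdplot(fail ~ temperature, plot = FALSE)

## The first (and here only) function gives the estimated
## P(fail = "no" | temperature); evaluate it at 63 degrees
p_no  <- cd[[1]](63)

## With two levels, the complement is the estimated failure probability
p_yes <- 1 - p_no

c(no = p_no, yes = p_yes)
```

Both values lie between 0 and 1 and sum to 1, which is what you would expect from probabilities and not from densities.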