Why does the hist() function not have area one?

Posted on 2024-12-10 10:46:35

When I use hist() in R with freq=FALSE, I expect to get densities. However, that is not what I seem to get. The numbers differ from the plain counts, but they do not sum to one, so I still need to normalize them.

For example:

> h = hist(c(1,2,1,3,1,4,5,4,5,8,2,4,1,7,6,10,7,4,3,7,3,5), freq=FALSE)
> h$density
[1] 0.13636364 0.15909091 0.09090909 0.09090909 0.02272727
> sum(h$density)
[1] 0.5
> h$density/sum(h$density)
[1] 0.27272727 0.31818182 0.18181818 0.18181818 0.04545455


Comments (4)

提赋 2024-12-17 10:46:35

If you examine the rest of the histogram output, you will notice that the bars have width 2:

$breaks
[1]  0  2  4  6  8 10

Hence you should multiply sum(h$density) by 2 to get an area equal to one. You can see this clearly if you look at the histogram.
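As a quick check of the point about bar width, here is a minimal sketch that recomputes the histogram without plotting and compares the raw sum of the densities with the sum scaled by the bar width. The names x, h, h1, bar_width, raw_sum, area and unit_sum are illustrative. The second call uses unit-width breaks (0:10), an assumption chosen only to show that the densities themselves sum to one when every bar is one unit wide.

```r
x <- c(1,2,1,3,1,4,5,4,5,8,2,4,1,7,6,10,7,4,3,7,3,5)

# Default breaks for this data are 0, 2, 4, 6, 8, 10 (bars of width 2)
h <- hist(x, plot = FALSE)
bar_width <- unique(diff(h$breaks))   # a single value, since the bins are equal
raw_sum <- sum(h$density)             # sum of the bar heights only
area <- raw_sum * bar_width           # heights times the common width

# Illustrative assumption: unit-width bins, where height and area coincide
h1 <- hist(x, breaks = 0:10, plot = FALSE)
unit_sum <- sum(h1$density)

print(c(raw_sum = raw_sum, area = area, unit_sum = unit_sum))
```

With plot = FALSE, hist() returns the same components (breaks, counts, density) without drawing anything, which is convenient for checks like this one.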

堇色安年 2024-12-17 10:46:35

The area of the histogram is, in fact, 1.0. What you're not taking into account is that every bar is two units wide:

> h$breaks
[1]  0  2  4  6  8 10

迷途知返 2024-12-17 10:46:35
sum(h$density*(h$breaks[-1] - h$breaks[-length(h$breaks)]))

[1] 1
往日 2024-12-17 10:46:35

The density is not the same as the probability. The density for a histogram is the height of the bar. The probability is the area of the bar. You need to multiply the height by the width to get the area. Try

x <- c(1,2,1,3,1,4,5,4,5,8,2,4,1,7,6,10,7,4,3,7,3,5)
hh <- hist(x, probability = TRUE)
sum(diff(hh$breaks) * hh$density)
# [1] 1

This works because breaks contains the start and end points of each bin, so taking the difference between consecutive values gives the width of each bin. You can also use with() to grab both of those values more easily.

x <- c(1,2,1,3,1,4,5,4,5,8,2,4,1,7,6,10,7,4,3,7,3,5)
with(hist(x, probability = TRUE), sum(diff(breaks) * density))
# [1] 1
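
To make the height-versus-area distinction concrete, the sketch below converts the densities into per-bin probabilities and compares them with the relative frequencies from the counts. It then repeats the area check with unequal bin widths. The breaks c(0, 1, 2, 5, 10) and the names prob, rel_freq, hu and area_unequal are assumptions made for illustration and are not part of the original question.

```r
x <- c(1,2,1,3,1,4,5,4,5,8,2,4,1,7,6,10,7,4,3,7,3,5)

# Equal-width bins (the default breaks for this data)
hh <- hist(x, plot = FALSE)
prob <- hh$density * diff(hh$breaks)     # area of each bar
rel_freq <- hh$counts / sum(hh$counts)   # share of observations per bin

# Illustrative unequal-width bins: the heights change, the total area does not
hu <- hist(x, breaks = c(0, 1, 2, 5, 10), plot = FALSE)
area_unequal <- sum(hu$density * diff(hu$breaks))

print(all.equal(prob, rel_freq))
print(area_unequal)
```

Multiplying by diff(breaks) rather than by a fixed width keeps the calculation valid when the bins are not all the same size.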