Why doesn't the hist() function have area one?
When using hist() in R and setting freq=FALSE, I should get densities. However, I do not. I get different numbers than when it just shows the counts, but I still need to normalize them.
For example:
> h = hist(c(1,2,1,3,1,4,5,4,5,8,2,4,1,7,6,10,7,4,3,7,3,5), freq=FALSE)
> h$density
[1] 0.13636364 0.15909091 0.09090909 0.09090909 0.02272727
> sum(h$density)
[1] 0.5
> h$density/sum(h$density)
[1] 0.27272727 0.31818182 0.18181818 0.18181818 0.04545455
Comments (4)
If you examine the rest of the histogram output, you will notice that the bars have width 2. Hence you should multiply sum(h$density) by 2 to get an area equal to one. You can see this clearly if you look at the histogram.

The area of the histogram is, in fact, 1.0. What you're not taking into account is that every bar is two units wide.

The density is not the same as the probability. The density for a histogram is the height of the bar. The probability is the area of the bar. You need to multiply the height by the width to get the area. Try multiplying each value in h$density by its bin width, diff(h$breaks), and summing the result. This works because breaks contains the start/end points for each of the bins, so by taking the difference between consecutive values, you get the width of each bin. You can also use with() to more easily grab both of those values.
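
The calculation described in the answers can be sketched as follows, using the data from the question. This is a minimal illustration, not code from the original answers: the names widths, area, area_with and probs are made up for the example, and plot = FALSE is an assumption used only to skip drawing the histogram.

```r
# Data from the question
x <- c(1,2,1,3,1,4,5,4,5,8,2,4,1,7,6,10,7,4,3,7,3,5)

# plot = FALSE computes the histogram object without drawing it;
# the density component is filled in either way
h <- hist(x, plot = FALSE)

# Width of each bin: differences between consecutive break points
widths <- diff(h$breaks)

# Area under the histogram: sum of height * width over all bars
area <- sum(h$density * widths)

# The same calculation, using with() to reach into h directly
area_with <- with(h, sum(density * diff(breaks)))

# Per-bin probabilities are the bar areas, i.e. the relative frequencies
probs <- h$density * widths

print(widths)
print(area)
print(probs)
```

Because all bins here have the same width, dividing h$density by sum(h$density), as in the question, happens to give the same per-bin probabilities. With unequal bin widths only the height times width form would still hold.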