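Density plot of the F distribution (df1 = 1): theoretical or simulated?
I am plotting the density of F(1, 49) in R. The simulated curve does not seem to match the theoretical curve as the values approach zero.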
set.seed(123)
# Simulate 1000 draws from F(1, 49)
val <- rf(1000, df1 = 1, df2 = 49)
# Kernel density estimate of the simulated values
plot(density(val), yaxt = "n", ylab = "", xlab = "Observation",
     main = expression(paste("Density plot (", italic(n), "=1000, ", italic(df)[1], "=1, ", italic(df)[2], "=49)")),
     lwd = 2)
# Overlay the theoretical F(1, 49) density
curve(df(x, df1 = 1, df2 = 49), from = 0, to = 10, add = TRUE, col = "red", lwd = 2, lty = 2)
legend("topright", c("Theoretical", "Simulated"),
       col = c("red", "black"), lty = c(2, 1), bty = "n")
Using
density(val, from = 0)
gets you much closer, although still not perfect. Densities near boundaries are notoriously difficult to estimate in a satisfactory way.
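To see what from = 0 changes, here is a minimal sketch that reuses the question's simulated data. As far as I understand density(), the from argument only moves the left edge of the evaluation grid to the boundary; it does not change the kernel itself, which is one reason the fit is still imperfect. The object names d_default and d_from0 are made up for illustration.

```r
set.seed(123)
val <- rf(1000, df1 = 1, df2 = 49)

d_default <- density(val)            # evaluation grid extends below zero
d_from0   <- density(val, from = 0)  # evaluation grid starts at the boundary

min(d_default$x)  # negative: the default grid reaches below the smallest observation
min(d_from0$x)    # the grid now starts at zero

# Simulated estimate (black) against the theoretical F(1, 49) density (red)
plot(d_from0, lwd = 2, main = "density(val, from = 0)", xlab = "Observation")
curve(df(x, df1 = 1, df2 = 49), from = 0.01, to = 10, add = TRUE,
      col = "red", lwd = 2, lty = 2)
```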
By default, density uses a Gaussian kernel to estimate the probability density at a given point. Effectively, a normal density curve is centered on each observation, all of these curves are added up, and the result is normalized so that the area under the curve is 1.
This works well if the observations have a central tendency, but it gives unrealistic results at sharp boundaries (try plot(density(runif(1000))) for a prime example).
When you have a very high density of points close to zero but none below zero, the left tails of the normal kernels "spill over" into negative values, producing a Gaussian-shaped peak that doesn't match the theoretical density.
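The mechanism described above can be reproduced by hand. This is only a sketch under a few assumptions: it uses bw.nrd0 (the default bandwidth rule for density) and a coarse grid chosen for illustration, and the names grid, manual, auto and spill are made up. It builds the estimate as an average of normal curves, compares it with density(), and computes how much of the estimated probability mass ends up below zero.

```r
set.seed(123)
val <- rf(1000, df1 = 1, df2 = 49)

bw   <- bw.nrd0(val)                   # default bandwidth rule of density()
grid <- seq(-1, 5, length.out = 200)   # illustrative evaluation grid

# One normal curve per observation, averaged so the total area is 1
manual <- sapply(grid, function(g) mean(dnorm(g, mean = val, sd = bw)))

# density() on the same grid; it uses a binned approximation,
# so the two agree closely but not exactly
auto <- density(val, bw = bw, from = -1, to = 5, n = 200)
max(abs(manual - auto$y))

# Share of the estimated probability mass that lies below zero,
# i.e. the left tails of the kernels that "spill over" the boundary
spill <- mean(pnorm(0, mean = val, sd = bw))
spill
```

A spill value noticeably above zero means the estimate is assigning probability to values the F distribution cannot take, which is why the peak near zero is flattened.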
This means that if you have a sharp boundary at 0, you should discard the part of the estimated density between zero and about two standard deviations of the smoothing kernel; anything closer to the boundary than that is misleading.
Since the bw argument of density sets the standard deviation of the smoothing kernel, and ggplot makes it easy to control which x values are plotted, a more sensible result comes from fixing bw and plotting the estimate only for x greater than about 2 * bw.
The kde1d and logspline packages are not bad for such densities.
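For reference, a hedged sketch of how these two packages might be applied here. It assumes both are installed from CRAN and that the lower bound of the support is passed as lbound = 0 to logspline and as xmin = 0 to kde1d; check each package's documentation for your installed version. The requireNamespace guards let the script run even when a package is missing, and grid and fits are made-up names.

```r
set.seed(123)
val  <- rf(1000, df1 = 1, df2 = 49)
grid <- seq(0.05, 10, length.out = 200)
fits <- list()

# logspline: declare the lower bound of the support (assumed argument: lbound)
if (requireNamespace("logspline", quietly = TRUE)) {
  fit_ls <- logspline::logspline(val, lbound = 0)
  fits$logspline <- logspline::dlogspline(grid, fit_ls)
}

# kde1d: boundary-aware kernel estimate (assumed argument: xmin)
if (requireNamespace("kde1d", quietly = TRUE)) {
  fit_kd <- kde1d::kde1d(val, xmin = 0)
  fits$kde1d <- kde1d::dkde1d(grid, fit_kd)
}

# Theoretical density (red, dashed) with whichever fits were available
plot(grid, df(grid, df1 = 1, df2 = 49), type = "l", col = "red", lwd = 2, lty = 2,
     xlab = "Observation", ylab = "Density")
for (nm in names(fits)) lines(grid, fits[[nm]], lwd = 2)
```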