Density plot of the F distribution (df1 = 1): theoretical or simulated?

Posted 2025-01-30 09:40:56

I am plotting the density of F(1, 49) in R. The simulated density does not seem to match the theoretical density as the values approach zero.

set.seed(123)
val <- rf(1000, df1 = 1, df2 = 49)

# Kernel density estimate of the simulated values (black, solid)
plot(density(val), yaxt = "n", ylab = "", xlab = "Observation",
     main = expression(paste("Density plot (", italic(n), "=1000, ", italic(df)[1], "=1, ", italic(df)[2], "=49)")),
     lwd = 2)
# Theoretical F(1, 49) density (red, dashed)
curve(df(x, df1 = 1, df2 = 49), from = 0, to = 10, add = TRUE, col = "red", lwd = 2, lty = 2)
legend("topright", c("Theoretical", "Simulated"),
       col = c("red", "black"), lty = c(2, 1), bty = "n")


Comments (3)

小兔几 2025-02-06 09:40:56

Using density(val, from = 0) gets you much closer, although still not perfect. Densities near boundaries are notoriously difficult to calculate in a satisfactory way.
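As a minimal sketch of that suggestion, the snippet below reuses the question's seed and sample and only adds the from = 0 argument. The names d_default and d_from0 are illustrative and do not come from the original answer.

```r
set.seed(123)
val <- rf(1000, df1 = 1, df2 = 49)

# Default call: the evaluation grid extends below the smallest observation,
# so the plotted curve continues into negative x values.
d_default <- density(val)

# from = 0 starts the evaluation grid at the boundary instead.
d_from0 <- density(val, from = 0)

plot(d_from0, yaxt = "n", ylab = "", xlab = "Observation",
     main = "density(val, from = 0)", lwd = 2)
curve(df(x, df1 = 1, df2 = 49), from = 0, to = 10, add = TRUE,
      col = "red", lwd = 2, lty = 2)
legend("topright", c("Theoretical", "Simulated"),
       col = c("red", "black"), lty = c(2, 1), bty = "n")
```

Note that from = 0 only changes where the estimate is evaluated and drawn. The kernel and bandwidth are unchanged, which is consistent with the result being closer but still imperfect.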

吃颗糖壮壮胆 2025-02-06 09:40:56

By default, density uses a Gaussian kernel to estimate the probability density at a given point. Effectively, this means that a normal density curve is centered on each observation. All these normal densities are added up, then the result is normalized so that the area under the curve is 1.
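The description above can be checked with a hand-rolled estimator. This is only a sketch: manual_kde is a hypothetical helper written for illustration, not part of base R, and the evaluation grid is an arbitrary choice.

```r
set.seed(123)
val <- rf(1000, df1 = 1, df2 = 49)

# Hypothetical helper: a Gaussian KDE is the average of normal densities,
# one centred on each observation, all with standard deviation bw.
manual_kde <- function(x, obs, bw) {
  sapply(x, function(x0) mean(dnorm(x0, mean = obs, sd = bw)))
}

d <- density(val)                     # default Gaussian kernel, automatic bandwidth
grid <- seq(0, 3, by = 0.5)           # arbitrary evaluation points
manual <- manual_kde(grid, val, d$bw)
builtin <- approx(d$x, d$y, xout = grid)$y  # interpolate density()'s output grid

# The two columns should agree closely; density() bins the data and uses an
# FFT internally, so small numerical differences are expected.
print(cbind(grid, manual, builtin))
```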

This works well if observations have a central tendency, but gives unrealistic results when there are sharp boundaries (Try plot(density(runif(1000))) for a prime example).

When you have a very high density of points close to zero, but none below zero, the left tails of all the normal kernels will "spill over" into the negative values, giving a Gaussian-type curve that doesn't match the theoretical density.
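The size of this spill-over can be estimated directly. The sketch below assumes the default Gaussian kernel, for which the kernel centred on an observation v places pnorm(0, v, bw) of its mass below zero. The name mass_below_zero is illustrative.

```r
set.seed(123)
val <- rf(1000, df1 = 1, df2 = 49)
d <- density(val)   # default Gaussian kernel; d$bw is the kernel's standard deviation

# Average, over all observations, of the share of each kernel lying below zero.
# This is the fraction of the estimated density assigned to impossible
# negative values.
mass_below_zero <- mean(pnorm(0, mean = val, sd = d$bw))
mass_below_zero
```

Any mass placed below zero is taken from the region just above zero, which is where the theoretical F(1, 49) density is largest.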

This means that if you have a sharp boundary at 0, you should remove values of your simulated density that are between zero and about two standard deviations of your smoothing kernel - anything below this will be misleading.

Since we can control the standard deviation of our smoothing kernel with the bw parameter of density, and easily control which x values are plotted using ggplot, we will get a more sensible result by doing something like this:

library(ggplot2)

ggplot(as.data.frame(density(val, bw = 0.1)[c("x", "y")]), aes(x, y)) +
  geom_line(aes(col = "Simulated"), na.rm = TRUE) + 
  geom_function(fun = ~ df(.x, df1 = 1, df2 = 49), 
                aes(col = "Theoretical"), lty = 2) +
  lims(x = c(0.2, 12)) +
  theme_classic(base_size = 16) +
  labs(title = expression(paste("Density plot (",italic(n),"=1000, ",
                                italic(df)[1],"=1, ",italic(df)[2],"=49)")),
       x = "Observation", y = "") +
  scale_color_manual(values = c("black", "red"), name = "")

那片花海 2025-02-06 09:40:56

The kde1d and logspline packages are not bad for such densities.

sims <- rf(1500, 1, 49)

library(kde1d)
kd <- kde1d(sims, bw = 1, xmin = 0)
plot(kd, col = "red", xlim = c(0, 2), ylim = c(0, 2))
curve(df(x, 1, 49), add = TRUE)

library(logspline)
fit <- logspline(sims, lbound = 0, knots = c(0, 0.5, 1, 1.5, 2))
plot(fit, col = "red", xlim = c(0, 2), ylim = c(0, 2))
curve(df(x, 1, 49), add = TRUE)
