Density plot of the F-distribution (df1 = 1). Theoretical or simulated?

Posted on 2025-01-30 09:40:56


I am plotting the density of F(1,49) in R. The simulated curve does not seem to match the theoretical curve as the values approach zero.

set.seed(123)
val <- rf(1000, df1=1, df2=49)
plot(density(val), yaxt="n",ylab="",xlab="Observation",
     main=expression(paste("Density plot (",italic(n),"=1000, ",italic(df)[1],"=1, ",italic(df)[2],"=49)")),
     lwd=2)
curve(df(x, df1=1, df2=49), from=0, to=10, add=TRUE, col="red",lwd=2,lty=2)
legend("topright",c("Theoretical","Simulated"),
       col=c("red","black"),lty=c(2,1),bty="n")


小兔几 2025-02-06 09:40:56

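Using density(val, from = 0) gets you much closer, although it is still not perfect. Densities near a boundary are notoriously difficult to estimate in a satisfying way.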
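As a minimal sketch of that suggestion, reusing the question's simulation (same seed and sample size): `from = 0` only moves the start of the evaluation grid to zero. It does not change the kernel estimate itself, so the mass the default estimate places below zero is hidden rather than redistributed, which is one reason the curve still falls short near the boundary.

```r
set.seed(123)
val <- rf(1000, df1 = 1, df2 = 49)

d_default <- density(val)            # grid starts below zero (min(val) - 3 * bw)
d_from0   <- density(val, from = 0)  # grid starts exactly at zero

plot(d_from0, lwd = 2, main = "density(val, from = 0)", xlab = "Observation")
# Start the theoretical curve slightly above 0: df(0, 1, 49) is infinite
curve(df(x, df1 = 1, df2 = 49), from = 0.01, to = 10,
      add = TRUE, col = "red", lwd = 2, lty = 2)
legend("topright", c("Theoretical", "Simulated"),
       col = c("red", "black"), lty = c(2, 1), bty = "n")

min(d_default$x)  # negative: the default grid extends past the boundary
min(d_from0$x)    # 0
```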


吃颗糖壮壮胆 2025-02-06 09:40:56


By default, density uses a Gaussian kernel to estimate the probability density at a given point. Effectively, this means that a normal density curve is centered on each observation. All these normal densities are added up, then the result is normalized so that the area under the curve is 1.
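To make that description concrete, here is a rough sketch that builds the estimate by hand and compares it with density(). The helper `kde_manual` is a hypothetical name introduced for illustration, not part of R; averaging the normal densities is the same as adding them up and normalizing. The simulation mirrors the question's `val`.

```r
set.seed(123)
val <- rf(1000, df1 = 1, df2 = 49)

h <- bw.nrd0(val)  # the default bandwidth rule used by density()

# Hypothetical helper: average of normal densities centred on each observation
kde_manual <- function(x, obs, h) {
  sapply(x, function(x0) mean(dnorm(x0, mean = obs, sd = h)))
}

grid    <- seq(-1, 10, length.out = 200)
manual  <- kde_manual(grid, val, h)
builtin <- density(val, bw = h, from = -1, to = 10, n = 200)

# density() bins the data and uses an FFT, so expect close but not exact agreement
max(abs(manual - builtin$y))
```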

This works well if observations have a central tendency, but gives unrealistic results when there are sharp boundaries (Try plot(density(runif(1000))) for a prime example).

When you have a very high density of points close to zero, but none below zero, the left tails of the normal kernels will "spill over" into negative values, giving a Gaussian-type curve that doesn't match the theoretical density.
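The size of that spill-over can be sketched directly, assuming the default bandwidth rule and the question's simulation. Each kernel is a normal distribution centered on an observation, so its mass below zero is a normal CDF evaluated at zero, and the estimate's total mass below zero is the average of those values.

```r
set.seed(123)
val <- rf(1000, df1 = 1, df2 = 49)
h <- bw.nrd0(val)  # default bandwidth used by density()

# Mass of each kernel N(obs, h^2) below zero, averaged over all observations
mass_below_zero <- mean(pnorm(0, mean = val, sd = h))
mass_below_zero

# The true F(1, 49) distribution has no mass below zero
pf(0, df1 = 1, df2 = 49)
```

Whatever share ends up below zero is missing from the region just above zero, which is where the simulated curve undershoots the theoretical one.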

This means that if you have a sharp boundary at 0, you should remove the part of your simulated density that lies between zero and about two standard deviations of your smoothing kernel; the estimate in that region will be misleading.

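Since we can control the standard deviation of our smoothing kernel with the bw parameter of density, and easily control which x values are plotted using ggplot, we will get a more sensible result by doing something like this (with bw = 0.1, two kernel standard deviations is 0.2, hence the lower x limit):

library(ggplot2)

# val is the simulated sample from the question.
# bw must be passed to density() itself so the kernel sd really is 0.1.
dens <- density(val, bw = 0.1)

# Build the data frame explicitly from the density object's x and y
ggplot(data.frame(x = dens$x, y = dens$y), aes(x, y)) +
  geom_line(aes(col = "Simulated"), na.rm = TRUE) +
  geom_function(fun = ~ df(.x, df1 = 1, df2 = 49),
                aes(col = "Theoretical"), lty = 2) +
  lims(x = c(0.2, 12)) +  # drop the region within two kernel sds of the boundary
  theme_classic(base_size = 16) +
  labs(title = expression(paste("Density plot (",italic(n),"=1000, ",
                                italic(df)[1],"=1, ",italic(df)[2],"=49)")),
       x = "Observation", y = "") +
  scale_color_manual(values = c("black", "red"), name = "")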


那片花海 2025-02-06 09:40:56


The kde1d and logspline packages are not bad for such densities.

sims <- rf(1500, 1, 49)

library(kde1d)
kd <- kde1d(sims, bw = 1, xmin = 0)
plot(kd, col = "red", xlim = c(0, 2), ylim = c(0, 2))
curve(df(x, 1, 49), add = TRUE)

library(logspline)
fit <- logspline(sims, lbound = 0, knots = c(0, 0.5, 1, 1.5, 2))
plot(fit, col = "red", xlim = c(0, 2), ylim = c(0, 2))
curve(df(x, 1, 49), add = TRUE)