Calculating the area under a continuous density plot

Posted on 2024-09-26 13:08:40

I have two density curves plotted using this:

library(ggplot2)

Network <- Mydf$Networks
quartiles <- quantile(Mydf$Avg.Position, probs = c(0.25, 0.5, 0.75))
density <- ggplot(Mydf, aes(x = Avg.Position, fill = Network))
# ggtitle() replaces opts(title = ...), which has been removed from ggplot2
d <- density + geom_density(alpha = 0.2) + xlim(1, 11) + ggtitle("September 2010") + geom_vline(xintercept = quartiles, colour = "red")
print(d)

I'd like to compute the area under each curve for a given Avg.Position range. Sort of like pnorm for the normal curve. Any ideas?

Comments (2)

梦晓ヶ微光ヅ倾城 2024-10-03 13:08:40

Calculate the density separately and plot that to start with. You can then estimate the area with basic arithmetic: an integral is approximated by adding together the areas of a set of narrow trapezoids (the trapezoid rule). The width of each one is the difference between two consecutive x-values, and its height is the mean of the y-values at the start and end of that interval. I use the rollmean function from the zoo package, but this can be done in base R too.

library(zoo)  # provides rollmean()

X <- rnorm(100)
# calculate the density and check the plot
Y <- density(X) # see ?density for parameters
plot(Y$x, Y$y, type = "l") # can use ggplot for this too
# set an Avg.position value; the area is computed to the left of it
Avg.pos <- 1

# construct the widths and mean heights of the intervals left of Avg.pos
below <- Y$x < Avg.pos
xt <- diff(Y$x[below])
yt <- rollmean(Y$y[below], 2)
# this gives you the area (trapezoid rule)
sum(xt * yt)

This gives a good approximation, to about three decimal places. If you know the density function itself, take a look at ?integrate.
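For the original question (one area per network, between two Avg.Position values), the same trapezoid idea can be sketched in base R without zoo. This is only a sketch under assumptions: `area_between` is a hypothetical helper name, and the `Mydf` built here is made-up data standing in for the asker's data frame. Note also that `xlim(1, 11)` in the plot drops observations outside that range before the density is estimated, so the plotted curves may differ slightly from `density()` run on the full data.

```r
# Trapezoid-rule area under a kernel density estimate between two bounds.
# `area_between` is a hypothetical helper, not part of any package.
area_between <- function(values, lower, upper, ...) {
  d <- density(values, ...)              # kernel density on a grid (512 points by default)
  keep <- d$x >= lower & d$x <= upper    # grid points inside the requested range
  x <- d$x[keep]
  y <- d$y[keep]
  if (length(x) < 2) return(0)           # range too narrow for the grid
  widths  <- diff(x)                     # width of each interval
  heights <- (head(y, -1) + tail(y, -1)) / 2  # mean of the two end heights
  sum(widths * heights)
}

set.seed(42)
# Made-up stand-in for Mydf: two networks with different position distributions
Mydf <- data.frame(
  Networks     = rep(c("A", "B"), each = 500),
  Avg.Position = c(rnorm(500, mean = 4, sd = 1), rnorm(500, mean = 7, sd = 1.5))
)

# Area under each network's density curve for Avg.Position between 3 and 6
areas <- sapply(split(Mydf$Avg.Position, Mydf$Networks),
                area_between, lower = 3, upper = 6)
print(areas)
```

Because the density grid is finite, the bounds are effectively rounded to the nearest grid points; passing a larger `n` through `...` to `density()` makes that rounding error smaller.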

就此别过 2024-10-03 13:08:40

Three possibilities:

The logspline package provides a different method of estimating density curves, but it does include pnorm style functions for the result.

You could also approximate the area by feeding the x and y variables returned by the density function to the approxfun function and using the result with the integrate function. Unless you are interested in precise estimates of small tail areas (or very small intervals) then this will probably give a reasonable approximation.

A kernel density estimate is just the average of kernels centered at the data points; one common kernel is the normal distribution. You could therefore average the areas given by pnorm (or the CDF of whichever kernel you used), with the sd set to the bandwidth and the mean set to each data point in turn.
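The second and third possibilities can be sketched in base R as follows. The sample `X` and the bounds are made up for illustration, and the code assumes the default Gaussian kernel, for which `density()` reports the kernel's standard deviation as `bw`. The logspline route is left out here because it needs an extra package.

```r
set.seed(1)
X <- rnorm(200)   # made-up sample
lower <- -1
upper <- 1

d <- density(X)   # Gaussian kernel by default; d$bw is the kernel's sd

# Possibility 2: interpolate the density grid, then integrate numerically.
# yleft/yright make the function return 0 outside the grid instead of NA.
f <- approxfun(d$x, d$y, yleft = 0, yright = 0)
area_interp <- integrate(f, lower, upper, subdivisions = 1000L)$value

# Possibility 3: the estimate is the average of normal densities centred on
# each data point, so its area is the average of the matching pnorm differences.
area_kernel <- mean(pnorm(upper, mean = X, sd = d$bw) -
                    pnorm(lower, mean = X, sd = d$bw))

print(c(interpolated = area_interp, kernel_average = area_kernel))
```

The two numbers should agree closely but not exactly, since `density()` evaluates the estimate on a finite grid using a binned approximation, while the pnorm average works from the raw data points.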
