Calculating the area under a continuous density plot
I have two density curves plotted using this:
Network <- Mydf$Networks
quartiles <- quantile(Mydf$Avg.Position, probs=c(25,50,75)/100)
density <- ggplot(Mydf, aes(x = Avg.Position, fill = Network))
d <- density + geom_density(alpha = 0.2) + xlim(1,11) + ggtitle("September 2010") + geom_vline(xintercept = quartiles, colour = "red")
print(d)
I'd like to compute the area under each curve for a given Avg.Position range. Sort of like pnorm for the normal curve. Any ideas?
Comments (2)
Calculate the density separately and plot that one to start with. Then you can use basic arithmetic to get the estimate. An integral is approximated by adding together the areas of a set of narrow strips. I use the mean (trapezoid) method for that: the width of each strip is the difference between two consecutive x-values, and the height is the mean of the y-values at the beginning and at the end of the interval. I use the rollmean function in the zoo package, but this can be done using the base package too.
This gives you a good approximation, to about 3 decimal places. If you know the density function, take a look at
?integrate
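The strip-summing approach described above might look roughly like the following in base R. This is only a sketch under assumptions: the data are simulated as a stand-in for Avg.Position (the original data frame is not available), and the helper name area_under_density is hypothetical.

```r
# Simulated stand-in for the Avg.Position values of one network (assumption).
set.seed(1)
x <- rnorm(500, mean = 5, sd = 1.5)

# Estimate the density on a fine grid of x-values.
dens <- density(x, n = 2048)

# Hypothetical helper: trapezoid-rule area under the estimated density
# between `lower` and `upper`.
area_under_density <- function(dens, lower, upper) {
  inside <- dens$x > lower & dens$x < upper
  xs <- c(lower, dens$x[inside], upper)        # include the exact end points
  ys <- approx(dens$x, dens$y, xout = xs)$y    # density heights at those points
  widths  <- diff(xs)                          # width of each strip
  heights <- (head(ys, -1) + tail(ys, -1)) / 2 # mean of the two end heights
  sum(widths * heights)
}

# Area over the whole estimated range; this should be close to 1.
total <- area_under_density(dens, min(dens$x), max(dens$x))

# Area for a given range, analogous to pnorm(6, ...) - pnorm(4, ...).
part <- area_under_density(dens, 4, 6)
```

For the two curves in the question, the same helper could be applied to a density computed separately for each level of Network, for example by splitting Avg.Position with split() first.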
Three possibilities:
The logspline package provides a different method of estimating density curves, but it does include pnorm style functions for the result.
You could also approximate the area by feeding the x and y variables returned by the density function to the approxfun function and using the result with the integrate function. Unless you are interested in precise estimates of small tail areas (or very small intervals), this will probably give a reasonable approximation.
A kernel density estimate is just the average of kernels centered at the data points, and the normal distribution is one such kernel. You could average the areas from pnorm (or the equivalent for another kernel), with the mean at each data point and the sd given by the bandwidth.
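A rough sketch of the second and third possibilities, again on simulated data standing in for Avg.Position (an assumption). The helper name kde_cdf is hypothetical, and the pnorm version assumes the default Gaussian kernel of density(), whose bw component is the kernel's standard deviation. The logspline route is not shown here.

```r
# Simulated stand-in data (assumption); replace with the real values.
set.seed(1)
x <- rnorm(500, mean = 5, sd = 1.5)
lower <- 4
upper <- 6

dens <- density(x)  # Gaussian kernel by default

# Second possibility: interpolate the estimated density and integrate it.
f <- approxfun(dens$x, dens$y, yleft = 0, yright = 0)
area_integrate <- integrate(f, lower, upper)$value

# Third possibility: average the normal CDFs centered at each data point,
# using the bandwidth as the sd (hypothetical helper).
kde_cdf <- function(q, data, bw) mean(pnorm(q, mean = data, sd = bw))
area_pnorm <- kde_cdf(upper, x, dens$bw) - kde_cdf(lower, x, dens$bw)
```

The two estimates should agree closely for a range in the middle of the data. The pnorm version does not depend on the evaluation grid, which may matter for small tail areas.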