Fitting a density curve to a histogram in R

Posted on 2024-08-06 09:20:52

Is there a function in R that fits a curve to a histogram?

Let's say you had the following histogram

hist(c(rep(65, times=5), rep(25, times=5), rep(35, times=10), rep(45, times=4)))

It looks normal, but it's skewed. I want to fit a normal curve that is skewed to wrap around this histogram.

This question is rather basic, but I can't seem to find the answer for R on the internet.

原谅我要高飞 2024-08-13 09:20:53

Dirk has explained how to plot the density function over the histogram. But sometimes you might want to go with the stronger assumption of a skewed normal distribution and plot that instead of density. You can estimate the parameters of the distribution and plot it using the sn package:

> sn.mle(y=c(rep(65, times=5), rep(25, times=5), rep(35, times=10), rep(45, times=4)))
$call
sn.mle(y = c(rep(65, times = 5), rep(25, times = 5), rep(35, 
    times = 10), rep(45, times = 4)))

$cp
    mean     s.d. skewness 
41.46228 12.47892  0.99527 

Skew-normal distributed data plot

This probably works better on data that is more skew-normal:

Another skew-normal plot
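
The sn.mle() call above may not be available in the installed version of the sn package. As a rough, package-free sketch of the same idea, the skew-normal density can be written out by hand and fitted by maximum likelihood with base R's optim(). Everything here is an assumption for illustration: the helper names dsn_base and negloglik, the simulated data, and the starting values are not from the original answer.

```r
# Skew-normal density written out by hand:
# f(x) = 2 / omega * dnorm(z) * pnorm(alpha * z), with z = (x - xi) / omega
dsn_base <- function(x, xi, omega, alpha) {
  z <- (x - xi) / omega
  2 / omega * dnorm(z) * pnorm(alpha * z)
}

# Negative log-likelihood; omega is optimised on the log scale so it stays positive
negloglik <- function(par, x) {
  omega <- exp(par[2])
  z <- (x - par[1]) / omega
  -sum(log(2) - log(omega) + dnorm(z, log = TRUE) + pnorm(par[3] * z, log.p = TRUE))
}

# Simulated right-skewed data (hypothetical, only for the demonstration)
set.seed(42)
n <- 500
delta <- 0.9
y <- 20 + 10 * (delta * abs(rnorm(n)) + sqrt(1 - delta^2) * rnorm(n))

# Starting values: sample mean, log of sample sd, and an arbitrary alpha of 1
fit <- optim(c(mean(y), log(sd(y)), 1), negloglik, x = y)
est <- c(xi = fit$par[1], omega = exp(fit$par[2]), alpha = fit$par[3])

# Histogram on the density scale with the fitted skew-normal curve on top
hist(y, freq = FALSE, col = "grey")
curve(dsn_base(x, est[["xi"]], est[["omega"]], est[["alpha"]]),
      add = TRUE, col = "red", lwd = 2)
```

On very coarse data such as the four distinct values in the question, the shape parameter tends to run off towards its boundary, so the fit is better judged on data that really is close to skew-normal.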

遗忘曾经 2024-08-13 09:20:53

I had the same problem, but Dirk's solution didn't seem to work for me.
I was getting this warning message every time:

"prob" is not a graphical parameter

I read through ?hist and found the freq argument, a logical value that defaults to TRUE (for equidistant breaks). Setting it to FALSE plots densities instead of counts.

The code that worked for me is:

hist(x, freq=FALSE)
lines(density(x, na.rm=TRUE))

甲如呢乙后呢 2024-08-13 09:20:53

This is kernel density estimation; the linked page gives a great illustration of the concept and its parameters.

The shape of the curve depends mostly on two elements: 1) the kernel (usually Epanechnikov or Gaussian), which produces the y value at each x by weighting all the data points according to their distance from x; it is symmetric and usually a nonnegative function that integrates to one; 2) the bandwidth: the larger it is, the smoother the curve, and the smaller it is, the more wiggly the curve.

Different requirements call for different packages; see the document "Density estimation in R". For multivariate data, you can turn to multivariate kernel density estimation.
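
To make the roles of the kernel and the bandwidth concrete, here is a minimal sketch of a Gaussian kernel density estimate written out by hand on the data from the question. The helper name kde_manual and the choice of bandwidth multipliers are assumptions made for illustration; in practice density() does this work (using a faster binned approximation).

```r
X <- c(rep(65, times = 5), rep(25, times = 5), rep(35, times = 10), rep(45, times = 4))

# Gaussian kernel density estimate:
# f_hat(x) = 1 / (n * h) * sum_i K((x - X_i) / h), with K = dnorm
kde_manual <- function(x, data, h) {
  sapply(x, function(x0) mean(dnorm((x0 - data) / h)) / h)
}

h <- bw.nrd0(X)  # the default bandwidth rule used by density()

hist(X, freq = FALSE, col = "grey", ylim = c(0, 0.1))
curve(kde_manual(x, X, h), add = TRUE, lwd = 2)                              # default bandwidth
curve(kde_manual(x, X, h / 2), add = TRUE, lty = "dashed", col = "red")      # smaller: wigglier
curve(kde_manual(x, X, 2 * h), add = TRUE, lty = "dotted", col = "blue")     # larger: smoother
```

The same comparison can be made with density(X, adjust = 0.5) and density(X, adjust = 2), since adjust multiplies the bandwidth.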

萌化 2024-08-13 09:20:53

Some comments requested scaling the density estimate line to the peak of the histogram so that the y axis would remain as counts rather than density. To achieve this I wrote a small function to automatically pull the max bin height and scale the y dimension of the density function accordingly.

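hist_dens <- function(x, breaks = "Scott", main = "title", xlab = "x", ylab = "count") {

  # Kernel density estimate of the data
  dens <- density(x, na.rm = TRUE)

  # Compute the histogram without drawing it, to get bin counts and densities
  raw_hist <- hist(x, breaks = breaks, plot = FALSE)

  # Factor that converts density units to count units
  scale <- max(raw_hist$counts) / max(raw_hist$density)

  # Draw the histogram with counts on the y axis
  hist(x, breaks = breaks, freq = TRUE, main = main, xlab = xlab, ylab = ylab)

  # Overlay the rescaled density curve
  lines(list(x = dens$x, y = scale * dens$y), col = "red", lwd = 2)

}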

hist_dens(rweibull(1000, 2))

Created on 2021-12-19 by the reprex package (v2.0.1)
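
As a side note on why that scale factor works: with equal-width bins, a bin's density is its count divided by (number of observations × bin width), so the ratio of counts to densities is the same for every bin. The short check below is only a sketch; the variable names and the seed are assumptions for illustration.

```r
set.seed(1)
x <- rweibull(1000, 2)
h <- hist(x, breaks = "Scott", plot = FALSE)

bin_width    <- diff(h$breaks)[1]               # all bins have the same width here
scale_ratio  <- max(h$counts) / max(h$density)  # factor used in hist_dens()
scale_direct <- length(x) * bin_width           # n * bin width

all.equal(scale_ratio, scale_direct)
```

With unequal bin widths the ratio differs from bin to bin, so a single scale factor would no longer line the curve up with the bars.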

淡墨 2024-08-13 09:20:52

If I understand your question correctly, then you probably want a density estimate along with the histogram:

X <- c(rep(65, times=5), rep(25, times=5), rep(35, times=10), rep(45, times=4))
hist(X, prob=TRUE)            # prob=TRUE plots densities rather than counts
lines(density(X))             # add a density estimate with defaults
lines(density(X, adjust=2), lty="dotted")   # add another "smoother" density

Edit a long while later:

Here is a slightly more dressed-up version:

X <- c(rep(65, times=5), rep(25, times=5), rep(35, times=10), rep(45, times=4))
hist(X, prob=TRUE, col="grey")       # prob=TRUE plots densities rather than counts
lines(density(X), col="blue", lwd=2) # add a density estimate with defaults
lines(density(X, adjust=2), lty="dotted", col="darkgreen", lwd=2) # add a smoother density

along with the graph it produces:

[Image: https://i.sstatic.net/lHCqw.png]

卷耳 2024-08-13 09:20:52

This kind of thing is easy with ggplot2:

library(ggplot2)
dataset <- data.frame(X = c(rep(65, times=5), rep(25, times=5), 
                            rep(35, times=10), rep(45, times=4)))
# after_stat(density) replaces the older ..density.. notation (ggplot2 >= 3.3.0)
ggplot(dataset, aes(x = X)) + 
  geom_histogram(aes(y = after_stat(density))) + 
  geom_density()

or, to mimic the result from Dirk's solution:

ggplot(dataset, aes(x = X)) + 
  geom_histogram(aes(y = after_stat(density)), binwidth = 5) + 
  geom_density()

澜川若宁 2024-08-13 09:20:52

Here's the way I do it:

foo <- rnorm(100, mean=1, sd=2)
hist(foo, prob=TRUE)
curve(dnorm(x, mean=mean(foo), sd=sd(foo)), add=TRUE)

A bonus exercise is to do this with ggplot2 package ...
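
One possible take on that bonus exercise, as a sketch rather than a definitive answer: stat_function() can draw dnorm with the sample mean and standard deviation on top of a density-scaled histogram. The seed, the number of bins, and the colours are assumptions added for illustration, and the code assumes ggplot2 3.4.0 or later (for after_stat() and the linewidth argument).

```r
library(ggplot2)

set.seed(123)  # assumption: seed added only for reproducibility
foo <- rnorm(100, mean = 1, sd = 2)

p <- ggplot(data.frame(foo = foo), aes(x = foo)) +
  # Histogram on the density scale so it is comparable to the normal curve
  geom_histogram(aes(y = after_stat(density)), bins = 15,
                 fill = "grey", colour = "black") +
  # Normal density using the sample mean and standard deviation
  stat_function(fun = dnorm,
                args = list(mean = mean(foo), sd = sd(foo)),
                colour = "blue", linewidth = 1)
print(p)
```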
