R: How do I fit a large dataset with a combination of distributions?
To fit a dataset of real-valued numbers (x) with a single distribution, we can use fitdistr() from the MASS package with, for example, the gamma or Student's t distribution:
fitdistr(x, "gamma")
or
fitdistr(x2, "t")
What if I believe my dataset should be fit by the sum of a gamma and a t distribution?
P(X) = Gamma(x) + t(x)
Can I fit the parameters of mixtures of probability distributions using Maximum Likelihood fitting in R?
Comments (2)
There are analytic maximum-likelihood estimators for some parameters, such as the mean of a normal distribution or the rate of an exponential distribution. For other parameters, there is no analytic estimator, but you can use numerical analysis to find reasonable parameter estimates.
The fitdistr() function in R uses numerical optimization of the log-likelihood function by calling the optim() function. If you think that your data is a mixture of a gamma and a t distribution, then simply write a likelihood function that describes such a mixture. Then pass that function, together with starting values for its parameters, to optim() for optimization. Here is an example using this approach to fitting a distribution:
Running this program in R produces this output:
That's fairly close to the initial values of mean = 0 and sd = 1.
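As a minimal sketch of the optim() approach described above, assuming simulated standard-normal data: the names nll_norm, fit and est, the seed, the sample size and the starting values are all illustrative choices, not taken from the original listing.

```r
set.seed(1)                           # arbitrary seed, for reproducibility only
x <- rnorm(1000, mean = 0, sd = 1)    # simulated data (assumption)

# Negative log-likelihood of a normal distribution.
# par[1] is the mean; par[2] is log(sd), so that sd stays positive
# while optim() searches over unconstrained values.
nll_norm <- function(par, x) {
  -sum(dnorm(x, mean = par[1], sd = exp(par[2]), log = TRUE))
}

# Start deliberately away from the true values; optim() minimizes,
# which is why the log-likelihood is negated above.
fit <- optim(par = c(1, log(2)), fn = nll_norm, x = x)

# Transform back to the natural scale.
est <- c(mean = fit$par[1], sd = exp(fit$par[2]))
print(est)
```

The estimates should land close to the sample mean and standard deviation of x, since those are the analytic maximum-likelihood answers for a normal distribution; the point of the sketch is only that the same pattern carries over to likelihoods with no closed-form solution.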
Don't forget that with a mixture of two distributions, you have one extra parameter that specifies the relative weights between the distributions. Also, be careful about fitting lots of parameters at once. With lots of free parameters you need to worry about overfitting.
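A sketch of what such a mixture likelihood might look like for a gamma plus a location-scale t component, with the weight w as the extra parameter. The two weights sum to one so that the mixture density integrates to one. Everything here is an assumption made for illustration: the simulated data, the parameterization (logit for w, logs for the positive parameters), the starting values, and the names nll_mix, start, fit and est.

```r
set.seed(42)                          # arbitrary seed, for reproducibility only
n <- 2000
z <- rbinom(n, 1, 0.6)                # 1 = gamma component, 0 = t component
x <- ifelse(z == 1,
            rgamma(n, shape = 9, rate = 1),   # simulated gamma part
            rt(n, df = 5))                    # simulated t part, centred at 0

# Negative log-likelihood of the mixture
#   f(x) = w * dgamma(x; shape, rate) + (1 - w) * dt((x - m) / s; df) / s
# p is unconstrained: p[1] = logit(w), p[2] = log(shape), p[3] = log(rate),
# p[4] = m, p[5] = log(s), p[6] = log(df).
nll_mix <- function(p, x) {
  w     <- plogis(p[1])
  shape <- exp(p[2]); rate <- exp(p[3])
  m     <- p[4];      s    <- exp(p[5]); df <- exp(p[6])
  dens  <- w * dgamma(x, shape = shape, rate = rate) +   # 0 for x <= 0
           (1 - w) * dt((x - m) / s, df = df) / s
  -sum(log(dens))
}

start <- c(qlogis(0.5), log(5), log(1), 0, log(1), log(10))

# Nelder-Mead tolerates non-finite values away from the starting point;
# six parameters usually need more than the default 500 iterations.
fit <- optim(start, nll_mix, x = x, control = list(maxit = 5000))

est <- c(w = plogis(fit$par[1]), shape = exp(fit$par[2]),
         rate = exp(fit$par[3]), m = fit$par[4],
         s = exp(fit$par[5]), df = exp(fit$par[6]))
print(round(est, 3))
```

With six free parameters the result can depend on the starting values, so it is worth rerunning from a few different starts and comparing fit$value before trusting any single fit.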
Try the mixdist package. Here's an example of a mixture of three distributions:
https://stats.stackexchange.com/questions/10062/which-r-package-to-use-to-calculate-component-parameters-for-a-mixture-model