Excel, Matplotlib, MATLAB, R, etc. can all draw histograms. In many cases we must reduce a large original sample to a set of intervals (bins). Wikipedia says there are several algorithms for this task, and that the most popular is the square-root choice (see the histogram article on Wikipedia). I don't see any proof of this statement in the text. So my question is: which algorithm is best for this task?
What would you advise reading about this problem?
Comments (1)
If you want a second opinion, complete with a more thorough justification, try section 4.3 of "Modern Multivariate Statistical Techniques..." by Izenman. For the particular case of the normal distribution, he comes up with a bin width of 3.4908*sigma*n^(-1/3), which is pretty close to the Freedman-Diaconis choice in Wikipedia.
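To make the comparison concrete, here is a minimal sketch of the three rules mentioned in this thread, using only the Python standard library. The function names are made up for illustration. The Freedman-Diaconis width is taken as 2*IQR*n^(-1/3), the form given on Wikipedia, and sigma is estimated by the sample standard deviation, which is an assumption on my part. As far as I know, the 3.4908 constant is the same rule Wikipedia lists as Scott's normal reference rule.

```python
import math
import random
import statistics


def sqrt_bin_count(n):
    """Square-root choice: number of bins k = ceil(sqrt(n))."""
    return math.ceil(math.sqrt(n))


def normal_reference_width(sample):
    """Bin width 3.4908 * sigma * n^(-1/3); sigma is estimated by the
    sample standard deviation (an assumption of this sketch)."""
    n = len(sample)
    return 3.4908 * statistics.stdev(sample) * n ** (-1 / 3)


def freedman_diaconis_width(sample):
    """Bin width 2 * IQR * n^(-1/3), with the IQR taken from sample quartiles."""
    n = len(sample)
    q1, _, q3 = statistics.quantiles(sample, n=4)
    return 2 * (q3 - q1) * n ** (-1 / 3)


random.seed(0)  # fixed seed so the run is repeatable
sample = [random.gauss(0.0, 1.0) for _ in range(1000)]

print("square-root bins:", sqrt_bin_count(len(sample)))
print("normal-reference width:", normal_reference_width(sample))
print("Freedman-Diaconis width:", freedman_diaconis_width(sample))
```

For normal data the IQR is about 1.349*sigma, so the Freedman-Diaconis width works out to roughly 2.70*sigma*n^(-1/3). Both widths shrink at the same n^(-1/3) rate and differ only in the constant.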
However, Izenman also shows that, under the measure he optimises to obtain this bin width, the histogram does pretty badly compared to other estimators. So if you are prepared to work hard to get as good an estimate as possible, I suggest you start by switching from histograms to kernel density estimators (section 4.5 of Izenman and http://en.wikipedia.org/wiki/Kernel_density_estimation).
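For a feel of what a kernel density estimator does, here is a rough standard-library sketch with a Gaussian kernel. The function name is hypothetical, and the bandwidth 1.06*sigma*n^(-1/5) is a common normal-reference rule of thumb that I am assuming here; it is not necessarily what Izenman recommends.

```python
import math
import random
import statistics


def gaussian_kde(sample, bandwidth):
    """Return a function x -> density estimate built from Gaussian kernels."""
    n = len(sample)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        # Average of Gaussian bumps centred on each observation
        return norm * sum(
            math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in sample
        )

    return density


random.seed(0)  # fixed seed so the run is repeatable
sample = [random.gauss(0.0, 1.0) for _ in range(500)]

# Assumed bandwidth: rule of thumb 1.06 * sigma * n^(-1/5)
h = 1.06 * statistics.stdev(sample) * len(sample) ** (-1 / 5)
kde = gaussian_kde(sample, h)

for x in (-2.0, 0.0, 2.0):
    print(f"estimated density at {x:+.1f}: {kde(x):.3f}")
```

Unlike a histogram, the estimate is smooth and does not depend on where the bin edges happen to fall. The bandwidth plays the role the bin width plays for histograms.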