Probability and relative frequency
If I use relative frequency to estimate the probability of an event, how good is my estimate based on the number of experiments? Is standard deviation a good measure? A paper/link/online book would be perfect.
Comments (3)
I believe you are looking for the confidence interval for a sample proportion. Here are some resources that might be helpful:
Confidence Interval for Proportion Tutorial
Confidence Interval for Proportion Handout
Basically, the error of your estimate shrinks in inverse proportion to the square root of the number of samples. So if you want to cut your error in half, you are going to need four times as many samples.
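To make the square-root scaling concrete, here is a minimal sketch. It assumes independent yes/no trials and uses the standard error sqrt(p * (1 - p) / n) of a relative frequency. The function name `standard_error` and the value p = 0.3 are illustrative choices only, not something from the question.

```python
import math


def standard_error(p: float, n: int) -> float:
    """Standard error of a relative frequency from n independent trials.

    Assumes every trial succeeds with the same probability p.
    """
    return math.sqrt(p * (1.0 - p) / n)


# p = 0.3 is an assumed "true" probability, chosen only for illustration.
p = 0.3
for n in (100, 400, 1600):
    # Each fourfold increase in n halves the standard error.
    print(n, standard_error(p, n))
```

In practice p is unknown, so the observed frequency is usually plugged in for it.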
Probably a chi-squared test is what you want. See, for example, the Wikipedia page on Pearson's chi-squared test. Standard deviation isn't what you want, since that's about the shape of the distribution, not how accurate your estimate is of the actual distribution. Also, note that most of these things are about "normal" distributions, and not all distributions are normal.
You count the number of successes s in a sequence of n Yes / No experiments, right? As long as the single experiments are independent, you are in the realm of the Binomial distribution (Wikipedia). The frequency of success f = s / n is an estimator of the success probability p. The variance of your frequency estimate f is p * (1-p) / n for n draws.
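The variance formula can be checked empirically with a small simulation. This is only a sketch: the helper name `simulate_frequencies`, the seed, and the values p = 0.3, n = 200 and 2000 repetitions are all assumptions made for illustration.

```python
import random
import statistics


def simulate_frequencies(p, n, repetitions, seed=0):
    """Repeat an n-trial yes/no experiment and return each observed frequency f = s / n."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    freqs = []
    for _ in range(repetitions):
        successes = sum(1 for _ in range(n) if rng.random() < p)
        freqs.append(successes / n)
    return freqs


# Illustrative values only; p would be unknown in a real experiment.
p, n = 0.3, 200
freqs = simulate_frequencies(p, n, repetitions=2000)

empirical_var = statistics.pvariance(freqs)  # spread of f across repetitions
theoretical_var = p * (1 - p) / n            # binomial formula for Var(f)

print("empirical variance:  ", empirical_var)
print("theoretical variance:", theoretical_var)
```

The two printed values should agree up to sampling noise, and the agreement tightens as the number of repetitions grows.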
As long as p is not too close to zero or 1, and as long as you do not have "too small" a number of observations n, the standard deviation will be a reasonable measure for the quality of your estimate f.
If n is large enough (rule of thumb n * p > 10), you can approximate by a normal distribution N(f, f * (1-f) / n), and the standard deviation estimate is a good measure. See here for a more extensive discussion.
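Under that normal approximation, an approximate confidence interval for p follows directly from f and its estimated standard deviation. The sketch below uses the textbook normal-approximation (Wald) interval. The function name `wald_interval` and the default z = 1.96 (roughly 95% coverage) are assumptions for illustration, and the interval is known to behave poorly when p is near 0 or 1 or when n is small.

```python
import math


def wald_interval(s, n, z=1.96):
    """Normal-approximation (Wald) interval for a proportion.

    s: number of successes, n: number of trials,
    z: normal quantile (1.96 gives roughly 95% coverage).
    """
    f = s / n
    half_width = z * math.sqrt(f * (1 - f) / n)
    # Clip to [0, 1], since a probability cannot lie outside that range.
    return max(0.0, f - half_width), min(1.0, f + half_width)


# Example: 30 successes in 100 trials (made-up numbers).
low, high = wald_interval(30, 100)
print(low, high)
```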
That said, the approximation with the standard deviation will not cut any ice if this needs to have some academic rigour (e.g. if it is homework).