Calculating confidence intervals for a non-normal distribution

Posted on 2024-10-08 16:55:23

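First, I should point out that my knowledge of statistics is fairly limited, so please forgive me if my question seems trivial or doesn't even make sense.

My data doesn't appear to be normally distributed. Typically, when I plot confidence intervals, I use the mean +/- 2 standard deviations, but I don't think that is acceptable for a non-normal distribution. My sample size is currently set to 1000 samples, which seems like enough to determine whether the data is normally distributed or not.

I use Matlab for all my processing, so are there any functions in Matlab that would make it easy to calculate confidence intervals (say 95%)?

I know there are the 'quantile' and 'prctile' functions, but I'm not sure whether they are what I need. The function 'mle' also returns confidence intervals for normally distributed data, although you can also supply your own pdf.

Could I use ksdensity to create a pdf for my data, then feed that pdf into the mle function to get confidence intervals?

Also, how would I go about determining whether my data is normally distributed? I can currently tell just by looking at the histogram or the pdf from ksdensity, but is there a way to measure it quantitatively?

Thanks!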

乖乖兔^ω^ 2024-10-15 16:55:23

There are a couple of questions there, so here are some suggestions.

You are right that the mean of 1000 samples should be approximately normally distributed, by the central limit theorem (unless your data is "heavy tailed", which I'm assuming is not the case). To get a 1-alpha confidence interval for the mean (in your case alpha = 0.05), you can use the 'norminv' function. For example, say we want a 95% CI for the mean of a sample of data X; then we can type

N = 1000;               % sample size
X = exprnd(3,N,1);      % sample from a non-normal (exponential) distribution
mu = mean(X);           % sample mean (approximately normal for large N)
sig = std(X)/sqrt(N);   % standard error of the mean
alphao2 = .05/2;        % alpha over 2
CI = [mu + norminv(alphao2)*sig ,...   % norminv(alphao2) is negative, so this is the lower bound
      mu - norminv(alphao2)*sig  ]     % upper bound

CI =

2.9369    3.3126
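
With N = 1000 the normal quantile is effectively exact, but for smaller samples the Student t quantile is the more common choice. A minimal sketch, assuming the Statistics and Machine Learning Toolbox is available; the variable names (tcrit, sem, CI_t, CI_ttest) are illustrative and not part of the answer above:

```matlab
rng(0);                              % fix the seed so the run is repeatable
N = 1000;                            % sample size
X = exprnd(3, N, 1);                 % same kind of non-normal sample as above
sem = std(X) / sqrt(N);              % standard error of the mean
tcrit = tinv(1 - 0.05/2, N - 1);     % two-sided 95% critical value, N-1 degrees of freedom
CI_t = mean(X) + [-1 1] * tcrit * sem;   % [lower, upper]

% ttest returns the same t-based interval for the mean as its third output
[~, ~, CI_ttest] = ttest(X);         % column vector [lower; upper]
```

For a sample this large, the t-based and norminv-based intervals should agree to within a few thousandths.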

Testing whether a data sample is normally distributed can be done in a lot of ways. One simple method is a QQ plot. To do this, use 'qqplot(X)', where X is your data sample. If the result is approximately a straight line, the sample is consistent with a normal distribution. If the result clearly departs from a straight line, the sample is not normal.

For example, if X = exprnd(3,1000,1) as above, the sample is non-normal and the qqplot is very non-linear:

X = exprnd(3,1000,1);
qqplot(X);

On the other hand, if the data is normal, the qqplot will give an approximately straight line:

qqplot(randn(1000,1))
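
The question also asks for a quantitative check. A minimal sketch, assuming the Statistics and Machine Learning Toolbox is installed: 'lillietest' and 'jbtest' both test the null hypothesis that the sample comes from a normal distribution with unspecified mean and variance, and return h = 1 when that hypothesis is rejected at the 5% level. The variable names are illustrative.

```matlab
rng(0);                                % fix the seed so the run is repeatable
Xexp  = exprnd(3, 1000, 1);            % skewed sample, as in the QQ plot example
Xnorm = randn(1000, 1);                % normal sample for comparison

[hLexp,  pLexp]  = lillietest(Xexp);   % Lilliefors test
[hJexp,  pJexp]  = jbtest(Xexp);       % Jarque-Bera test
[hLnorm, pLnorm] = lillietest(Xnorm);
[hJnorm, pJnorm] = jbtest(Xnorm);

fprintf('exponential: lillietest h=%d, jbtest h=%d\n', hLexp, hJexp);
fprintf('normal:      lillietest h=%d, jbtest h=%d\n', hLnorm, hJnorm);
```

For a strongly skewed sample of this size, both tests would be expected to reject normality. With large samples such tests also flag small departures that may not matter in practice, so the QQ plot remains a useful companion.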

挖鼻大婶 2024-10-15 16:55:23

You could also consider bootstrapping with the bootci function.
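
As a rough sketch of the bootstrap approach, assuming the Statistics and Machine Learning Toolbox and reusing the exponential test data from the first answer (the sample X and the choice of 2000 resamples are illustrative assumptions):

```matlab
rng(0);                                 % fix the seed so the run is repeatable
X = exprnd(3, 1000, 1);                 % skewed, non-normal sample
nboot = 2000;                           % number of bootstrap resamples (arbitrary choice)

ciMean   = bootci(nboot, @mean, X);     % 95% CI for the mean (default method and level)
ciMedian = bootci(nboot, @median, X);   % 95% CI for the median

disp(ciMean');                          % [lower upper]
disp(ciMedian');
```

Because the bootstrap resamples the data itself, it does not assume normality, and the same call works for other statistics by swapping the function handle.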

心的憧憬 2024-10-15 16:55:23

You may use the method proposed in [1]:

MEDIAN +/- 1.7 * (1.25 * R / (1.35 * SQN))

where R = interquartile range and
SQN = square root of N (the sample size).

This is often used in notched box plots, a useful data visualization for non-normal data. If the notches of two boxes do not overlap, the two medians differ significantly at roughly the 95% confidence level.
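
The formula above can be sketched in a few lines. This assumes the exponential test sample from the first answer and uses 'iqr' for the interquartile range; the variable names are illustrative:

```matlab
rng(0);                                  % fix the seed so the run is repeatable
X = exprnd(3, 1000, 1);                  % skewed, non-normal sample
N = numel(X);                            % sample size
med = median(X);                         % sample median
R = iqr(X);                              % interquartile range

% notch half-width: 1.7 * 1.25 / 1.35 is about 1.57
halfWidth = 1.7 * (1.25 * R / (1.35 * sqrt(N)));
notch = [med - halfWidth, med + halfWidth]
```

Note that the factor 1.7 was chosen for comparing two medians, so the result is best read as a comparison interval for the median, not as an exact 95% confidence interval.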

[1] McGill, R., J. W. Tukey, and W. A. Larsen. "Variations of Boxplots." The American Statistician. Vol. 32, No. 1, 1978, pp. 12–16.

月牙弯弯 2024-10-15 16:55:23

Are you sure you need confidence intervals or just the 90% range of the random data?

If you need the latter, I suggest you use prctile(). For example, if you have a vector x holding independent, identically distributed samples of a random variable, you can get some useful information by running

y = prctile(x, [5 50 95])

This returns in [y(1), y(3)] the range that contains the central 90% of your samples, and in y(2) the median of the sample.

Try the following example (using a normally distributed variable):

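t = 0:99;
tt = repmat(t, 1000, 1);          % 1000-by-100: each column j is filled with t(j)
x = randn(1000, 100) .* tt + tt;  % simple gaussian model with varying mean and variance
y = prctile(x, [5 50 95]);        % 3-by-100: percentiles of each column

plot(t, y);                       % one curve per percentile
legend('5%','50%','95%')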
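
To connect this back to the question, here is a small sketch (the exponential test sample is an assumption borrowed from the first answer) comparing the central 95% range from prctile with the mean +/- 2 standard deviations rule:

```matlab
rng(0);                                   % fix the seed so the run is repeatable
x = exprnd(3, 1000, 1);                   % skewed, nonnegative sample

rangePrctile = prctile(x, [2.5 97.5]);    % central 95% of the observed data
rangeTwoSd   = mean(x) + [-2 2] * std(x); % symmetric mean +/- 2 std range

disp(rangePrctile);
disp(rangeTwoSd);
```

For data like this, the lower end of the mean +/- 2 std range falls below zero even though no sample can be negative, while the percentile range stays within the observed data.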
