Determining "how good" a correlation is in MATLAB?

Posted 2024-12-19 22:02:13

I'm working with a set of data and I've obtained a certain correlation (using Pearson's correlation coefficient). I've been asked to determine the "quality of the correlation", and by that my supervisor means he wants to see what the correlations would be if I permuted all the y values of my ordered pairs, and to compare the obtained correlation coefficients. Does anyone know a nice way of doing this? Is there a MATLAB function that determines how good a correlation is compared to the correlations between random permutations of the data?

Comments (2)

悲念泪 2024-12-26 22:02:13

First, you have to check whether the correlation coefficient you got is significantly different from zero. The corr function can do this (see its pval output).
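If the Statistics and Machine Learning Toolbox (which provides corr) is not available, base MATLAB's `corrcoef` also returns p-values for the hypothesis of no correlation. A minimal sketch, assuming made-up data and a conventional 0.05 significance level:

```matlab
% Illustrative data: a strong, nearly linear relationship (made up for the example)
x = (1:20)';
y = 2*x + mod(x, 3);

% corrcoef returns the correlation matrix R and a matrix P of p-values for
% testing "no correlation"; the off-diagonal entries refer to the pair (x, y)
[R, P] = corrcoef(x, y);
r = R(1, 2);
p = P(1, 2);

fprintf('r = %.3f, p = %.3g\n', r, p);
if p < 0.05
    disp('The correlation is significantly different from zero at the 5% level.');
end
```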

Second, if it is significantly different from zero, you would then like to decide whether the difference is also significant from a practical point of view. In practice, a common rule of thumb treats the square of the correlation coefficient (the coefficient of determination) as practically significant if it is larger than 0.5, meaning that the variation of one of the correlated parameters "explains" at least 50% of the variation of the other.
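A minimal sketch of this second check, assuming illustrative data and using the 0.5 threshold as a rule of thumb rather than a formal test:

```matlab
% Illustrative data: a linear trend plus a bounded wiggle (made up for the example)
x = (1:20)';
y = x + 4*sin(x);

R = corrcoef(x, y);
r = R(1, 2);
R2 = r^2;   % coefficient of determination

% Rule of thumb: R^2 > 0.5 means one variable "explains" more than half
% of the variation of the other
if R2 > 0.5
    fprintf('R^2 = %.2f: practically meaningful by the 0.5 rule of thumb\n', R2);
else
    fprintf('R^2 = %.2f: below the 0.5 rule of thumb\n', R2);
end
```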

Third, there are cases where the coefficient of determination is close to one, but this is still not enough to establish the "goodness of correlation". For example, if you measure the same variable using two different methods, you will usually get very similar values, so the correlation coefficient will be almost 1. In such cases you should apply Bland-Altman analysis, which is very easy to implement in MATLAB and has its own "goodness" parameters (the bias and the so-called limits of agreement).
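A minimal Bland-Altman sketch, assuming two hypothetical vectors `a` and `b` holding paired measurements of the same quantity by two methods. It uses the standard definitions: the bias is the mean of the differences, and the 95% limits of agreement are the bias ± 1.96 standard deviations of the differences (which assumes the differences are roughly normally distributed):

```matlab
% Hypothetical paired measurements of the same quantity by two methods
a = [10; 12; 14; 16; 18];            % method A
b = [10.5; 11.8; 14.3; 16.1; 17.8];  % method B

d = a - b;                        % per-pair differences
bias = mean(d);                   % systematic difference between the methods
sd = std(d);                      % spread of the differences
loa = bias + [-1.96, 1.96] * sd;  % 95% limits of agreement

fprintf('bias = %.3f, limits of agreement = [%.3f, %.3f]\n', bias, loa);
```

To draw the Bland-Altman plot itself, plot `d` against the pairwise means `mean([a b], 2)` and add horizontal lines at `bias` and at both entries of `loa`.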

下雨或天晴 2024-12-26 22:02:13

You can permute the labels of one vector N times and calculate the correlation coefficient (cc) for each iteration. Then you can compare the distribution of those values with the real correlation.

Something like this:

%# random data
%# (corr and normcdf need the Statistics and Machine Learning Toolbox)
n = 20;
x = (1:n)';
y = x + randn(n,1)*3;

%# real correlation
cc = corr(x,y);

%# do permutations
n_iter = 100; %# number of permutations (more, e.g. 10000, gives a more stable p-value)
cc_iter = zeros(n_iter,1); %# preallocate the vector
for k = 1:n_iter
    ind = randperm(n); %# vector of random permutations
    cc_iter(k) = corr(x,y(ind));
end

%# calculate statistics
cc_mean = mean(cc_iter);
cc_std = std(cc_iter);
zval = (cc - cc_mean) ./ cc_std; %# standardize the real cc against the permuted ones
%# two-sided p-value, approximating the permutation distribution as normal
pv = 2 * normcdf(-abs(zval));

%# plot
histogram(cc_iter,20)
line([cc cc],ylim,'color','r') %# real value
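The normal approximation in the snippet above can be avoided: the permutation p-value is simply the fraction of shuffled correlations at least as extreme as the observed one. A minimal base-MATLAB sketch (no toolbox needed), assuming the same kind of illustrative data, 10000 permutations, and the common (count + 1)/(n_iter + 1) convention that counts the observed labelling as one of the permutations:

```matlab
rng(0);                         % seed so the illustration is reproducible
n = 20;
x = (1:n)';
y = x + randn(n, 1) * 3;        % illustrative data

% Pearson's r = sum(xc.*yc) / sqrt(sum(xc.^2) * sum(yc.^2))
xc = x - mean(x);
yc = y - mean(y);
denom = sqrt(sum(xc.^2) * sum(yc.^2));
cc = sum(xc .* yc) / denom;     % observed correlation

n_iter = 10000;                 % number of random permutations
cc_perm = zeros(n_iter, 1);
for k = 1:n_iter
    % shuffling y breaks its pairing with x but leaves its mean and
    % sum of squares unchanged, so the denominator stays the same
    cc_perm(k) = sum(xc .* yc(randperm(n))) / denom;
end

% two-sided permutation p-value
pv_perm = (sum(abs(cc_perm) >= abs(cc)) + 1) / (n_iter + 1);
fprintf('r = %.3f, permutation p = %.4g\n', cc, pv_perm);
```

Computing the denominator once outside the loop keeps the 10000 iterations cheap; the same p-value is obtained by calling `corrcoef` inside the loop, just more slowly.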

In addition, if you compute the correlation with [cc, pv] = corr(x,y), you get a p-value for testing whether your correlation differs from no correlation. For Pearson's correlation this p-value is computed from Student's t distribution, which is exact only when the data are normally distributed. However, if you compute Spearman or Kendall correlation (non-parametric) instead, corr uses the exact permutation distribution for small samples (and a large-sample approximation otherwise), so the p-values effectively come from randomly permuted data:

[cc, pv] = corr(x,y,'type','Spearman')
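Spearman's rho is Pearson's correlation computed on ranks. Because it depends only on the ranks, its null distribution under random permutation is the same for any tie-free data set of size n, which is what makes an exact permutation p-value practical. A small base-MATLAB illustration, assuming made-up data with no ties (with ties you would need average ranks, e.g. `tiedrank` from the Statistics and Machine Learning Toolbox):

```matlab
% Illustrative data: monotone but strongly nonlinear, no ties
x = (1:15)';
y = exp(x / 3);
n = numel(x);

% Rank each vector (1 = smallest); this simple version assumes no ties
[~, ix] = sort(x);
rx = zeros(n, 1);
rx(ix) = (1:n)';
[~, iy] = sort(y);
ry = zeros(n, 1);
ry(iy) = (1:n)';

% Spearman's rho = Pearson correlation of the ranks
Rs = corrcoef(rx, ry);
rho = Rs(1, 2);

% Same value from the classic no-ties formula based on rank differences
d = rx - ry;
rho_formula = 1 - 6 * sum(d.^2) / (n * (n^2 - 1));

% Pearson on the raw values is penalised by the curvature
Rp = corrcoef(x, y);
r_pearson = Rp(1, 2);

fprintf('Spearman rho = %.3f (formula %.3f), Pearson r = %.3f\n', rho, rho_formula, r_pearson);
```

With the toolbox installed, corr(x, y, 'type', 'Spearman') should return the same rho for this tie-free data.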