Determining "how good" a correlation is in MATLAB?

Posted on 2024-12-19 22:02:13

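I'm working with a set of data and I've obtained a certain correlation (using Pearson's correlation coefficient). I've been asked to determine the "quality of the correlation". By that my supervisor means he wants to see what the correlation would be if I permuted all the y values of the ordered pairs, and to compare the resulting correlation coefficients with the real one. Does anyone know a good way of doing this? Is there a MATLAB function that determines how good a correlation is compared with the correlations obtained from random permutations of the data?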


悲念泪 2024-12-26 22:02:13

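First, check whether the correlation coefficient you obtained is significantly different from zero. The corr function can do this (see its pval output).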
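As a minimal sketch of that first check: the data below are made up for illustration, and the 0.05 threshold is an assumed significance level, not something the question specifies. corr needs the Statistics and Machine Learning Toolbox.

```matlab
% Hypothetical example data; replace x and y with your own column vectors.
rng(1);                         % fix the seed so the run is repeatable
n = 20;
x = (1:n)';
y = x + randn(n,1)*3;

% corr returns the coefficient and the p-value for H0: no correlation.
[r, pval] = corr(x, y);

alpha = 0.05;                   % assumed significance level
isSignificant = pval < alpha;
fprintf('r = %.3f, p = %.3g, significant: %d\n', r, pval, isSignificant);
```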

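Second, if it is significantly different from zero, you need to decide whether the correlation also matters from a practical point of view. A common rule of thumb treats the correlation as practically significant when the squared correlation coefficient (the coefficient of determination) is greater than 0.5, which means that variation in one of the variables "explains" at least 50% of the variation in the other.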
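The second check is a one-liner once you have the coefficient. This is only a sketch with hypothetical data; the 0.5 cutoff is the rule of thumb from this answer, not a universal standard.

```matlab
% Hypothetical paired data (column vectors of equal length).
x = (1:10)';
y = [2 1 4 3 6 5 8 7 10 9]';

r  = corr(x, y);    % Pearson correlation coefficient
r2 = r^2;           % coefficient of determination

% Rule-of-thumb threshold taken from the answer above.
isPracticallySignificant = r2 > 0.5;
fprintf('r = %.4f, r^2 = %.4f\n', r, r2);
```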

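Third, there are cases where the coefficient of determination is close to 1, but that is still not enough to establish the "goodness" of the correlation. For example, if you measure the same variable with two different methods, you usually get very similar values, so the correlation coefficient is almost 1. In such cases you should apply Bland-Altman analysis, which is easy to implement in MATLAB and has its own "goodness" parameters (the bias and the so-called limits of agreement).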
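A rough sketch of a Bland-Altman analysis, assuming two hypothetical measurement vectors m1 and m2 (invented numbers) and the conventional 1.96 standard deviation limits, which presume the differences are roughly normal:

```matlab
% Hypothetical paired measurements of the same quantity by two methods.
m1 = [10.2 11.5 9.8 12.1 10.9 11.3 9.5 12.4]';
m2 = [10.0 11.9 9.6 12.5 10.7 11.0 9.9 12.2]';

d    = m1 - m2;                    % pairwise differences
avg  = (m1 + m2) / 2;              % pairwise means (x-axis of the plot)
bias = mean(d);                    % systematic offset between the methods
sd   = std(d);                     % sample standard deviation of the differences
loa  = bias + 1.96 * sd * [-1 1];  % approximate 95% limits of agreement

% Bland-Altman plot: difference against mean, with bias and limits.
figure
plot(avg, d, 'o')
hold on
plot(xlim, [bias bias], 'k-')
plot(xlim, [loa(1) loa(1)], 'r--')
plot(xlim, [loa(2) loa(2)], 'r--')
hold off
xlabel('Mean of the two methods')
ylabel('Difference (method 1 - method 2)')
```

A small bias and narrow limits suggest the two methods agree, which a high correlation coefficient alone does not tell you.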


下雨或天晴 2024-12-26 22:02:13

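You can permute the labels of one vector N times and calculate the correlation coefficient (cc) for each iteration. Then you can compare the distribution of those values with the real correlation.

Something like this: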

%# random data
n = 20;
x = (1:n)';
y = x + randn(n,1)*3;

%# real correlation
cc = corr(x,y);

%# do permutations
n_iter = 100; %# number of permutations
cc_iter = zeros(n_iter,1); %# preallocate the vector
for k = 1:n_iter
    ind = randperm(n); %# vector of random permutations
    cc_iter(k) = corr(x,y(ind));
end

%# calculate statistics
cc_mean = mean(cc_iter);
cc_std = std(cc_iter);
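zval = (cc - cc_mean) ./ cc_std; %# z-score of the real cc within the permutation distribution
%# two-sided probability that the real cc belongs to the same distribution as cc from permuted data
pv = 2 * normcdf(-abs(zval)); %# zval is already standardized, so use the standard normal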

%# plot
hist(cc_iter,20)
line([cc cc],ylim,'color','r') %# real value
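If you would rather not assume the permuted coefficients are normally distributed, you can count how often a permuted coefficient is at least as extreme as the real one. This is a sketch of that variant, not part of the code above; n_iter = 10000 and the "+1" correction (which keeps the estimate from being exactly zero) are my own choices.

```matlab
% Same kind of hypothetical data as in the example above.
rng(0);                          % fix the seed so the run is repeatable
n = 20;
x = (1:n)';
y = x + randn(n,1)*3;
cc = corr(x, y);                 % real correlation

n_iter = 10000;                  % more permutations give a finer p-value
cc_iter = zeros(n_iter,1);       % preallocate
for k = 1:n_iter
    cc_iter(k) = corr(x, y(randperm(n)));
end

% Two-sided empirical p-value: share of permuted coefficients whose
% magnitude is at least that of the real one, with a +1 correction.
pv_emp = (sum(abs(cc_iter) >= abs(cc)) + 1) / (n_iter + 1);
fprintf('cc = %.3f, empirical p = %.4g\n', cc, pv_emp);
```

Note that the smallest p-value this can report is 1/(n_iter + 1), so pick n_iter with the precision you need in mind.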


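In addition, if you compute the correlation with [cc pv] = corr(x,y), you get a p-value for how different your correlation is from no correlation. For Pearson, this p-value relies on the assumption that your vectors are normally distributed. However, if you compute Spearman or Kendall correlation instead (both non-parametric), the p-values come from the permutation distribution (as far as I know, MATLAB uses the exact permutation distribution for small samples and a large-sample approximation otherwise):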

[cc pv] = corr(x,y,'type','Spearman')
