cor shows only NA or 1 for correlations - why?

Posted 2024-09-24 23:41:50

I'm running cor() on a data.frame with all numeric values and I'm getting this as the result:

       price exprice...
price      1      NA
exprice   NA       1
...

So it's either 1 or NA for each value in the resulting table. Why are the NAs showing up instead of valid correlations?

Comments (6)

溺孤伤于心 2024-10-01 23:41:51

In my case I was using more than two variables, and this worked better for me:

cor(x = as.matrix(tbl), method = "pearson", use = "pairwise.complete.obs")

However, note this caveat from ?cor:

If use has the value "pairwise.complete.obs" then the correlation or covariance between each pair of variables is computed using all complete pairs of observations on those variables. This can result in covariance or correlation matrices which are not positive semi-definite, as well as NA entries if there are no complete pairs for that pair of variables.
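
As a rough illustration of the caveat above, the sketch below compares listwise and pairwise deletion. The data frames `tbl` and `no_overlap` and their values are made up for this example; only `cor()` and its `use` argument come from the thread.

```r
# Hypothetical data: each column has a missing value in a different row
tbl <- data.frame(
  a = c(1, 2, 3, 4, NA, 6),
  b = c(2, 1, 4, 3, 6, NA),
  c = c(NA, 5, 3, 6, 2, 1)
)

# Listwise deletion: only rows complete in every column are used
r_complete <- cor(tbl, use = "complete.obs")

# Pairwise deletion: each pair uses every row that is complete for that pair
r_pairwise <- cor(tbl, use = "pairwise.complete.obs")

# Number of rows that went into each pairwise coefficient
observed <- 1 - is.na(as.matrix(tbl))  # 1 where a value is present, 0 where NA
n_pairs <- crossprod(observed)

# Two columns that are never observed together (hypothetical)
no_overlap <- data.frame(
  x = c(1, 2, 3, NA, NA, NA),
  y = c(NA, NA, NA, 4, 5, 6)
)

# With no complete pairs, the x-y entry is NA even under pairwise deletion
r_none <- cor(no_overlap, use = "pairwise.complete.obs")

print(r_complete)
print(r_pairwise)
print(n_pairs)
print(r_none)
```

Because each cell of the pairwise matrix can be based on a different subset of rows, checking `n_pairs` alongside the coefficients helps show how much data supports each one.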

逆蝶 2024-10-01 23:41:51

The NA can actually have two causes. One is that there is an NA in your data. The other is that one of the variables is constant. A constant variable has a standard deviation of zero, and so the cor function returns NA.
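
One possible way to check which of the two causes applies is sketched below. The data frame `data`, its values, and the `flag` column are hypothetical; the approach simply counts NAs per column and looks for columns with zero standard deviation.

```r
# Hypothetical data frame: 'exprice' contains an NA, 'flag' is constant
data <- data.frame(
  price   = c(10, 12, 9, 15, 11),
  exprice = c(11, NA, 8, 16, 12),
  flag    = c(1, 1, 1, 1, 1)
)

# Cause 1: count missing values per column
na_counts <- colSums(is.na(data))

# Cause 2: find columns whose standard deviation is zero (constant columns)
col_sd <- sapply(data, sd, na.rm = TRUE)
constant_cols <- names(col_sd)[!is.na(col_sd) & col_sd == 0]

print(na_counts)
print(constant_cols)

# Drop constant columns and skip incomplete rows before correlating
usable <- data[, setdiff(names(data), constant_cols), drop = FALSE]
r <- cor(usable, use = "complete.obs")
print(r)
```

Whether dropping a constant column is appropriate depends on the analysis; the point here is only to locate the source of the NA entries.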

居里长安 2024-10-01 23:41:50

Tell cor() to ignore the NAs with the use argument, e.g.:

cor(data$price, data$exprice, use = "complete.obs")

玩心态 2024-10-01 23:41:50

The 1s are because everything is perfectly correlated with itself, and the NAs are because there are NAs in your variables.

You will have to specify how you want R to compute the correlation when there are missing values, because by default (use = "everything") cor returns NA for any pair in which either variable contains a missing value.

You can change this behavior with the use argument to cor, see ?cor for details.
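
A minimal sketch of the difference, assuming a made-up data frame `data` with the column names from the question, might look like this. It contrasts the documented default, `use = "everything"`, with `use = "complete.obs"`.

```r
# Hypothetical data with one missing value in 'exprice'
data <- data.frame(
  price   = c(10, 12, 9, 15, 11),
  exprice = c(11, NA, 8, 16, 12)
)

# Default (use = "everything"): the NA propagates into the price-exprice cell
r_default <- cor(data)

# Drop rows that contain any NA before computing
r_complete <- cor(data, use = "complete.obs")

# The same computation done by hand, making the row removal explicit
r_manual <- cor(na.omit(data))

print(r_default)
print(r_complete)
print(r_manual)
```

Here `r_complete` and `r_manual` agree because `"complete.obs"` performs the same casewise deletion as `na.omit()`.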

冧九 2024-10-01 23:41:50

NAs also appear if any variable has zero variance (all of its elements are equal); see for instance:

cor(cbind(a=runif(10),b=rep(1,10)))

which returns:

   a  b
a  1 NA
b NA  1
Warning message:
In cor(cbind(a = runif(10), b = rep(1, 10))) :
  the standard deviation is zero

丢了幸福的猪 2024-10-01 23:41:50

A very simple and correct answer:

Tell cor() to ignore the NAs with the use argument, e.g.:

cor(data$price, data$exprice, use = "complete.obs")