pairwise.complete.obs in R's cov function

Posted on 2025-01-11 06:08:20

I have a simulated dataset (problem) that looks like this:

library(tidyverse)
A = factor(rep("A",252));A
B = factor(rep("B",190));B
FACT = c(A,B)
x = rnorm(252)

y = rnorm(190)
d = c(x,y)
DATA = tibble(FACT,d);DATA

which gives:

# A tibble: 442 x 2
   FACT       d
   <fct>  <dbl>
 1 A     -0.172
 2 A      1.23 
 3 A     -0.589
 4 A      0.512
 5 A     -1.00 
 6 A      0.532
 7 A      0.562
 8 A     -0.403
 9 A      2.10 
10 A      0.649
# ... with 432 more rows

Now I have a vector of interest that has length 100.

z = rnorm(100)

I want to find the covariance of the vector z with each of the vectors x and y.
I tried the following in R:

DATA %>%
  group_by(FACT)%>%
  dplyr::mutate(row = row_number())%>%
  tidyr::pivot_wider(names_from = FACT, values_from = d)%>%
  dplyr::select(-row)%>%
  dplyr::mutate((across(.cols= everything(),~cov(.x,z,use= "pairwise.complete.obs"))))%>%
  slice(n=1)%>%
  tidyr::pivot_longer( cols = tidyselect::everything(), names_to = "FACT", values_to = "CoV")

But R reports an error that seems to involve the argument use = "pairwise.complete.obs".

The error is:

Error in `dplyr::mutate()`:
! Problem while computing `..1 = (across(.cols =
  everything(), ~cov(.x, z, use =
  "pairwise.complete.obs")))`.
Caused by error in `across()`:
! Problem while computing column `A`.
Caused by error in `cov()`:
! incompatible dimensions

Imagine that my real-world problem has 150 factor categories.
How can this be fixed? Any help is appreciated.

完美的未来在梦里 2025-01-18 06:08:20

The problem is that you’re trying to get covariance for vectors of different lengths. "pairwise.complete.obs" is just included in the error message because it’s printing the call which raised the error, but it’s not the problem. The important bit is:

Caused by error in `cov()`:
! incompatible dimensions

i.e., you’re requesting the covariance of a 252-length vector with a 100-length vector. If all vectors are the same length, there’s no error:

library(tidyverse)
A = factor(rep("A",100))
B = factor(rep("B",100))
FACT = c(A,B)
x = rnorm(100)

y = rnorm(100)
d = c(x,y)
DATA = tibble(FACT,d)

z = rnorm(100)

DATA %>%
  group_by(FACT)%>%
  dplyr::mutate(row = row_number())%>%
  tidyr::pivot_wider(names_from = FACT, values_from = d) %>% 
  dplyr::select(-row)%>%
  dplyr::mutate((across(.cols= everything(),~cov(.x,z,use= "pairwise.complete.obs"))))%>%
  slice(1)%>%
  tidyr::pivot_longer( cols = tidyselect::everything(), names_to = "FACT", values_to = "CoV")

# # A tibble: 2 x 2
#   FACT      CoV
#   <chr>   <dbl>
# 1 A      0.0705
# 2 B     -0.214

Edit:

The OP comments:

The problem is that the pairwise.complete.obs does not solve the mismatch in length of the needed vectors.

"pairwise.complete.obs" is for dropping pairs where either vector is NA. But the input vectors still have to be of equal length. For example:

# returns NA due to missing values
cov(
  c(1,2,3,NA,5,6),
  c(6,NA,2,NA,5,1)
)
# NA

# with pairwise.complete.obs, returns covariance for pairs without NAs
cov(
  c(1,2,3,NA,5,6),
  c(6,NA,2,NA,5,1),
  use = "pairwise.complete.obs"
)
# -3.166667

# but still throws an error for unequal dimensions
cov(
  c(1,2,3,NA,5,6,7,8),
  c(6,NA,2,NA,5,1),
  use = "pairwise.complete.obs"
)
# Error in cov(c(1, 2, 3, NA, 5, 6, 7, 8), c(6, NA, 2, NA, 5, 1), use = "pairwise.complete.obs") : 
#   incompatible dimensions

The underlying problem is that covariance is based on pairs of values. One way to think of it is that your input vectors need to be the same length so R knows how you want the values "paired up." So trying to get covariance for different length vectors doesn’t quite make sense.
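To make the "pairing" concrete, here is a minimal sketch of the sample covariance written out by hand in base R. The helper name manual_cov and the seed are my own, purely for illustration. It should agree with cov() for equal-length inputs, and it shows that the formula multiplies x[i] by y[i] position by position, which is why a 252-length vector has no natural partner in a 100-length one.

```r
# Sample covariance written out by hand, to show where the pairing happens.
# manual_cov is a hypothetical helper, not part of any package.
manual_cov <- function(x, y) {
  stopifnot(length(x) == length(y))  # each x[i] needs a partner y[i]
  n <- length(x)
  sum((x - mean(x)) * (y - mean(y))) / (n - 1)
}

set.seed(1)  # arbitrary seed, only for reproducibility
x <- rnorm(100)
z <- rnorm(100)

manual_cov(x, z)
cov(x, z)  # should match the manual version up to floating-point error
```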

Postscript: your code could be simplified quite a bit using dplyr::summarize. Note that this still only works when every group has the same length as z:

DATA %>%
  group_by(FACT) %>%
  summarize(CoV = cov(d, z, use= "pairwise.complete.obs"))

# # A tibble: 2 x 2
#   FACT      CoV
#   <fct>   <dbl>
# 1 A      0.0705
# 2 B     -0.214
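For the original data, where the groups have 252 and 190 rows, you first have to decide how values should be paired with z. One possible choice, and it is only an assumption about what the pairing should mean, is to pair by position: the i-th observation of each group with z[i]. The sketch below does that in base R. The helper cov_with_z and the seed are hypothetical, introduced only for this example.

```r
set.seed(42)  # arbitrary seed, only for reproducibility
DATA <- data.frame(
  FACT = factor(rep(c("A", "B"), times = c(252, 190))),
  d    = rnorm(442)
)
z <- rnorm(100)

# Hypothetical helper: pair g[i] with z[i] for i = 1..length(z).
# ASSUMPTION: pairing by position is meaningful for your data.
# Groups shorter than z are padded with NA by the indexing, and
# "pairwise.complete.obs" then drops those incomplete pairs.
cov_with_z <- function(g, z) {
  cov(g[seq_along(z)], z, use = "pairwise.complete.obs")
}

# One covariance per factor level; the same call works for 2 or 150 levels.
CoV <- sapply(split(DATA$d, DATA$FACT), cov_with_z, z = z)
CoV
```

In dplyr terms the same idea would be summarize(CoV = cov(d[seq_along(z)], z, use = "pairwise.complete.obs")) after group_by(FACT). If the observations in a group have no positional relationship to z, the result is just a number with no real interpretation, so check that assumption before using it.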
