pairwise.complete.obs in R's cov() function
I have a simulated dataset (problem) that looks like this:
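```r
library(dplyr)  # tibble(), %>%, group_by(), mutate(), slice()
library(tidyr)  # pivot_wider(), pivot_longer()

A = factor(rep("A", 252)); A
B = factor(rep("B", 190)); B
FACT = c(A, B)  # combining factors with c() keeps the levels in R >= 4.1.0
x = rnorm(252)
y = rnorm(190)
d = c(x, y)
DATA = tibble(FACT, d); DATA
```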
resulting in:
```
# A tibble: 442 x 2
   FACT       d
   <fct>  <dbl>
 1 A     -0.172
 2 A      1.23
 3 A     -0.589
 4 A      0.512
 5 A     -1.00
 6 A      0.532
 7 A      0.562
 8 A     -0.403
 9 A      2.10
10 A      0.649
# ... with 432 more rows
```
Now I have a vector of interest, which has length 100:

```r
z = rnorm(100)
```
I want to find the covariance of vector z with each of the vectors x and y, respectively. To do so in R, I tried:
```r
DATA %>%
  group_by(FACT) %>%
  dplyr::mutate(row = row_number()) %>%
  tidyr::pivot_wider(names_from = FACT, values_from = d) %>%
  dplyr::select(-row) %>%
  dplyr::mutate((across(.cols = everything(), ~cov(.x, z, use = "pairwise.complete.obs")))) %>%
  slice(n = 1) %>%
  tidyr::pivot_longer(cols = tidyselect::everything(), names_to = "FACT", values_to = "CoV")
```
But R reports an error that seems to point to the argument use = "pairwise.complete.obs". The error is:
```
Error in `dplyr::mutate()`:
! Problem while computing `..1 = (across(.cols =
  everything(), ~cov(.x, z, use =
  "pairwise.complete.obs")))`.
Caused by error in `across()`:
! Problem while computing column `A`.
Caused by error in `cov()`:
! incompatible dimensions
```
Note that my real-world problem has 150 factor categories. How can this be fixed? Any help is appreciated.
The problem is that you're trying to get the covariance of vectors with different lengths. "pairwise.complete.obs" only appears in the error message because R prints the call that raised the error; it isn't the problem. The important bit is the final message from `cov()`: "incompatible dimensions".
That is, you're requesting the covariance of a 252-length vector with a 100-length vector. If all vectors are the same length, there's no error.
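A minimal base-R sketch that mirrors the question's setup. It assumes that, after `pivot_wider()`, column A holds the 252 values of `x` and column B holds the 190 values of `y` padded with `NA` up to 252 rows; the `set.seed()` call and the name `z_same` are additions for illustration only.

```r
set.seed(42)  # added only to make the run reproducible
x <- rnorm(252)                    # like column A after pivot_wider()
y <- c(rnorm(190), rep(NA, 62))    # like column B, NA-padded to 252 rows
z <- rnorm(100)                    # the vector of interest from the question

# 252 vs 100 elements: cov() fails no matter what `use` is set to
err <- tryCatch(cov(x, z, use = "pairwise.complete.obs"),
                error = function(e) conditionMessage(e))
err

# Equal lengths: no error; the NA rows in y are skipped by pairwise.complete.obs
z_same <- rnorm(252)
cov(x, z_same, use = "pairwise.complete.obs")
cov(y, z_same, use = "pairwise.complete.obs")
```

The `NA` padding is exactly what `use = "pairwise.complete.obs"` is designed to handle; the length mismatch with `z` is not.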
Edit:
OP comments,
"pairwise.complete.obs" is for dropping rows where either vector is NA, but the input vectors still have to be of equal length.

The underlying problem is that covariance is based on pairs of values. One way to think of it is that your input vectors need to be the same length so R knows how you want the values "paired up." So trying to get the covariance of vectors with different lengths doesn't really make sense.
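To make the "pairing" concrete: the sample covariance that `cov()` computes is sum((x[i] - mean(x)) * (z[i] - mean(z))) / (n - 1) over positions i = 1..n, so element i of one vector is always matched with element i of the other. The sketch below uses small made-up vectors `a` and `b` (hypothetical, not from the question) to show that `use = "pairwise.complete.obs"` only removes positions where either value is `NA`, and that a length mismatch is still rejected.

```r
# Hypothetical example vectors; a is NA at position 3, b is NA at position 4
a <- c(1, 2, NA, 4, 5)
b <- c(2, 4, 6, NA, 10)

# pairwise.complete.obs keeps only positions where both values are present (1, 2, 5)
cov_pairwise <- cov(a, b, use = "pairwise.complete.obs")
cov_manual   <- cov(a[c(1, 2, 5)], b[c(1, 2, 5)])
all.equal(cov_pairwise, cov_manual)   # TRUE

# The same covariance formula written out by hand on the complete pairs
keep <- !is.na(a) & !is.na(b)
n    <- sum(keep)
cov_formula <- sum((a[keep] - mean(a[keep])) * (b[keep] - mean(b[keep]))) / (n - 1)

# Different lengths are still an error, even with pairwise.complete.obs
err <- tryCatch(cov(a, b[1:4], use = "pairwise.complete.obs"),
                error = function(e) conditionMessage(e))
err
```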
Postscript: your code could be simplified quite a bit using dplyr::summarize():
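A minimal sketch of such a pipeline, assuming (hypothetically) that every group has exactly as many observations as `z`, so that each `cov()` call pairs two vectors of equal length. The simulated data and the seed are illustrative, not the question's data.

```r
library(dplyr)

set.seed(1)  # added only to make the run reproducible
# Hypothetical data: each factor level has exactly 100 observations, like z
DATA <- tibble(
  FACT = factor(rep(c("A", "B"), each = 100)),
  d    = rnorm(200)
)
z <- rnorm(100)

# One row per group; cov() pairs each group's d with z position by position.
# z is not a column of DATA, so summarize() picks it up from the environment.
res <- DATA %>%
  group_by(FACT) %>%
  summarize(CoV = cov(d, z, use = "pairwise.complete.obs"))
res
```

With 150 factor categories this returns 150 rows without any pivoting, as long as each group's `d` has the same length as `z`. With the question's original data (252 and 190 values against a `z` of length 100), the same pipeline would still stop with "incompatible dimensions".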