R tidyverse - correlation by group, comparing multiple columns against a single column and returning a single data frame

Posted 2025-01-11 17:21:03

I have a dataset which contains count data where the structure looks like:

Sample  ID   Expected  Observed_A  Observed_B
A       id1        10           8          10
A       id2         6           8           4
B       id1        15          12          18
B       id2         1           2           4

What I'm trying to get to with tidyr/dplyr is the per-sample correlation between each of the observed counts and the expected counts (i.e. I'm unfussed by the correlation between each of the observed columns).

Sample  Dataset     Correlation
A       Observed_A         0.99
A       Observed_B         0.93
B       Observed_A         0.89
B       Observed_B         0.91

I can do this by looping, but was wondering whether there is a 'clearer' approach to take using tidyverse functions?

Any help much appreciated!!

Comments (2)

情愿 2025-01-18 17:21:03
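library(dplyr)
library(tidyr)

df %>%
  group_by(Sample) %>%
  # df is the question's data frame; correlate each Observed_* column with Expected per Sample
  summarize(across(Observed_A:Observed_B, ~cor(.x, Expected))) %>%
  pivot_longer(!Sample, values_to = "Correlation", names_to = "Dataset")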
星星的軌跡 2025-01-18 17:21:03

How about this:

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(tidyr)
d <- tibble::tribble(
  ~Sample, ~ID,   ~Expected, ~Observed_A, ~Observed_B,
  "A",     "id1", 10,        8,           10,
  "A",     "id2", 6,         8,           4,
  "B",     "id1", 15,        12,          18,
  "B",     "id2", 1,         2,           4
)

d %>% 
  group_by(Sample) %>%
  summarise(as.data.frame(cor(Expected, cbind(Observed_A, Observed_B)))) %>% 
  pivot_longer(-Sample, names_to = "Dataset", values_to="Correlation")
#> Warning in cor(Expected, cbind(Observed_A, Observed_B)): the standard deviation
#> is zero
#> # A tibble: 4 × 3
#>   Sample Dataset    Correlation
#>   <chr>  <chr>            <dbl>
#> 1 A      Observed_A          NA
#> 2 A      Observed_B           1
#> 3 B      Observed_A           1
#> 4 B      Observed_B           1

Created on 2022-03-04 by the reprex package (v2.0.1)
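
As a possible variation on the two answers above (not either answerer's own code), the observed columns can be reshaped to long format first and then grouped by both Sample and Dataset. The sketch below reuses the four rows from the question; the `Observed` column name, the `n` column, and the zero-variance guard are assumptions added here for illustration.

```r
library(dplyr)
library(tidyr)

# Same four rows as the table in the question
d <- tibble::tribble(
  ~Sample, ~ID,   ~Expected, ~Observed_A, ~Observed_B,
  "A",     "id1", 10,        8,           10,
  "A",     "id2", 6,         8,           4,
  "B",     "id1", 15,        12,          18,
  "B",     "id2", 1,         2,           4
)

res <- d %>%
  # One row per Sample / ID / observed column
  pivot_longer(starts_with("Observed_"),
               names_to = "Dataset", values_to = "Observed") %>%
  group_by(Sample, Dataset) %>%
  summarise(
    n = n(),  # number of rows behind each correlation
    # Return NA explicitly when either vector is constant,
    # instead of relying on cor() to warn and return NA
    Correlation = if (sd(Observed) == 0 || sd(Expected) == 0) {
      NA_real_
    } else {
      cor(Observed, Expected)
    },
    .groups = "drop"
  )

print(res)
```

With only two rows per sample, as in this toy data, every correlation is necessarily 1, -1, or NA, which is why the reprex output above shows no intermediate values. The `n` column makes that easy to spot; values like the 0.99 and 0.93 in the question need more rows per sample.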
