Find rows common to several, but not all, of the available data frames, across all possible combinations of those data frames

Posted 2025-01-17 20:17:54 · 774 characters · 3 views · 0 comments

I have multiple data frames with the following format:

        Gene Entrez.Id                                    Dataset Correlation
1     MTHFD2     10797 CRISPR (DepMap 22Q1 Public+Score, Chronos)   0.3328479
2   SLC25A32     81034 CRISPR (DepMap 22Q1 Public+Score, Chronos)   0.3111028
3    MTHFD1L     25902 CRISPR (DepMap 22Q1 Public+Score, Chronos)   0.2710356
4       DTX3    196403 CRISPR (DepMap 22Q1 Public+Score, Chronos)   0.2672314

My aim was to find elements in the Gene column that were common to all data frames, for which I used the following command:

df.join <- join_all(list(df1,df2,df3,df4,df5), by = "Gene", type = "inner")

But there are actually no Gene elements that are common to all data frames, so df.join is empty.
Now I want to know whether there are elements in the Gene column that are common to most of the data frames but not all, say 4 out of 5. Is there a way to do this without manually writing code for every possible combination of data frames?

Comments (1)

任谁 2025-01-24 20:17:54

One option involving dplyr and purrr could be:

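library(dplyr)
library(purrr)

# Collect df1, df2, ... by name; the anchored pattern avoids also matching df.join
ids_to_join <- mget(ls(pattern = "^df[0-9]+$")) %>%
    map_dfr(~ select(., "Gene"), .id = "dataset") %>%
    group_by(Gene) %>%
    summarise(n = n_distinct(dataset)) %>%
    ungroup() %>%
    filter(n == 5) %>% # The number corresponds to the required number of datasets
    pull(Gene)

# Keep only the rows with those genes in each data frame, then join the data frames
mget(ls(pattern = "^df[0-9]+$")) %>%
    map(~ filter(., Gene %in% ids_to_join)) %>%
    reduce(inner_join, 
           by = "Gene")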

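In this approach, the IDs that are present in the required number of datasets (here n = 5) are identified first. In the second step, each data frame is filtered down to the rows with those IDs, and the filtered data frames are joined together. To find genes present in only 4 of the 5 data frames, set n == 4 and swap inner_join for full_join: an inner join across all five data frames would drop any gene that is missing from one of them.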
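
The counting step on its own can be sketched in base R, without any packages. This is only an illustration under stated assumptions: the list `dfs` and the gene lists in it are made-up toy data standing in for df1 to df5, and `gene_counts` and `common_4` are hypothetical names.

```r
# Toy stand-ins for df1..df5 (hypothetical values, not the real data)
dfs <- list(
  df1 = data.frame(Gene = c("MTHFD2", "SLC25A32", "DTX3")),
  df2 = data.frame(Gene = c("MTHFD2", "SLC25A32")),
  df3 = data.frame(Gene = c("MTHFD2", "MTHFD1L")),
  df4 = data.frame(Gene = c("MTHFD2", "SLC25A32", "MTHFD1L")),
  df5 = data.frame(Gene = c("SLC25A32", "DTX3"))
)

# Count, for each gene, the number of data frames it appears in;
# unique() guards against a gene being listed twice within one data frame
gene_counts <- table(unlist(lapply(dfs, function(d) unique(d$Gene))))

# Genes found in exactly 4 of the 5 data frames
common_4 <- names(gene_counts)[gene_counts == 4]
common_4
```

Changing `== 4` to `>= 4` would also include genes shared by all five data frames.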

If information on the datasets is also needed:

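# The anchored pattern matches df1, df2, ... but not other objects such as df.join
ids_to_join <- mget(ls(pattern = "^df[0-9]+$")) %>%
    map_dfr(~ select(., "Gene"), .id = "dataset") %>%
    group_by(Gene) %>%
    summarise(n = n_distinct(dataset),
              # unique() lists each dataset once even if a gene repeats within it
              dataset = paste(unique(dataset), collapse = ", ")) %>%
    ungroup() %>%
    filter(n == 5) %>%
    select(-n)

mget(ls(pattern = "^df[0-9]+$")) %>%
    map(~ filter(., Gene %in% ids_to_join[["Gene"]])) %>%
    reduce(inner_join, 
           by = "Gene") %>%
    # Attach the comma-separated list of datasets each gene was found in
    left_join(ids_to_join,
              by = "Gene")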
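
As an alternative to a wide join, the matching rows can be kept in long format, with one row per gene and data frame. This is a rough sketch rather than part of the answer above: it assumes the data frames are gathered in a named list, uses made-up toy values, and the names `dfs`, `min_frames`, `stacked` and `common_long` are hypothetical.

```r
library(dplyr)

# Hypothetical toy inputs; the real df1..df5 would go into this named list
dfs <- list(
  df1 = data.frame(Gene = c("MTHFD2", "DTX3"), Correlation = c(0.33, 0.27)),
  df2 = data.frame(Gene = c("MTHFD2", "SLC25A32"), Correlation = c(0.31, 0.30)),
  df3 = data.frame(Gene = "MTHFD2", Correlation = 0.29),
  df4 = data.frame(Gene = c("MTHFD2", "DTX3"), Correlation = c(0.35, 0.22)),
  df5 = data.frame(Gene = "SLC25A32", Correlation = 0.25)
)

min_frames <- 4  # assumed threshold: at least 4 of the 5 data frames

# Stack all data frames, labelling each row with the data frame it came from
stacked <- bind_rows(dfs, .id = "dataset")

# Keep only rows whose gene occurs in at least min_frames distinct data frames
common_long <- stacked %>%
  group_by(Gene) %>%
  filter(n_distinct(dataset) >= min_frames) %>%
  ungroup()

common_long
```

The long layout avoids the suffixed duplicate columns (Correlation.x, Correlation.y, and so on) that repeated joins produce, and the dataset column shows directly which data frames contain each gene.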