Finding rows common to several, but not all, of the available data frames, across all possible combinations of those data frames
I have multiple data frames with the following format:
      Gene Entrez.Id                                    Dataset Correlation
1   MTHFD2     10797 CRISPR (DepMap 22Q1 Public+Score, Chronos)   0.3328479
2 SLC25A32     81034 CRISPR (DepMap 22Q1 Public+Score, Chronos)   0.3111028
3  MTHFD1L     25902 CRISPR (DepMap 22Q1 Public+Score, Chronos)   0.2710356
4     DTX3    196403 CRISPR (DepMap 22Q1 Public+Score, Chronos)   0.2672314
My aim was to find elements in the Gene column that were common to all data frames, for which I used the following command:
df.join <- join_all(list(df1, df2, df3, df4, df5), by = "Gene", type = "inner")
But there are actually no Gene elements that are common to all data frames, so df.join is empty.
Now I want to know whether there are elements in the Gene column that are common to most data frames but not all, let's say 4 out of 5. Is there a way to do this without manually writing lines of code for every possible combination of data frames?
One option involves dplyr and purrr. In this approach, the IDs that are present in the required number of datasets (here n = 5) are identified first. Then, in a second step, the rows with these IDs are selected and joined together.
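The two steps described above can be sketched as follows. This is only an illustration under stated assumptions, not the answerer's original code: the toy data frames in `dfs`, the helper `make_df`, the threshold `n_required = 4` (taken from the question's "4 out of 5" example), and the per-frame column suffixes are all hypothetical choices.

```r
library(dplyr)
library(purrr)

# Hypothetical toy data standing in for df1..df5 (only Gene and Correlation kept)
make_df <- function(genes) {
  data.frame(Gene = genes, Correlation = seq_along(genes) / 10)
}
dfs <- list(
  df1 = make_df(c("MTHFD2", "SLC25A32", "DTX3")),
  df2 = make_df(c("MTHFD2", "SLC25A32")),
  df3 = make_df(c("MTHFD2", "MTHFD1L")),
  df4 = make_df(c("MTHFD2", "SLC25A32", "MTHFD1L")),
  df5 = make_df(c("SLC25A32", "DTX3"))
)

n_required <- 4  # assumed threshold: a gene must appear in at least 4 data frames

# Step 1: count in how many data frames each gene occurs
gene_counts <- table(unlist(map(dfs, function(df) unique(df$Gene))))
common_genes <- names(gene_counts)[gene_counts >= n_required]

# Step 2: keep only those genes, tag the other columns with the data frame
# name, and join everything on Gene
df_join <- dfs %>%
  map(function(df) filter(df, Gene %in% common_genes)) %>%
  imap(function(df, nm) rename_with(df, ~ paste0(.x, "_", nm), .cols = -Gene)) %>%
  reduce(function(a, b) full_join(a, b, by = "Gene"))
```

A full join is used in the second step because a gene found in only four of the five data frames has no row in the fifth; its columns for that data frame are filled with NA rather than the gene being dropped, as an inner join would do.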
The information on datasets can also be retained if needed.
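One way to keep track of which data frames each gene came from is to stack the data frames with an identifier column and summarise per gene. The sketch below is an assumption-laden illustration rather than the original answer's code: the toy list `dfs`, the helper `make_df`, the column names `source_df`, `n_dfs` and `found_in`, and the threshold `n_required = 4` are all hypothetical.

```r
library(dplyr)

# Hypothetical toy data standing in for df1..df5
make_df <- function(genes) {
  data.frame(Gene = genes, Correlation = seq_along(genes) / 10)
}
dfs <- list(
  df1 = make_df(c("MTHFD2", "SLC25A32", "DTX3")),
  df2 = make_df(c("MTHFD2", "SLC25A32")),
  df3 = make_df(c("MTHFD2", "MTHFD1L")),
  df4 = make_df(c("MTHFD2", "SLC25A32", "MTHFD1L")),
  df5 = make_df(c("SLC25A32", "DTX3"))
)

n_required <- 4  # assumed threshold

# Stack all data frames; source_df records which list element a row came from
stacked <- bind_rows(dfs, .id = "source_df")

# Keep genes seen in at least n_required data frames and list where they occur
gene_sources <- stacked %>%
  group_by(Gene) %>%
  filter(n_distinct(source_df) >= n_required) %>%
  summarise(
    n_dfs = n_distinct(source_df),
    found_in = paste(sort(unique(source_df)), collapse = ", "),
    .groups = "drop"
  )
```

The identifier column is named `source_df` here so that it does not clash with the `Dataset` column already present in the question's data.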