在 R 中合并具有缺失值的数据框

发布于 2024-11-04 10:55:52 字数 3953 浏览 1 评论 0原文

获取数据框的代码:

rat_all = structure(list(frequency = c(37L, 31L, 14L, 11L, 2L, 3L), isoforms = 8:13,      
    type = structure(c("rat_all", "rat_all", "rat_all", "rat_all",              
    "rat_all", "rat_all"), .Dim = c(6L, 1L))), .Names = c("frequency",          
"isoforms", "type"), row.names = 8:13, class = "data.frame")

rat_ensembl = structure(list(frequency = c(17L, 8L, 20L), isoforms = 8:10,                    
    type = structure(c("rat_ensembl", "rat_ensembl", "rat_ensembl"              
    ), .Dim = c(3L, 1L))), .Names = c("frequency", "isoforms",                  
"type"), row.names = 8:10, class = "data.frame") 

我有两个数据框:

  frequency isoforms        type                                               
8         17        8 rat_ensembl                                               
9          8        9 rat_ensembl                                               
10        20       10 rat_ensembl  

   frequency isoforms    type                                                   
8         37        8 rat_all                                                   
9         31        9 rat_all                                                   
10        14       10 rat_all                                                   
11        11       11 rat_all                                                   
12         2       12 rat_all                                                   
13         3       13 rat_all   

想将它们组合成一个数据框,但也包括缺失的数据框 isoforms 条目出现在 rat_all 数据框中,但不在 rat_ensembl 中 数据框。所以我希望输出是一个组合数据框,就像我绑定一样 两个数据帧,但增强了:

11         0       11 rat_ensembl
12         0       12 rat_ensembl
13         0       13 rat_ensembl

我以为我可以通过合并来做到这一点,但最终我得到了一个巨大的混乱,我必须将其展开,我最终可以将其调整为正确的格式,但这不是一个好的解决方案,如果 我想同时对四到五种不同的“类型”进行此操作。我缺少什么?谢谢!

需要明确的是,我希望获得一个最终的数据框,如下所示:

      frequency isoforms        type                                               
1         17        8 rat_ensembl                                               
2          8        9 rat_ensembl                                               
3         20       10 rat_ensembl                                                   
4         37        8 rat_all                                                   
5         31        9 rat_all                                                   
6         14       10 rat_all                                                   
7         11       11 rat_all                                                   
8          2       12 rat_all                                                   
9          3       13 rat_all   
10         0       11 rat_ensembl
11         0       12 rat_ensembl
12         0       13 rat_ensembl

如果我使用的话,我可以让它做我想做的事情:

z = merge(rat_ensembl, rat_all, by.x="isoforms", by.y="isoforms", all.y=TRUE)
   isoforms frequency.x      type.x frequency.y  type.y                         
7         7          44 rat_ensembl          69 rat_all                         
8         8          17 rat_ensembl          37 rat_all                         
9         9           8 rat_ensembl          31 rat_all                         
10       10          20 rat_ensembl          14 rat_all                         
11       11          NA        <NA>          11 rat_all                         
12       12          NA        <NA>           2 rat_all                         
13       13          NA        <NA>           3 rat_all                         
14       14          NA        <NA>           1 rat_all            

然后,理论上我可以选择 isoforms, <代码>频率.x,<代码>类型.x列和 修复它们,使它们对于 rat_ensemblrat_all 都是正确的,然后 rbind 这些 数据帧放在一起,但似乎应该有一些东西可以直接处理它。

Code to get the data frames:

rat_all = structure(list(frequency = c(37L, 31L, 14L, 11L, 2L, 3L), isoforms = 8:13,      
    type = structure(c("rat_all", "rat_all", "rat_all", "rat_all",              
    "rat_all", "rat_all"), .Dim = c(6L, 1L))), .Names = c("frequency",          
"isoforms", "type"), row.names = 8:13, class = "data.frame")

rat_ensembl = structure(list(frequency = c(17L, 8L, 20L), isoforms = 8:10,                    
    type = structure(c("rat_ensembl", "rat_ensembl", "rat_ensembl"              
    ), .Dim = c(3L, 1L))), .Names = c("frequency", "isoforms",                  
"type"), row.names = 8:10, class = "data.frame") 

I have two data frames:

  frequency isoforms        type                                               
8         17        8 rat_ensembl                                               
9          8        9 rat_ensembl                                               
10        20       10 rat_ensembl  

and

   frequency isoforms    type                                                   
8         37        8 rat_all                                                   
9         31        9 rat_all                                                   
10        14       10 rat_all                                                   
11        11       11 rat_all                                                   
12         2       12 rat_all                                                   
13         3       13 rat_all   

I'd like to combine these into one data frame, but also to include the missing
isoforms entries that appear in the rat_all data frame but not the rat_ensembl
data frame. So I'd like the output to be a combined data frame as if I rbinded
the two data frames, but augmented with:

11         0       11 rat_ensembl
12         0       12 rat_ensembl
13         0       13 rat_ensembl

I thought I could do it with merge but I wind up getting a huge mess that I have to unwind that I can eventually massage into the right format but it is not a good solution if
I wanted to do this for four or five different 'types' at once. What am I missing? Thanks!

To be clear I'm looking to get a final data frame that looks like:

      frequency isoforms        type                                               
1         17        8 rat_ensembl                                               
2          8        9 rat_ensembl                                               
3         20       10 rat_ensembl                                                   
4         37        8 rat_all                                                   
5         31        9 rat_all                                                   
6         14       10 rat_all                                                   
7         11       11 rat_all                                                   
8          2       12 rat_all                                                   
9          3       13 rat_all   
10         0       11 rat_ensembl
11         0       12 rat_ensembl
12         0       13 rat_ensembl

I can kind of get it to do what I want if I use:

z = merge(rat_ensembl, rat_all, by.x="isoforms", by.y="isoforms", all.y=TRUE)
   isoforms frequency.x      type.x frequency.y  type.y                         
7         7          44 rat_ensembl          69 rat_all                         
8         8          17 rat_ensembl          37 rat_all                         
9         9           8 rat_ensembl          31 rat_all                         
10       10          20 rat_ensembl          14 rat_all                         
11       11          NA        <NA>          11 rat_all                         
12       12          NA        <NA>           2 rat_all                         
13       13          NA        <NA>           3 rat_all                         
14       14          NA        <NA>           1 rat_all            

Then, theoretically I could select out the isoforms, frequency.x, type.x columns and
fix them so they are correct for each of rat_ensembl and rat_all and then rbind those
data frames together but it seems like there should be something to just handle it directly.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

瀟灑尐姊 2024-11-11 10:55:52

也许你想要这样的东西

z <- merge(rat_ensembl, rat_all, all = TRUE)

iso_diff <- setdiff(rat_all$isoforms, rat_ensembl$isoforms)

augmented <- data.frame(frequency = 0, isoforms = iso_diff, type = "rat_ensembl", stringsAsFactors= FALSE)

df_all <- rbind(z, augmented)

希望有帮助。

maybe you want something like this

z <- merge(rat_ensembl, rat_all, all = TRUE)

iso_diff <- setdiff(rat_all$isoforms, rat_ensembl$isoforms)

augmented <- data.frame(frequency = 0, isoforms = iso_diff, type = "rat_ensembl", stringsAsFactors= FALSE)

df_all <- rbind(z, augmented)

Hope that helps.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文