在 R 中合并具有缺失值的数据框

发布于 2024-11-04 10:55:52 字数 3953 浏览 1 评论 0原文

获取数据框的代码：

rat_all = structure(list(frequency = c(37L, 31L, 14L, 11L, 2L, 3L), isoforms = 8:13,      
    type = structure(c("rat_all", "rat_all", "rat_all", "rat_all",              
    "rat_all", "rat_all"), .Dim = c(6L, 1L))), .Names = c("frequency",          
"isoforms", "type"), row.names = 8:13, class = "data.frame")

rat_ensembl = structure(list(frequency = c(17L, 8L, 20L), isoforms = 8:10,                    
    type = structure(c("rat_ensembl", "rat_ensembl", "rat_ensembl"              
    ), .Dim = c(3L, 1L))), .Names = c("frequency", "isoforms",                  
"type"), row.names = 8:10, class = "data.frame")

我有两个数据框：

  frequency isoforms        type                                               
8         17        8 rat_ensembl                                               
9          8        9 rat_ensembl                                               
10        20       10 rat_ensembl

我

   frequency isoforms    type                                                   
8         37        8 rat_all                                                   
9         31        9 rat_all                                                   
10        14       10 rat_all                                                   
11        11       11 rat_all                                                   
12         2       12 rat_all                                                   
13         3       13 rat_all

想将它们组合成一个数据框，但也包括缺失的数据框 isoforms 条目出现在 rat_all 数据框中，但不在 rat_ensembl 中数据框。所以我希望输出是一个组合数据框，就像我绑定一样两个数据帧，但增强了：

11         0       11 rat_ensembl
12         0       12 rat_ensembl
13         0       13 rat_ensembl

我以为我可以通过合并来做到这一点，但最终我得到了一个巨大的混乱，我必须将其展开，我最终可以将其调整为正确的格式，但这不是一个好的解决方案，如果我想同时对四到五种不同的“类型”进行此操作。我缺少什么？谢谢！

需要明确的是，我希望获得一个最终的数据框，如下所示：

      frequency isoforms        type                                               
1         17        8 rat_ensembl                                               
2          8        9 rat_ensembl                                               
3         20       10 rat_ensembl                                                   
4         37        8 rat_all                                                   
5         31        9 rat_all                                                   
6         14       10 rat_all                                                   
7         11       11 rat_all                                                   
8          2       12 rat_all                                                   
9          3       13 rat_all   
10         0       11 rat_ensembl
11         0       12 rat_ensembl
12         0       13 rat_ensembl

如果我使用的话，我可以让它做我想做的事情：

z = merge(rat_ensembl, rat_all, by.x="isoforms", by.y="isoforms", all.y=TRUE)
   isoforms frequency.x      type.x frequency.y  type.y                         
7         7          44 rat_ensembl          69 rat_all                         
8         8          17 rat_ensembl          37 rat_all                         
9         9           8 rat_ensembl          31 rat_all                         
10       10          20 rat_ensembl          14 rat_all                         
11       11          NA        <NA>          11 rat_all                         
12       12          NA        <NA>           2 rat_all                         
13       13          NA        <NA>           3 rat_all                         
14       14          NA        <NA>           1 rat_all

然后，理论上我可以选择 isoforms, <代码>频率.x，<代码>类型.x列和修复它们，使它们对于 rat_ensembl 和 rat_all 都是正确的，然后 rbind 这些数据帧放在一起，但似乎应该有一些东西可以直接处理它。

原文

Code to get the data frames:

rat_all = structure(list(frequency = c(37L, 31L, 14L, 11L, 2L, 3L), isoforms = 8:13,      
    type = structure(c("rat_all", "rat_all", "rat_all", "rat_all",              
    "rat_all", "rat_all"), .Dim = c(6L, 1L))), .Names = c("frequency",          
"isoforms", "type"), row.names = 8:13, class = "data.frame")

rat_ensembl = structure(list(frequency = c(17L, 8L, 20L), isoforms = 8:10,                    
    type = structure(c("rat_ensembl", "rat_ensembl", "rat_ensembl"              
    ), .Dim = c(3L, 1L))), .Names = c("frequency", "isoforms",                  
"type"), row.names = 8:10, class = "data.frame")

I have two data frames:

  frequency isoforms        type                                               
8         17        8 rat_ensembl                                               
9          8        9 rat_ensembl                                               
10        20       10 rat_ensembl

and

   frequency isoforms    type                                                   
8         37        8 rat_all                                                   
9         31        9 rat_all                                                   
10        14       10 rat_all                                                   
11        11       11 rat_all                                                   
12         2       12 rat_all                                                   
13         3       13 rat_all

I'd like to combine these into one data frame, but also to include the missing
isoforms entries that appear in the rat_all data frame but not the rat_ensembl
data frame. So I'd like the output to be a combined data frame as if I rbinded
the two data frames, but augmented with:

11         0       11 rat_ensembl
12         0       12 rat_ensembl
13         0       13 rat_ensembl

I thought I could do it with merge but I wind up getting a huge mess that I have to unwind that I can eventually massage into the right format but it is not a good solution if
I wanted to do this for four or five different 'types' at once. What am I missing? Thanks!

To be clear I'm looking to get a final data frame that looks like:

      frequency isoforms        type                                               
1         17        8 rat_ensembl                                               
2          8        9 rat_ensembl                                               
3         20       10 rat_ensembl                                                   
4         37        8 rat_all                                                   
5         31        9 rat_all                                                   
6         14       10 rat_all                                                   
7         11       11 rat_all                                                   
8          2       12 rat_all                                                   
9          3       13 rat_all   
10         0       11 rat_ensembl
11         0       12 rat_ensembl
12         0       13 rat_ensembl

I can kind of get it to do what I want if I use:

z = merge(rat_ensembl, rat_all, by.x="isoforms", by.y="isoforms", all.y=TRUE)
   isoforms frequency.x      type.x frequency.y  type.y                         
7         7          44 rat_ensembl          69 rat_all                         
8         8          17 rat_ensembl          37 rat_all                         
9         9           8 rat_ensembl          31 rat_all                         
10       10          20 rat_ensembl          14 rat_all                         
11       11          NA        <NA>          11 rat_all                         
12       12          NA        <NA>           2 rat_all                         
13       13          NA        <NA>           3 rat_all                         
14       14          NA        <NA>           1 rat_all

Then, theoretically I could select out the isoforms, frequency.x, type.x columns and
fix them so they are correct for each of rat_ensembl and rat_all and then rbind those
data frames together but it seems like there should be something to just handle it directly.

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

瀟灑尐姊 2024-11-11 10:55:52

也许你想要这样的东西

z <- merge(rat_ensembl, rat_all, all = TRUE)

iso_diff <- setdiff(rat_all$isoforms, rat_ensembl$isoforms)

augmented <- data.frame(frequency = 0, isoforms = iso_diff, type = "rat_ensembl", stringsAsFactors= FALSE)

df_all <- rbind(z, augmented)

希望有帮助。

maybe you want something like this

z <- merge(rat_ensembl, rat_all, all = TRUE)

iso_diff <- setdiff(rat_all$isoforms, rat_ensembl$isoforms)

augmented <- data.frame(frequency = 0, isoforms = iso_diff, type = "rat_ensembl", stringsAsFactors= FALSE)

df_all <- rbind(z, augmented)

Hope that helps.

回复收藏 0 原文

~没有更多了~