Merge data sets by row with differing columns
I have the need to merge data sets by row but they have differing columns. How can I easily get R to merge the rows, add missing columns and fill in the missing columns with NAs? Currently I would do it like this (very time consuming for multiple merges):
Creating fake data...
x1<-LETTERS[1:3]
x2<-letters[1:3]
x3<-rnorm(3)
x4<-rnorm(3)
x5<-rnorm(3)
Example of multiple data.frames with some similar columns, some different...
data.frame(x1,x2,x3,x4,x5)
data.frame(x1,x3,x4,x5)
data.frame(x2,x3,x4,x5)
data.frame(x1,x2,x3,x4,x5)
How I merge it now...
DF<-data.frame(rbind(data.frame(x1,x2,x3,x4,x5),
data.frame(x1,x2,x3,x4,x5),
data.frame("x2"=rep(NA,3),data.frame(x1,x3,x4,x5)),
data.frame("x1"=rep(NA,3),data.frame(x2,x3,x4,x5))))
DF
EDIT:
I tried the suggested code as follows:
l <- list(data.frame(x1,x2,x3,x4,x5),
data.frame(x1,x3,x4,x5),
data.frame(x2,x3,x4,x5),
data.frame(x1,x2,x3,x4,x5))
merger <- function(l) lapply(2:length(l), function(x) merge(l[[x-1]], l[[x]], all=TRUE))
while (length(l) != 1) l<-merger(l)
l
Which yields:
[[1]]
x1 x3 x4 x5 x2
1 A 0.25492 0.30160 0.259287 a
2 B -0.25937 0.45936 -0.075415 b
3 C -0.53493 1.18316 0.627335 c
Not:
> DF
x1 x2 x3 x4 x5
1 A a 0.25492 0.30160 0.259287
2 B b -0.25937 0.45936 -0.075415
3 C c -0.53493 1.18316 0.627335
4 A a 0.25492 0.30160 0.259287
5 B b -0.25937 0.45936 -0.075415
6 C c -0.53493 1.18316 0.627335
7 A <NA> 0.25492 0.30160 0.259287
8 B <NA> -0.25937 0.45936 -0.075415
9 C <NA> -0.53493 1.18316 0.627335
10 <NA> a 0.25492 0.30160 0.259287
11 <NA> b -0.25937 0.45936 -0.075415
12 <NA> c -0.53493 1.18316 0.627335
EDIT 2: Sorry to extend my original post but my low rep will not allow me to answer my own question.
Combining Jaron's and daroczig's responses gives exactly what I want. I don't want to assign each data frame to an object, so combining them in a list and then using rbind.fill works very nicely (see code below).
Thank you to both of you!
library(plyr)  # provides rbind.fill
x1<-LETTERS[1:3]
x2<-letters[1:3]
x3<-rnorm(3)
x4<-rnorm(3)
x5<-rnorm(3)
DFlist<-list(data.frame(x1,x2,x3,x4,x5),
data.frame(x1,x3,x4,x5),
data.frame(x2,x3,x4,x5),
data.frame(x1,x2,x3,x4,x5))
rbind.fill(DFlist)
Comments (3)
I had to read your question quite a few times before I understood what you were looking for, but maybe you want rbind.fill from plyr.
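
A minimal sketch of what such a call might look like, assuming the plyr package is installed and reusing the question's example vectors. The data frame names (df_full, df_no_x2, df_no_x1) and the set.seed call are illustrative additions, not part of the original answer. rbind.fill stacks the rows and fills any column that an input lacks with NA:

```r
# Assumes plyr is installed: install.packages("plyr")
library(plyr)

set.seed(1)  # illustrative only, makes the random columns reproducible
x1 <- LETTERS[1:3]
x2 <- letters[1:3]
x3 <- rnorm(3)
x4 <- rnorm(3)
x5 <- rnorm(3)

# Hypothetical names for the question's three shapes of data frame
df_full  <- data.frame(x1, x2, x3, x4, x5)
df_no_x2 <- data.frame(x1, x3, x4, x5)
df_no_x1 <- data.frame(x2, x3, x4, x5)

# Rows are stacked in order; columns missing from an input become NA
combined <- rbind.fill(df_full, df_no_x2, df_no_x1, df_full)

dim(combined)
combined
```

rbind.fill also accepts a single list of data frames as its first argument, which is the form used in the question's EDIT 2.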
Use data.table::rbindlist with the fill = TRUE option.
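
A short sketch of this approach, assuming the data.table package is installed and the data frames are already collected in a list (called DFlist here, as in the question's EDIT 2). The set.seed call and the conversion back to a data.frame are illustrative additions:

```r
# Assumes data.table is installed: install.packages("data.table")
library(data.table)

set.seed(1)  # illustrative only, makes the random columns reproducible
x1 <- LETTERS[1:3]
x2 <- letters[1:3]
x3 <- rnorm(3)
x4 <- rnorm(3)
x5 <- rnorm(3)

DFlist <- list(data.frame(x1, x2, x3, x4, x5),
               data.frame(x1, x3, x4, x5),
               data.frame(x2, x3, x4, x5),
               data.frame(x1, x2, x3, x4, x5))

# fill = TRUE matches columns by name and pads missing ones with NA
combined <- rbindlist(DFlist, fill = TRUE)

# rbindlist returns a data.table; convert if a plain data.frame is needed
combined_df <- as.data.frame(combined)

dim(combined_df)
```

Without fill = TRUE, rbindlist stops with an error when the inputs have different numbers of columns, so the option is what makes this work for the question's data.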
Let us say you have your data frames in a nice list.
Grab the first one and (as @joran suggested) merge all the rest into it, e.g. with a lucid loop.
Check out r.
I didn't like that loop, so wrote some recursive stuff.
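
One way the loop described above could look, as a sketch only: the names l and r follow the text and the question, while the loop body, the set.seed call, and the Reduce variant (r2) are assumptions. Keep in mind that merge(..., all = TRUE) is a full outer join on the shared columns, not a row bind, so rows that agree on every shared column are combined into one. With the question's example data this gives 3 rows rather than 12, which matches the output shown in the question's EDIT.

```r
set.seed(1)  # illustrative only, makes the random columns reproducible
x1 <- LETTERS[1:3]
x2 <- letters[1:3]
x3 <- rnorm(3)
x4 <- rnorm(3)
x5 <- rnorm(3)

l <- list(data.frame(x1, x2, x3, x4, x5),
          data.frame(x1, x3, x4, x5),
          data.frame(x2, x3, x4, x5),
          data.frame(x1, x2, x3, x4, x5))

# Start from the first data frame and merge each remaining one into it
r <- l[[1]]
for (df in l[-1]) {
  r <- merge(r, df, all = TRUE)  # joins on all columns the two share
}

# The same fold written without an explicit loop (assumed alternative)
r2 <- Reduce(function(a, b) merge(a, b, all = TRUE), l)

nrow(r)   # matching rows were joined, not stacked
names(r)
```

If the goal is to stack every row and pad missing columns with NA, a row-binding function such as rbind.fill or rbindlist(fill = TRUE) is the better fit; merge is appropriate when duplicate rows across the inputs should be collapsed.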