组合两个不同长度的数据帧

发布于 2024-11-28 07:15:43 字数 1468 浏览 1 评论 0原文

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(9

千寻… 2024-12-05 07:15:44

希望这对你有用!

您可以使用library(qpcR)来组合两个大小不等的矩阵。

resultant_matrix <- qpcR:::cbind.na(matrix1, matrix2)

注意:- 所得矩阵的大小为矩阵 2。

Hope this will work for you!

You can use library(qpcR) for combining two matrix with unequal size.

resultant_matrix <- qpcR:::cbind.na(matrix1, matrix2)

NOTE:- The resultant matrix will be of size of matrix2.

青芜 2024-12-05 07:15:44

只是我的2分钱。此代码将两个矩阵或 data.frame 合并为一个。如果一种数据结构的行数较少,则缺失的行将添加 NA 值。

combine.df <- function(x, y) {
    rows.x <- nrow(x)
    rows.y <- nrow(y)
    if (rows.x > rows.y) {
        diff <- rows.x - rows.y
        df.na <- matrix(NA, diff, ncol(y))
        colnames(df.na) <- colnames(y)
        cbind(x, rbind(y, df.na))
    } else {
        diff <- rows.y - rows.x
        df.na <- matrix(NA, diff, ncol(x))
        colnames(df.na) <- colnames(x)
        cbind(rbind(x, df.na), y)
    }
}

df1 <- data.frame(1:10, row.names = 1:10)
df2 <- data.frame(1:5, row.names = 10:14)
combine.df(df1, df2)

Just my 2 cents. This code combines two matrices or data.frames into one. If one data structure have lower number of rows then missing rows will be added with NA values.

combine.df <- function(x, y) {
    rows.x <- nrow(x)
    rows.y <- nrow(y)
    if (rows.x > rows.y) {
        diff <- rows.x - rows.y
        df.na <- matrix(NA, diff, ncol(y))
        colnames(df.na) <- colnames(y)
        cbind(x, rbind(y, df.na))
    } else {
        diff <- rows.y - rows.x
        df.na <- matrix(NA, diff, ncol(x))
        colnames(df.na) <- colnames(x)
        cbind(rbind(x, df.na), y)
    }
}

df1 <- data.frame(1:10, row.names = 1:10)
df2 <- data.frame(1:5, row.names = 10:14)
combine.df(df1, df2)
天生の放荡 2024-12-05 07:15:44

我实际上并没有遇到错误。

a <- as.data.frame(matrix(c(sample(letters,50, replace=T),runif(100)), nrow=50))
b <- sample(letters,10, replace=T)
c <- cbind(a,b)

我使用了字母,以防连接所有数字具有不同的功能(但事实并非如此)。您的“第一个数据框”实际上只是一个向量”,只是在第四列中重复了 5 次...

但是专家对这个问题的所有评论仍然相关:)

I don't actually get an error with this.

a <- as.data.frame(matrix(c(sample(letters,50, replace=T),runif(100)), nrow=50))
b <- sample(letters,10, replace=T)
c <- cbind(a,b)

I used letters incase joining all numerics had different functionality (which it didn't). Your 'first data frame', which is actually just a vector', is just repeated 5 times in that 4th column...

But all the comments from the gurus to the question are still relevant :)

少跟Wǒ拽 2024-12-05 07:15:44

我想我已经想出了一个相当短的解决方案..希望它对某人有帮助。

cbind.na<-function(df1, df2){

  #Collect all unique rownames
  total.rownames<-union(x = rownames(x = df1),y = rownames(x=df2))

  #Create a new dataframe with rownames
  df<-data.frame(row.names = total.rownames)

  #Get absent rownames for both of the dataframe
  absent.names.1<-setdiff(x = rownames(df1),y = rownames(df))
  absent.names.2<-setdiff(x = rownames(df2),y = rownames(df))

  #Fill absents with NAs
  df1.fixed<-data.frame(row.names = absent.names.1,matrix(data = NA,nrow = length(absent.names.1),ncol=ncol(df1)))
  colnames(df1.fixed)<-colnames(df1)
  df1<-rbind(df1,df1.fixed)

  df2.fixed<-data.frame(row.names = absent.names.2,matrix(data = NA,nrow = length(absent.names.2),ncol=ncol(df2)))
  colnames(df2.fixed)<-colnames(df2)
  df2<-rbind(df2,df2.fixed)

  #Finally cbind into new dataframe
  df<-cbind(df,df1[rownames(df),],df2[rownames(df),])
  return(df)

}

I think I have come up with a quite shorter solution.. Hope it helps someone.

cbind.na<-function(df1, df2){

  #Collect all unique rownames
  total.rownames<-union(x = rownames(x = df1),y = rownames(x=df2))

  #Create a new dataframe with rownames
  df<-data.frame(row.names = total.rownames)

  #Get absent rownames for both of the dataframe
  absent.names.1<-setdiff(x = rownames(df1),y = rownames(df))
  absent.names.2<-setdiff(x = rownames(df2),y = rownames(df))

  #Fill absents with NAs
  df1.fixed<-data.frame(row.names = absent.names.1,matrix(data = NA,nrow = length(absent.names.1),ncol=ncol(df1)))
  colnames(df1.fixed)<-colnames(df1)
  df1<-rbind(df1,df1.fixed)

  df2.fixed<-data.frame(row.names = absent.names.2,matrix(data = NA,nrow = length(absent.names.2),ncol=ncol(df2)))
  colnames(df2.fixed)<-colnames(df2)
  df2<-rbind(df2,df2.fixed)

  #Finally cbind into new dataframe
  df<-cbind(df,df1[rownames(df),],df2[rownames(df),])
  return(df)

}
灼痛 2024-12-05 07:15:44

我有类似的问题,我匹配两个数据集的特定列中的条目,并且仅在匹配时才进行 cbind 。
对于两个数据集,data1 & data2,在比较两者的第一列后,我在 data1 中从 data2 添加一列。

for(i in 1:nrow(data1){
  for( j in 1:nrow(data2){
    if (data1[i,1]==data2[j,1]) data1[i,3]<- data2[j,2]
  }
}

i had similar problem, i matched the entries in a particular column of two data sets and cbind only if it matched.
For two data sets, data1 & data2, i am adding a column in data1 from data2 after comparing first column of both.

for(i in 1:nrow(data1){
  for( j in 1:nrow(data2){
    if (data1[i,1]==data2[j,1]) data1[i,3]<- data2[j,2]
  }
}
玩套路吗 2024-12-05 07:15:43

plyr 包中,有一个函数 rbind.fill ,它将合并 data.frames 并为空单元引入 NA

library(plyr)
combined <- rbind.fill(mtcars[c("mpg", "wt")], mtcars[c("wt", "cyl")])
combined[25:40, ]

    mpg    wt cyl
25 19.2 3.845  NA
26 27.3 1.935  NA
27 26.0 2.140  NA
28 30.4 1.513  NA
29 15.8 3.170  NA
30 19.7 2.770  NA
31 15.0 3.570  NA
32 21.4 2.780  NA
33   NA 2.620   6
34   NA 2.875   6
35   NA 2.320   4

In the plyr package there is a function rbind.fill that will merge data.frames and introduce NA for empty cells:

library(plyr)
combined <- rbind.fill(mtcars[c("mpg", "wt")], mtcars[c("wt", "cyl")])
combined[25:40, ]

    mpg    wt cyl
25 19.2 3.845  NA
26 27.3 1.935  NA
27 26.0 2.140  NA
28 30.4 1.513  NA
29 15.8 3.170  NA
30 19.7 2.770  NA
31 15.0 3.570  NA
32 21.4 2.780  NA
33   NA 2.620   6
34   NA 2.875   6
35   NA 2.320   4
孤星 2024-12-05 07:15:43

鉴于后续评论,我根本不清楚OP实际上在追求什么。他们可能实际上正在寻找一种将数据写入文件的方法。

但假设我们确实在寻找一种cbind不同长度的多个数据帧的方法。

cbind 最终会调用 data.frame,其帮助文件显示:

传递给 data.frame 的对象应该具有相同的行数,但是
由 I 保护的原子向量、因子和特征向量将是
如有必要,可循环使用多次(包括从 R
2.9.0,列表参数的元素)。

因此,在OP的实际示例中,不应该出现错误,因为R应该回收长度为50的较短向量。事实上,当我运行以下命令时:

set.seed(1)
a <- runif(50)
b <- 1:50
c <- rep(LETTERS[1:5],length.out = 50)
dat1 <- data.frame(a,b,c)
dat2 <- data.frame(d = runif(10),e = runif(10))
cbind(dat1,dat2)

我没有收到任何错误,并且较短的数据帧按预期回收。但是,当我运行此命令时:

set.seed(1)
a <- runif(50)
b <- 1:50
c <- rep(LETTERS[1:5],length.out = 50)
dat1 <- data.frame(a,b,c)
dat2 <- data.frame(d = runif(9), e = runif(9))
cbind(dat1,dat2)

我收到以下错误:

Error in data.frame(..., check.names = FALSE) : 
  arguments imply differing number of rows: 50, 9

但 R 的奇妙之处在于,您几乎可以让它执行您想要的任何操作,即使您不应该这样做。例如,这是一个简单的函数,它将cbind长度不均匀的数据帧并自动用NA填充较短的数据帧:

cbindPad <- function(...){
args <- list(...)
n <- sapply(args,nrow)
mx <- max(n)
pad <- function(x, mx){
    if (nrow(x) < mx){
        nms <- colnames(x)
        padTemp <- matrix(NA, mx - nrow(x), ncol(x))
        colnames(padTemp) <- nms
        if (ncol(x)==0) {
          return(padTemp)
        } else {
        return(rbind(x,padTemp))
          }
    }
    else{
        return(x)
    }
}
rs <- lapply(args,pad,mx)
return(do.call(cbind,rs))
}

可以这样使用:

set.seed(1)
a <- runif(50)
b <- 1:50
c <- rep(LETTERS[1:5],length.out = 50)
dat1 <- data.frame(a,b,c)
dat2 <- data.frame(d = runif(10),e = runif(10))
dat3 <- data.frame(d = runif(9), e = runif(9))
cbindPad(dat1,dat2,dat3)

我不保证该功能在所有情况下都有效;它仅作为示例。

编辑

如果主要目标是创建 csv 或文本文件,您只需更改函数即可使用 "" 而不是 NA 进行填充code> 然后执行如下操作:

dat <- cbindPad(dat1,dat2,dat3)
rs <- as.data.frame(apply(dat,1,function(x){paste(as.character(x),collapse=",")}))

然后在 rs 上使用 write.table

It's not clear to me at all what the OP is actually after, given the follow-up comments. It's possible they are actually looking for a way to write the data to file.

But let's assume that we're really after a way to cbind multiple data frames of differing lengths.

cbind will eventually call data.frame, whose help files says:

Objects passed to data.frame should have the same number of rows, but
atomic vectors, factors and character vectors protected by I will be
recycled a whole number of times if necessary (including as from R
2.9.0, elements of list arguments).

so in the OP's actual example, there shouldn't be an error, as R ought to recycle the shorter vectors to be of length 50. Indeed, when I run the following:

set.seed(1)
a <- runif(50)
b <- 1:50
c <- rep(LETTERS[1:5],length.out = 50)
dat1 <- data.frame(a,b,c)
dat2 <- data.frame(d = runif(10),e = runif(10))
cbind(dat1,dat2)

I get no errors and the shorter data frame is recycled as expected. However, when I run this:

set.seed(1)
a <- runif(50)
b <- 1:50
c <- rep(LETTERS[1:5],length.out = 50)
dat1 <- data.frame(a,b,c)
dat2 <- data.frame(d = runif(9), e = runif(9))
cbind(dat1,dat2)

I get the following error:

Error in data.frame(..., check.names = FALSE) : 
  arguments imply differing number of rows: 50, 9

But the wonderful thing about R is that you can make it do almost anything you want, even if you shouldn't. For example, here's a simple function that will cbind data frames of uneven length and automatically pad the shorter ones with NAs:

cbindPad <- function(...){
args <- list(...)
n <- sapply(args,nrow)
mx <- max(n)
pad <- function(x, mx){
    if (nrow(x) < mx){
        nms <- colnames(x)
        padTemp <- matrix(NA, mx - nrow(x), ncol(x))
        colnames(padTemp) <- nms
        if (ncol(x)==0) {
          return(padTemp)
        } else {
        return(rbind(x,padTemp))
          }
    }
    else{
        return(x)
    }
}
rs <- lapply(args,pad,mx)
return(do.call(cbind,rs))
}

which can be used like this:

set.seed(1)
a <- runif(50)
b <- 1:50
c <- rep(LETTERS[1:5],length.out = 50)
dat1 <- data.frame(a,b,c)
dat2 <- data.frame(d = runif(10),e = runif(10))
dat3 <- data.frame(d = runif(9), e = runif(9))
cbindPad(dat1,dat2,dat3)

I make no guarantees that this function works in all cases; it is meant as an example only.

EDIT

If the primary goal is to create a csv or text file, all you need to do it alter the function to pad using "" rather than NA and then do something like this:

dat <- cbindPad(dat1,dat2,dat3)
rs <- as.data.frame(apply(dat,1,function(x){paste(as.character(x),collapse=",")}))

and then use write.table on rs.

当爱已成负担 2024-12-05 07:15:43

参考Andrie的回答,建议使用plyr::rbind.fill()
t() 结合使用,您将获得类似于 cbind.fill() (不是 plyr 的一部分)的东西,它将使用以下命令构建您的数据框:考虑相同的案件编号。

Refering to Andrie's answer, suggesting to use plyr::rbind.fill():
Combined with t() you have something like cbind.fill() (which is not part of plyr) that will construct your data frame with consideration of identical case numbers.

莳間冲淡了誓言ζ 2024-12-05 07:15:43

我的想法是获取所有 data.frame 的最大行数,然后根据需要将空矩阵附加到每个 data.frame 。该方法不需要额外的包,只使用base。代码如下:

list.df <- list(data.frame(a = 1:10), data.frame(a = 1:5), data.frame(a = 1:3))

max.rows <- max(unlist(lapply(list.df, nrow), use.names = F))

list.df <- lapply(list.df, function(x) {
    na.count <- max.rows - nrow(x)
    if (na.count > 0L) {
        na.dm <- matrix(NA, na.count, ncol(x))
        colnames(na.dm) <- colnames(x)
        rbind(x, na.dm)
    } else {
        x
    }
})

do.call(cbind, list.df)

#     a  a  a
# 1   1  1  1
# 2   2  2  2
# 3   3  3  3
# 4   4  4 NA
# 5   5  5 NA
# 6   6 NA NA
# 7   7 NA NA
# 8   8 NA NA
# 9   9 NA NA
# 10 10 NA NA

My idea is to get max of rows count of all data.frames and next append empty matrix to every data.frame if need. This method doesn't require additional packages, only base is used. Code looks following:

list.df <- list(data.frame(a = 1:10), data.frame(a = 1:5), data.frame(a = 1:3))

max.rows <- max(unlist(lapply(list.df, nrow), use.names = F))

list.df <- lapply(list.df, function(x) {
    na.count <- max.rows - nrow(x)
    if (na.count > 0L) {
        na.dm <- matrix(NA, na.count, ncol(x))
        colnames(na.dm) <- colnames(x)
        rbind(x, na.dm)
    } else {
        x
    }
})

do.call(cbind, list.df)

#     a  a  a
# 1   1  1  1
# 2   2  2  2
# 3   3  3  3
# 4   4  4 NA
# 5   5  5 NA
# 6   6 NA NA
# 7   7 NA NA
# 8   8 NA NA
# 9   9 NA NA
# 10 10 NA NA
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文