Create a Data Frame of Unequal Lengths

Posted on 2024-12-01 14:17:28

While data frame columns must have the same number of rows, is there any way to create a data frame with columns of unequal length? I'm not interested in saving them as separate elements of a list, because I often have to email this information to people as a CSV file, and that is easiest with a data frame.

x = c(rep("one",2))
y = c(rep("two",10))
z = c(rep("three",5))
cbind(x,y,z)

In the above code, cbind() simply recycles the shorter vectors so that every column has 10 elements. How can I alter it so that the lengths stay 2, 10, and 5?

I've done this in the past by doing the following, but it's inefficient.

  df = data.frame(one=c(rep("one",2),rep("",8)), 
           two=c(rep("two",10)), three=c(rep("three",5), rep("",5)))

猥︴琐丶欲为 2024-12-08 14:17:29

Similar problem:

coin <- c("Head", "Tail")
toss <- sample(coin, 50, replace = TRUE)

# Split the tosses into heads and tails, then pad the shorter vector with NA
categorize <- function(x, len) {
  count_heads <- 0
  count_tails <- 0
  tails <- character()
  heads <- character()
  for (i in seq_len(len)) {
    if (x[i] == "Head") {
      heads <- c(heads, x[i])
      count_heads <- count_heads + 1
    } else {
      tails <- c(tails, x[i])
      count_tails <- count_tails + 1
    }
  }
  # Names avoid shadowing the base functions head() and tail()
  if (count_heads > count_tails) {
    head_col <- heads
    tail_col <- c(tails, rep(NA, count_heads - count_tails))
  } else {
    head_col <- c(heads, rep(NA, count_tails - count_heads))
    tail_col <- tails
  }
  data.frame(Heads = head_col, Tails = tail_col)
}

categorize(toss, 50)

Output:
The result depends on the random sample. In the run reported here the toss gave 31 heads and 19 tails, so the Tails column is padded with 12 NA values to make a 31-row data frame.
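
The same padding can be done without the explicit loop. The following is only a sketch, not part of the original answer: `pad_groups` is a hypothetical helper name, and the seed is arbitrary and used only to make the run repeatable.

```r
# Hypothetical helper: split the values by outcome, then pad every group
# with NA up to the length of the longest group.
pad_groups <- function(x) {
  groups <- split(x, x)                    # one character vector per outcome
  max_len <- max(lengths(groups))
  padded <- lapply(groups, `length<-`, max_len)
  as.data.frame(padded, stringsAsFactors = FALSE)
}

set.seed(1)  # arbitrary seed, assumed here only for repeatability
toss <- sample(c("Head", "Tail"), 50, replace = TRUE)
result <- pad_groups(toss)
```

The number of rows in `result` equals the count of the more frequent outcome, and the column for the less frequent outcome ends in NA values.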

穿透光 2024-12-08 14:17:28

Sorry this isn't exactly what you asked, but I think there may be another way to get what you want.

First, if the vectors are different lengths, the data isn't really tabular, is it? How about just saving it to separate CSV files? You might also try text formats that allow storing multiple objects (JSON, XML).
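
As a rough sketch of the separate-files idea, using base R only: the file names, the `value` column name, and the use of a temporary directory are all illustrative assumptions rather than anything prescribed above.

```r
# Write each vector to its own CSV file in a temporary directory.
# File names and the "value" column name are illustrative assumptions.
x <- 1:5
y <- 1:12
vectors <- list(x = x, y = y)
out_dir <- tempdir()

paths <- vapply(names(vectors), function(nm) {
  path <- file.path(out_dir, paste0(nm, ".csv"))
  write.csv(data.frame(value = vectors[[nm]]), path, row.names = FALSE)
  path
}, character(1))

# Reading a file back recovers the vector at its original length.
x_back <- read.csv(paths[["x"]])$value
```

Each file keeps its own length, so no padding is needed, at the cost of sending several attachments instead of one.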

If you feel the data really is tabular, you could pad with NAs:

> x = 1:5
> y = 1:12
> max.len = max(length(x), length(y))
> x = c(x, rep(NA, max.len - length(x)))
> y = c(y, rep(NA, max.len - length(y)))
> x
 [1]  1  2  3  4  5 NA NA NA NA NA NA NA
> y
 [1]  1  2  3  4  5  6  7  8  9 10 11 12

If you absolutely must make a data.frame with unequal columns you could subvert the check, at your own peril:

> x = 1:5
> y = 1:12
> df = list(x=x, y=y)
> attributes(df) = list(names = names(df),
    row.names=1:max(length(x), length(y)), class='data.frame')
> df
      x  y
1     1  1
2     2  2
3     3  3
4     4  4
5     5  5
6  <NA>  6
7  <NA>  7
 [ reached getOption("max.print") -- omitted 5 rows ]]
Warning message:
In format.data.frame(x, digits = digits, na.encode = FALSE) :
  corrupt data frame: columns will be truncated or padded with NAs

刘备忘录 2024-12-08 14:17:28

Another approach to the padding:

na.pad <- function(x,len){
    x[1:len]
}

makePaddedDataFrame <- function(l,...){
    maxlen <- max(sapply(l,length))
    data.frame(lapply(l,na.pad,len=maxlen),...)
}

x = c(rep("one",2))
y = c(rep("two",10))
z = c(rep("three",5))

makePaddedDataFrame(list(x=x,y=y,z=z))

The na.pad() function exploits the fact that R will automatically pad a vector with NAs if you try to index non-existent elements.

makePaddedDataFrame() just finds the longest one and pads the rest up to a matching length.
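
To make the indexing behaviour concrete, here is a small illustration (the variable names are only for this example). It also shows a caveat that follows from the same rule: if `len` is smaller than the vector, `x[1:len]` truncates instead of padding.

```r
v <- c("one", "one")

# Indexing past the end returns NA for the positions that do not exist.
padded <- v[1:5]      # "one" "one" NA NA NA

# Caveat: a shorter range truncates the vector.
truncated <- v[1:1]   # "one"
```

Because makePaddedDataFrame() always passes the maximum length, the truncation case does not arise there.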

行雁书 2024-12-08 14:17:28

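To amplify @goodside's answer, you can do something like the following. The list is named here so that the resulting columns are called x, y, and z:

L <- list(x = x, y = y, z = z)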
cfun <- function(L) {
  pad.na <- function(x,len) {
   c(x,rep(NA,len-length(x)))
  }
  maxlen <- max(sapply(L,length))
  do.call(data.frame,lapply(L,pad.na,len=maxlen))
}
cfun(L)

戴着白色围巾的女孩 2024-12-08 14:17:28

What you need is to pad each vector with NAs at the end so that it matches the length of the longest vector. You can do:

l <- tibble::lst(x, y, z)
data.frame(lapply(l, `length<-`, max(lengths(l))))

      x   y     z
1   one two three
2   one two three
3  <NA> two three
4  <NA> two three
5  <NA> two three
6  <NA> two  <NA>
7  <NA> two  <NA>
8  <NA> two  <NA>
9  <NA> two  <NA>
10 <NA> two  <NA>
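
For readers unfamiliar with `length<-`, the sketch below shows what it does to a single vector, and then repeats the one-liner with a plain named `list()` in case the tibble package is not installed. The base-R variant is an assumption about what a reader may prefer, not part of the answer above.

```r
x <- rep("one", 2)
y <- rep("two", 10)
z <- rep("three", 5)

# Assigning a larger length pads the vector with NA and keeps its type.
v <- x
length(v) <- 4        # "one" "one" NA NA

# Base-R alternative to tibble::lst(): name the list elements explicitly.
l <- list(x = x, y = y, z = z)
df <- data.frame(lapply(l, `length<-`, max(lengths(l))))
```

`tibble::lst()` differs only in that it takes the names from the variable names automatically.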

随风而去 2024-12-08 14:17:28

We can create a data frame containing columns of unequal lengths by padding the columns with the empty string "". The following code can be used to do this.

The code first finds the maximum column length of a list object, l (assumed to be a named list of vectors). Next, the columns are padded with "". This gives every element of the list the same number of elements. The list is then converted to a data frame.

# The list column names
cols <- names(l)

# The maximum column length
max_len <- 0
for (col in cols){
    if (length(l[[col]]) > max_len)
        max_len <- length(l[[col]])
}

# Each column is padded
for (col in cols){
    l[[col]] <- c(l[[col]], rep("", max_len - length(l[[col]])))
}

# The list is converted to data frame
df <- as.data.frame(l)
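
One caveat with padding by "": appending a string to a numeric vector coerces the whole column to character. If the goal is only to have blank cells in the emailed CSV, a possible alternative is to pad with NA and let `write.csv()` write NA as an empty field through its `na` argument. This is a sketch under that assumption; the example data and the temporary output path are illustrative.

```r
x <- rep("one", 2)
y <- 1:10            # numeric column, to show that its type is kept
l <- list(x = x, y = y)

# Pad with NA instead of "" so numeric columns stay numeric.
max_len <- max(lengths(l))
df <- data.frame(lapply(l, `length<-`, max_len), stringsAsFactors = FALSE)

# na = "" writes missing values as empty cells in the CSV file.
csv_path <- tempfile(fileext = ".csv")   # illustrative output location
write.csv(df, csv_path, row.names = FALSE, na = "")
csv_lines <- readLines(csv_path)
```

The data frame keeps NA internally, so numeric summaries still work, while the recipient sees empty cells rather than the text NA.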
