R: Losing column names when adding rows to an empty data frame

Posted on 2024-10-20 14:17:01

I am just starting with R and encountered a strange behaviour: when inserting the first row in an empty data frame, the original column names get lost.

Example:

a<-data.frame(one = numeric(0), two = numeric(0))
a
#[1] one two
#<0 rows> (or 0-length row.names)
names(a)
#[1] "one" "two"
a<-rbind(a, c(5,6))
a
#  X5 X6
#1  5  6
names(a)
#[1] "X5" "X6"

As you can see, the column names one and two were replaced by X5 and X6.

Could somebody please tell me why this happens, and whether there is a proper way to do this without losing the column names?

A shotgun solution would be to save the names in an auxiliary vector and then add them back when finished working on the data frame.

Thanks

Context:

I created a function which gathers some data and adds them as a new row to a data frame received as a parameter.
I create the data frame, iterate through my data sources, passing the data.frame to each function call to be filled up with its results.

Saygoodbye 2024-10-27 14:17:02

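The rbind help page specifies that:

For ‘cbind’ (‘rbind’), vectors of zero length (including ‘NULL’) are ignored unless the result would have zero rows (columns), for S compatibility. (Zero-extent matrices do not occur in S3 and are not ignored in R.)

So, in effect, a is ignored in your rbind call. Not entirely ignored, it seems: because a is a data frame, rbind dispatches to rbind.data.frame, and the result is the same as calling that method on the new row alone. (The part of ?rbind describing the data frame method is more explicit: it "first drops all zero-column and zero-row arguments", so the empty a contributes nothing, not even its column names.)

rbind.data.frame(c(5,6))
#  X5 X6
#1  5  6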

One way to insert the row is:

a[nrow(a)+1,] <- c(5,6)
a
#  one two
#1   5   6

But there may be a better way to do it depending on your code.
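
Since the question's context is a function that receives the data frame and adds a row to it, here is a minimal sketch of the index-assignment approach wrapped in such a function. The function name add_result and the values it computes are hypothetical. One point worth making explicit: R copies arguments on modification, so the function has to return the extended data frame and the caller has to reassign it.

```r
# Hypothetical data-gathering function modelled on the question's design.
# Arguments are copied on modification in R, so changes made to df inside
# the function are invisible to the caller unless df is returned.
add_result <- function(df, x) {
  # Assigning one row past the end appends a row and keeps the column names
  df[nrow(df) + 1, ] <- c(x, x * 2)
  df
}

a <- data.frame(one = numeric(0), two = numeric(0))
for (x in c(5, 7)) {
  a <- add_result(a, x)  # reassign the returned data frame
}
a
```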

半﹌身腐败 2024-10-27 14:17:02

I was almost surrendering to this issue.

1) Create the data frame with stringsAsFactors set to FALSE, or you run straight into the next issue. (Since R 4.0.0, FALSE is the default.)

2) Don't use rbind; I have no idea why on earth it messes up the column names. Simply do it this way:

df[nrow(df)+1,] <- c("d","gsgsgd",4)

Without stringsAsFactors=FALSE (on R versions before 4.0.0), the character columns become empty factors and the new values are rejected:

df <- data.frame(a = character(0), b=character(0), c=numeric(0))

df[nrow(df)+1,] <- c("d","gsgsgd",4)

#Warning messages:
#1: In `[<-.factor`(`*tmp*`, iseq, value = "d") :
#  invalid factor level, NAs generated
#2: In `[<-.factor`(`*tmp*`, iseq, value = "gsgsgd") :
#  invalid factor level, NAs generated

With stringsAsFactors=FALSE it works:

df <- data.frame(a = character(0), b=character(0), c=numeric(0), stringsAsFactors=FALSE)

df[nrow(df)+1,] <- c("d","gsgsgd",4)

df
#  a      b c
#1 d gsgsgd 4

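
One caveat with the c(...) form above: c() coerces all of its elements to a common type, so 4 is stored as the string "4" and column c silently becomes character instead of numeric. Supplying the new row as a list keeps one element per column with its own type. A minimal sketch using the same columns as the answer:

```r
df <- data.frame(a = character(0), b = character(0), c = numeric(0),
                 stringsAsFactors = FALSE)

# c("d", "gsgsgd", 4) would coerce every value to character;
# a list keeps each value's own type, one element per column.
df[nrow(df) + 1, ] <- list("d", "gsgsgd", 4)

str(df)
```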
绿光 2024-10-27 14:17:02

A workaround would be:

a <- rbind(a, data.frame(one = 5, two = 6))

?rbind states that the data frame method matches columns by name, so the objects being combined need matching names:

It then takes the classes of the columns from the first data frame, and matches columns by name (rather than by position)
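
To see the "matches columns by name" rule in action, here is a small sketch: once a already holds a row, a new data frame whose columns are listed in a different order still lands in the right columns. The values are made up for illustration.

```r
a <- data.frame(one = numeric(0), two = numeric(0))
a <- rbind(a, data.frame(one = 5, two = 6))

# Columns supplied in the opposite order are matched by name, not position
a <- rbind(a, data.frame(two = 8, one = 10))
a
```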

泪眸﹌ 2024-10-27 14:17:02

FWIW, an alternative design might have your functions building vectors for the two columns, instead of rbinding to a data frame:

ones <- c()
twos <- c()

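Append to the vectors (R functions work on copies, so either do this in the calling code or have each function return the updated vector):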

ones <- append(ones, 5)
twos <- append(twos, 6)

Repeat as needed, then create your data.frame in one go:

a <- data.frame(one=ones, two=twos)

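
A minimal sketch of this design, assuming each data-gathering function returns its results instead of modifying anything in place. The function gather and its values are hypothetical stand-ins for the question's data sources.

```r
# Hypothetical stand-in for one of the data-gathering functions:
# it returns its results rather than touching the caller's vectors.
gather <- function(source) {
  c(one = source * 5, two = source * 6)
}

ones <- c()
twos <- c()
for (src in 1:3) {
  res <- gather(src)
  # Grow the column vectors in the calling scope from the returned values
  ones <- append(ones, res[["one"]])
  twos <- append(twos, res[["two"]])
}

# Build the data frame once, with the names given explicitly
a <- data.frame(one = ones, two = twos)
a
```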
友欢 2024-10-27 14:17:02

One way to make this work generically and with the least amount of re-typing the column names is the following. This method doesn't require hacking the NA or 0.

rs <- data.frame(i=numeric(), square=numeric(), cube=numeric())
for (i in 1:4) {
    calc <- c(i, i^2, i^3)
    # append calc to rs
    names(calc) <- names(rs)
    rs <- rbind(rs, as.list(calc))
}

rs will have the correct names

> rs
    i square cube
1   1      1    1
2   2      4    8
3   3      9   27
4   4     16   64
> 
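
The same idea can be packaged as a helper that a data-gathering function can call, which fits the question's context of passing the data frame around. This is a sketch; append_row is a hypothetical name, and it assumes the values arrive in column order.

```r
# Hypothetical helper: appends one row of values (given in column order)
# to df, reusing df's own column names so they survive even when df is empty.
append_row <- function(df, values) {
  stopifnot(length(values) == ncol(df))
  rbind(df, setNames(as.list(values), names(df)))
}

rs <- data.frame(i = numeric(), square = numeric(), cube = numeric())
for (i in 1:4) {
  rs <- append_row(rs, c(i, i^2, i^3))
}
rs
```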

Another way to do this more cleanly is to use data.table:

> df <- data.frame(a=numeric(0), b=numeric(0))
> rbind(df, list(1,2)) # column names are messed up
  X1 X2
1  1  2

> library(data.table)
> df <- data.table(a=numeric(0), b=numeric(0))
> rbind(df, list(1,2)) # column names are preserved
   a b
1: 1 2

Notice that a data.table is also a data.frame.

> class(df)
[1] "data.table" "data.frame"

想你的星星会说话 2024-10-27 14:17:02

You can do this:

Give the initial data frame one row:

df=data.frame(matrix(nrow=1,ncol=length(newrow)))

Add your new row and take out the NAs:

newdf=na.omit(rbind(newrow,df))

But watch out that your newrow does not contain NAs, or it will be removed too.

Cheers
Agus

甲如呢乙后呢 2024-10-27 14:17:02

I use the following solution to add a row to an empty data frame:

d_dataset <- 
  data.frame(
    variable = character(),
    before = numeric(),
    after = numeric(),
    stringsAsFactors = FALSE)

d_dataset <- 
  rbind(
    d_dataset,
      data.frame(
        variable = "test",
        before = 9,
        after = 12,
        stringsAsFactors = FALSE))  

print(d_dataset)

  variable before after
1     test      9    12

HTH.

Kind regards

Georg

你的往事 2024-10-27 14:17:02

Instead of constructing the data.frame with numeric(0) I use as.numeric(0).

a<-data.frame(one=as.numeric(0), two=as.numeric(0))

This creates an extra initial row

a
#    one two
#1   0   0

Bind the additional rows

a<-rbind(a,c(5,6))
a
#    one two
#1   0   0
#2   5   6

Then use negative indexing to remove the first (bogus) row

a<-a[-1,]
a

#    one two
#2   5   6

Note: it messes up the row names (the index on the far left): the remaining row is labelled 2. I haven't figured out how to prevent that (anyone else?), but most of the time it probably doesn't matter.
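
For the row-name question: assigning NULL to the row names resets them to the automatic 1, 2, ... sequence. A minimal sketch repeating the steps of this answer:

```r
a <- data.frame(one = as.numeric(0), two = as.numeric(0))
a <- rbind(a, c(5, 6))
a <- a[-1, ]          # drop the placeholder row; the kept row is labelled "2"

# Reset the row names to the default 1, 2, ... sequence
rownames(a) <- NULL
a
```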

记忆消瘦 2024-10-27 14:17:02

Researching this venerable R annoyance brought me to this page. I wanted to add a bit more explanation to Georg's excellent answer (https://stackoverflow.com/a/41609844/2757825), which not only solves the problem raised by the OP (losing field names) but also prevents the unwanted conversion of all fields to factors. For me, those two problems go together. I wanted a solution in base R that doesn't involve writing extra code but preserves the two distinct operations: define the data frame, then append the row(s). That is what Georg's answer provides.

The first two examples below illustrate the problems and the third and fourth show Georg's solution.

Example 1: Append the new row as a vector with rbind

  • Result: loses column names AND converts all variables to factors (in R versions before 4.0.0, where stringsAsFactors defaulted to TRUE)
my.df <- data.frame(
    name = character(0),
    score = numeric(0),
    stringsAsFactors=FALSE
    )
my.df <- rbind(
    my.df, 
    c("Bob", 250) 
    )
    
my.df
  X.Bob. X.250.
1    Bob    250

str(my.df)
'data.frame':   1 obs. of  2 variables:
 $ X.Bob.: Factor w/ 1 level "Bob": 1
 $ X.250.: Factor w/ 1 level "250": 1

Example 2: Append the new row as a data frame inside rbind

  • Result: keeps column names but still converts character variables to factors (in R versions before 4.0.0).
my.df <- data.frame(
    name = character(0),
    score = numeric(0),
    stringsAsFactors=FALSE
    )
my.df <- rbind(
    my.df, 
    data.frame(name="Bob", score=250) 
    )
    
my.df
  name score
1  Bob   250

str(my.df)
'data.frame':   1 obs. of  2 variables:
 $ name : Factor w/ 1 level "Bob": 1
 $ score: num 250

Example 3: Append the new row inside rbind as a data frame, with stringsAsFactors=FALSE

  • Result: problem solved.
my.df <- data.frame(
    name = character(0),
    score = numeric(0),
    stringsAsFactors=FALSE
    )
my.df <- rbind(
    my.df, 
    data.frame(name="Bob", score=250, stringsAsFactors=FALSE) 
    )
    
my.df
  name score
1  Bob   250

str(my.df)
'data.frame':   1 obs. of  2 variables:
 $ name : chr "Bob"
 $ score: num 250

Example 4: Like Example 3, but adding multiple rows at once.

my.df <- data.frame(
    table = character(0),
    score = numeric(0),
    stringsAsFactors=FALSE
    )
my.df <- rbind(
    my.df, 
    data.frame(
        name=c("Bob", "Carol", "Ted"), 
        score=c(250, 124, 95), 
        stringsAsFactors=FALSE) 
    )

str(my.df)
'data.frame':   3 obs. of  2 variables:
 $ name : chr  "Bob" "Carol" "Ted"
 $ score: num  250 124 95

my.df
   name score
1   Bob   250
2 Carol   124
3   Ted    95
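
On current R the factor half of the problem has largely gone away. A minimal sketch, assuming R 4.0.0 or later, where both data.frame() and rbind() default to stringsAsFactors = FALSE, so Example 2 without the explicit argument already keeps name as character:

```r
# Assumes R >= 4.0.0: stringsAsFactors = FALSE is the default for both
# data.frame() and rbind(), so no explicit argument is needed here.
my.df <- data.frame(name = character(0), score = numeric(0))
my.df <- rbind(my.df, data.frame(name = "Bob", score = 250))
str(my.df)
```

On older versions, keep stringsAsFactors=FALSE as in Example 3.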

涫野音 2024-10-27 14:17:02

You can use add_row from the tibble package:

tibble::add_row(a, one = c(5, 10), two = c(6, 8))

Output

  one two
1   5   6
2  10   8