R - rbind problem -> duplicated columns with the same name and format

Posted on 2025-01-16 11:25:42

Recently I ran into a problem in RStudio when using rbind to append one data table to another. Assume the two data tables have two columns with exactly the same name and format (I have checked this with str()).
But when I bind them (code: table3 <- rbind(table1, table2, fill=TRUE)), one of two things happens: either the columns are duplicated, so that the resulting table3 has two columns with exactly the same name (the first holds the entries for all rows coming from table1 and the second those for all rows of table2), or the column name appears only once but all entries in the rows coming from table2 are NA.
Both are very annoying, and this is also a new problem, because I used exactly the same code earlier and it worked perfectly well. The R version I'm using is R 4.1.1. Am I overlooking something? Or might there be a bug in this version?

Thanks very much for your help.

Table1 looks like: [image 1]
Table2 looks like: [image 2]

Structure and error look like: [image]

Comments (2)

甜点 2025-01-23 11:25:42

Try using merge(), e.g.:

table3 <- merge(table1, table2, by.x = 'names of table1', by.y = 'names of table2')

不知在何时 2025-01-23 11:25:42

Check that the encoding is the same in both tables.

Reproducible examples.

Encoding on the column names

library(data.table)
quux1 <- quux2 <- data.table(Rechtsträger="a")
rbind(quux1, quux2)
#    Rechtsträger
#          <char>
# 1:            a
# 2:            a
Encoding(names(quux1))
# [1] "unknown"
Encoding(names(quux1)) <- "latin1"
rbind(quux1, quux2)
# Error in rbindlist(l, use.names, fill, idcol) : 
#   Column 1 ['Rechtsträger'] of item 2 is missing in item 1. Use fill=TRUE to fill with NA (NULL for list columns), or use.names=FALSE to ignore column names.

One fix? Copy the encoding:

Encoding(names(quux2)) <- Encoding(names(quux1))
rbind(quux1, quux2)
#    Rechtsträger
#          <char>
# 1:            a
# 2:            a
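
The error message above also names another route: use.names=FALSE, which makes rbind match columns by position and ignore the names altogether. A minimal sketch, assuming both tables have the same columns in the same order; the tables t1 and t2 and their contents are made up for illustration, with plainly different names standing in for names that differ only in encoding:

```r
library(data.table)

# Hypothetical tables: the column names differ, but the columns are in the
# same order and hold the same kind of data.
t1 <- data.table(id = 1:2, value = c("a", "b"))
t2 <- data.table(ID = 3:4, VALUE = c("c", "d"))

# use.names = FALSE binds by position; the result keeps the names of t1.
t3 <- rbind(t1, t2, use.names = FALSE)
print(t3)
```

Binding by position will misalign data without any warning if the column order differs between the tables, so it is probably only worth using after checking the order by hand.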

Encoding on the contents

I don't think this is the problem, as it is producing a different (though similar) error.

vec <- "Rechtsträger"
quux1 <- data.table(Rechtsträger=vec)
Encoding(vec)
# [1] "latin1"
Encoding(vec) <- "UTF-8"
Encoding(vec)
# [1] "UTF-8"
quux2 <- data.table(Rechtsträger=vec)
rbind(quux1, quux2)
# Error in nchar(x) : invalid multibyte string, element 2

Similar fix:

Encoding(quux2[[1]]) <- Encoding(quux1[[1]])
rbind(quux1, quux2)
#    Rechtsträger
#          <char>
# 1: Rechtsträger
# 2: Rechtsträger
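
One thing to keep in mind with this kind of fix: Encoding<- only changes the declared mark of a string, it does not convert the underlying bytes. Whether re-marking or converting is the right move depends on which bytes the strings actually contain. A small base-R sketch of the difference (the variable names are illustrative; iconv, enc2utf8, validUTF8 and charToRaw are base R functions):

```r
# Build a Latin-1 encoded string in a platform-independent way.
utf8_str   <- "Rechtstr\u00e4ger"
latin1_str <- iconv(utf8_str, from = "UTF-8", to = "latin1")

# Re-marking: same bytes, different label. The single Latin-1 byte e4 is not
# valid UTF-8 on its own, which is the kind of mismatch behind
# "invalid multibyte string".
remarked <- latin1_str
Encoding(remarked) <- "UTF-8"
identical(charToRaw(remarked), charToRaw(latin1_str))  # TRUE: bytes untouched
validUTF8(remarked)                                     # FALSE

# Converting: enc2utf8() translates the bytes according to the declared mark.
converted <- enc2utf8(latin1_str)
Encoding(converted)                                     # "UTF-8"
validUTF8(converted)                                    # TRUE
identical(charToRaw(converted), charToRaw(utf8_str))    # TRUE
```

In the example above the bytes really are Latin-1, so marking quux2 as "latin1" restores a correct label; if the declared encoding were already correct and only differed between tables, converting with enc2utf8 would be the safer choice.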

Note to future self: my assumption that the output from dput is perfectly unambiguous has found its flaw:

quux1 <- quux2 <- data.table(Rechtsträger="a")
Encoding(names(quux1)) <- "latin1"
dput(quux1)
# structure(list(Rechtsträger = "a"), row.names = c(NA, -1L), class = c("data.table", 
# "data.frame"), .internal.selfref = <pointer: 0x0000000004501ef0>)
dput(quux2)
# structure(list(Rechtsträger = "a"), row.names = c(NA, -1L), class = c("data.table", 
# "data.frame"), .internal.selfref = <pointer: 0x0000000004501ef0>)

identical(deparse1(quux1), deparse1(quux2))
# [1] TRUE
identical(Encoding(names(quux1)), Encoding(names(quux2)))
# [1] FALSE
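
Since the dput output hides the difference, one way to compare column names more strictly is to look at the declared encoding and the raw bytes side by side. A minimal base-R sketch; name_report() is a hypothetical helper written for this illustration, not part of any package:

```r
# Hypothetical helper: one row per name with its declared encoding and bytes.
name_report <- function(x) {
  data.frame(
    name     = x,
    encoding = Encoding(x),
    bytes    = vapply(x, function(s) paste(charToRaw(s), collapse = " "),
                      character(1), USE.NAMES = FALSE),
    stringsAsFactors = FALSE
  )
}

# Two vectors of column names that print the same but are stored differently.
names_a <- c("id", "Rechtstr\u00e4ger")
names_b <- c("id", iconv("Rechtstr\u00e4ger", from = "UTF-8", to = "latin1"))

identical(names_a, names_b)             # TRUE: strings are compared after translation
report_a <- name_report(names_a)
report_b <- name_report(names_b)
report_a$encoding == report_b$encoding  # TRUE FALSE
report_a$bytes == report_b$bytes        # TRUE FALSE
```

With real tables this would be called as name_report(names(table1)) and name_report(names(table2)), and any row where the encoding or bytes differ points at a column that may fail to match.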