R loop: add a column to a table if it does not already exist
I am trying to compile data from several files using for loops in R. I would like to get all the data into one table. The following calculation is just an example.
library(reshape)
# Two example tables; dat2 has no Density_3 column
dat1 <- data.frame("Specimen" = paste("sp", 1:10, sep=""), "Density_1" = rnorm(10,4,2), "Density_2" = rnorm(10,4,2), "Density_3" = rnorm(10,4,2))
dat2 <- data.frame("Specimen" = paste("fg", 1:10, sep=""), "Density_1" = rnorm(10,4,2), "Density_2" = rnorm(10,4,2))
dat <- c("dat1", "dat2")
for(i in 1:length(dat)){
  data <- get(dat[i])
  # Long format, keeping the first column (Specimen) as the id
  melt.data <- melt(data, id = 1)
  # One row of per-column means, stored as dat1tbl, dat2tbl, ...
  assign(paste(dat[i], "tbl", sep=""), cast(melt.data, ~ variable, mean))
}
rbind(dat1tbl, dat2tbl)
What is the smoothest way to add an extra column to dat2? I would like it to get the same column name ("Density_3" in this case) and be filled with zeros if it does not already exist. Assume that I have ~100 tables, with the number of density columns (Density_1, 2, 3, etc.) varying between 5 and 6.
I tried the following, but it didn't work:
if(names(data) %in% "Density_3" == FALSE){
  dat.all$Density_3 <- 0
} else {
  dat.all$Density_3 <- dat.all$Density3
}
Another question: is there a smooth way to rbind() the tables? It seems that rbind(get(dat)) does not work.
Comments (2)
After staring at this question for a while, I think its intent may have been obscured by the unnecessary get and assign manipulations. And I think the answer is plyr::rbind.fill.
I would have constructed "dat" not as a character vector but as a list of two data frames, used aggregate( ..., FUN=mean) (because I haven't gotten on the reshape2/plyr bus, except for melt and rbind.fill, that is), and then called do.call(rbind.fill, ...) on the resulting list. At any rate, this is what I think you want. I do not think it is a good idea to add in zeros for what are really missing values.
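The list-based approach described above can be sketched in base R as follows. This is only an illustration under stated assumptions: colMeans stands in for the aggregate call (whose arguments the answer leaves elided), bind_fill is a hypothetical helper that imitates what plyr::rbind.fill does so the sketch runs without plyr, and set.seed is there only for reproducibility.

```r
set.seed(1)  # assumption: fixed seed, only so the sketch is reproducible

dat1 <- data.frame(Specimen  = paste0("sp", 1:10),
                   Density_1 = rnorm(10, 4, 2),
                   Density_2 = rnorm(10, 4, 2),
                   Density_3 = rnorm(10, 4, 2))
dat2 <- data.frame(Specimen  = paste0("fg", 1:10),
                   Density_1 = rnorm(10, 4, 2),
                   Density_2 = rnorm(10, 4, 2))

# Keep the tables in a named list instead of looking them up with get()
dat_list <- list(dat1 = dat1, dat2 = dat2)

# One-row data frame of column means per table (Specimen column dropped);
# colMeans is used here in place of the answer's aggregate(..., FUN = mean)
means_list <- lapply(dat_list, function(d) as.data.frame(t(colMeans(d[-1]))))

# Hypothetical helper: row-bind data frames, filling absent columns with NA
bind_fill <- function(dfs) {
  all_cols <- unique(unlist(lapply(dfs, names)))
  padded <- lapply(dfs, function(d) {
    for (col in setdiff(all_cols, names(d))) {
      d[[col]] <- NA_real_   # missing column becomes NA, not zero
    }
    d[all_cols]              # same column order in every table
  })
  do.call(rbind, padded)
}

combined <- bind_fill(means_list)
print(combined)
```

If plyr is installed, do.call(plyr::rbind.fill, means_list) should produce an equivalent table. The missing Density_3 entry comes out as NA, in line with the answer's advice; if zeros really are wanted, combined[is.na(combined)] <- 0 would replace them afterwards.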
This is an old post, but in any case: I believe the code you mention above would have worked if you switch the order:
"Density_3" %in% names(data) == FALSE
As you have it, this part
names(data) %in% "Density_3" == FALSE
would give you a vector of TRUE/FALSE values (one for each column), while what you want is only one value, for that specific column. So you need to ask whether that column is present in the data frame, and not the opposite.
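The difference between the two orderings is easy to see on a small table. The sketch below uses made-up values and the illustrative names per_column and has_col, none of which come from the original post.

```r
# Small example table without a Density_3 column (values are made up)
data <- data.frame(Specimen  = paste0("fg", 1:3),
                   Density_1 = c(4.1, 3.7, 5.2),
                   Density_2 = c(2.9, 4.4, 3.8))

# Column names on the left: one TRUE/FALSE per column name,
# which is not usable as a single if() condition
per_column <- names(data) %in% "Density_3"
print(per_column)

# Column name on the left: a single TRUE/FALSE for "is it present at all?"
has_col <- "Density_3" %in% names(data)
print(has_col)

# Add the zero-filled column only when it is absent
if (!has_col) {
  data$Density_3 <- 0
}
print(names(data))
```

Recent versions of R reject an if() condition with more than one element, so the single-value form is the one that belongs inside if().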