R 合并 data.frames asof join
我有一大堆时间间隔不规则的 data.frames。
我想创建一个新的 data.frame 并将其他 data.frame 加入其中,对于加入的每个 data.frame,从新的 data.frame 中选择最新的值。
例如,下面的 listOfDataFrames 包含一个 data.frames 列表,每个 data.frames 都有一个以秒为单位的时间列。我找到总范围,将范围除以 60,然后对其进行排序,以获得完整分钟的递增 seqn。现在我需要将 data.frames 列表合并到这个新 seqn 的左侧。例如,如果 mypoints 中的值是 60,则连接到它的值应该是最新值 <= 60。
xrange <- range(lapply(listOfDataFrames,function(x) range(x$Time)))
mypoints <- 60*do.call(seq,as.list(xrange%/%60))
我相信这有时称为 asof 连接。
有一个简单的程序可以做到这一点吗?
谢谢
编辑:这是我目前使用的
xrange <- range(lapply(listOfDataFrames,function(x) range(x$Time)))
mypoints <- 60*seq(xrange[1]%/%60,1+xrange[2]%/%60)
result <- data.frame(Time=mypoints)
for(index in 1:length(listOfDataFrames))
{
x<-listOfDataFrames[[index]]
indices <- which(sort(c(mypoints,x$Time)) %in% mypoints) - 1:length(mypoints)
indices[indices==0] <- NA
newdf<-data.frame(new=x$Result[indices])
colnames(newdf)<-paste("S",index,sep="")
result <- cbind(result,newdf)
}
编辑:完整示例
AsOfJoin <- function (listOfDataFrames) {
xrange <- range(lapply(listOfDataFrames,function(x) range(x$Time)))
mypoints <- 60*seq(xrange[1]%/%60,1+xrange[2]%/%60)
result <- data.frame(Time=mypoints)
for(index in 1:length(listOfDataFrames))
{
x<-listOfDataFrames[[index]]
indices <- which(sort(c(mypoints,x$Time)) %in% mypoints) - 1:length(mypoints)
indices[indices==0] <- NA
newdf<-data.frame(new=x$Result[indices])
colnames(newdf)<-paste("S",index,sep="")
result <- cbind(result,newdf)
}
result[is.na(result)]<-0
result
}
a<-data.frame(Time=c(28947.5,28949.6,29000),Result=c(10,15,9))
b<-data.frame(Time=c(28947.8,28949.5),Result=c(14,19))
listOfDataFrames <- list(a,b)
result<-AsOfJoin(listOfDataFrames)
> a
Time Result
1 28947.5 10
2 28949.6 15
3 29000.0 9
> b
Time Result
1 28947.8 14
2 28949.5 19
> result
Time S1 S2
1 28920 0 0
2 28980 15 19
3 29040 9 19
I have a whole bunch of data.frames with irregular time spacing.
I would like to make a new data.frame and join the others to it, for each data.frame being joined picking the latest value out of the new data.frame.
For example, listOfDataFrames below contains a list of data.frames each of which has a time column in seconds. I find the total range, mod the range by 60 and seqn it by to obtain an increasing seqn of full minutes. Now I need to merge the list of data.frames to the left of this new seqn. e.g. if the value in mypoints is 60, the value joined to it should be the latest value <= 60.
xrange <- range(lapply(listOfDataFrames,function(x) range(x$Time)))
mypoints <- 60*do.call(seq,as.list(xrange%/%60))
I believe this is sometimes called an asof join.
Is there a simple procedure to do this?
Thanks
EDIT: this is what I currently use
xrange <- range(lapply(listOfDataFrames,function(x) range(x$Time)))
mypoints <- 60*seq(xrange[1]%/%60,1+xrange[2]%/%60)
result <- data.frame(Time=mypoints)
for(index in 1:length(listOfDataFrames))
{
x<-listOfDataFrames[[index]]
indices <- which(sort(c(mypoints,x$Time)) %in% mypoints) - 1:length(mypoints)
indices[indices==0] <- NA
newdf<-data.frame(new=x$Result[indices])
colnames(newdf)<-paste("S",index,sep="")
result <- cbind(result,newdf)
}
EDIT: full example
AsOfJoin <- function (listOfDataFrames) {
xrange <- range(lapply(listOfDataFrames,function(x) range(x$Time)))
mypoints <- 60*seq(xrange[1]%/%60,1+xrange[2]%/%60)
result <- data.frame(Time=mypoints)
for(index in 1:length(listOfDataFrames))
{
x<-listOfDataFrames[[index]]
indices <- which(sort(c(mypoints,x$Time)) %in% mypoints) - 1:length(mypoints)
indices[indices==0] <- NA
newdf<-data.frame(new=x$Result[indices])
colnames(newdf)<-paste("S",index,sep="")
result <- cbind(result,newdf)
}
result[is.na(result)]<-0
result
}
a<-data.frame(Time=c(28947.5,28949.6,29000),Result=c(10,15,9))
b<-data.frame(Time=c(28947.8,28949.5),Result=c(14,19))
listOfDataFrames <- list(a,b)
result<-AsOfJoin(listOfDataFrames)
> a
Time Result
1 28947.5 10
2 28949.6 15
3 29000.0 9
> b
Time Result
1 28947.8 14
2 28949.5 19
> result
Time S1 S2
1 28920 0 0
2 28980 15 19
3 29040 9 19
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
data.table 提供非常快的
asof
开箱即用。另请参阅
data.table provide very fast
asof
joins out of the box.See also This post for an example
请参阅我的编辑以获得答案。显然是最好的方法。
See my edit for answer. Apparently the best way.