Performance issues when converting timestamped row data
I've written a function that takes a data.frame in which each row represents a 1-minute interval of data. The purpose of the function is to aggregate these 1-minute intervals into longer ones: for example, 1 minute becomes 5 minutes, 60 minutes, and so on. The data set may contain gaps (jumps in time), so the function has to cope with these missing rows. The code below appears to work, but its performance is terrible on large data sets.
I'm hoping someone can suggest how I might speed this up. See below.
compressMinute = function(interval, DAT) {
  #Grab all data which begins at the same interval length
  retSet = NULL
  intervalFilter = which(DAT$time$min %% interval == 0)
  barSet = NULL
  for (x in intervalFilter) {
    barEndTime = DAT$time[x] + 60*interval
    barIntervals = DAT[x,]
    x = x+1
    while(x <= nrow(DAT) & DAT[x,"time"] < barEndTime) {
      barIntervals = rbind(barIntervals,DAT[x,])
      x = x + 1
    }
    bar = data.frame(date=barIntervals[1,"date"],time=barIntervals[1,"time"],open=barIntervals[1,"open"],high=max(barIntervals[1:nrow(barIntervals),"high"]),
                     low=min(barIntervals[1:nrow(barIntervals),"low"]),close=tail(barIntervals,1)$close,volume=sum(barIntervals[1:nrow(barIntervals),"volume"]))
    if (is.null(barSet)) {
      barSet = bar
    } else {
      barSet = rbind(barSet, bar)
    }
  }
  return(barSet)
}
EDIT:
Below is a row of my data. Each row represents a 1-minute interval. I am trying to convert these rows into arbitrary buckets that aggregate the 1-minute intervals, e.g. 5 minutes, 15 minutes, 60 minutes, 240 minutes, etc.
date        time                 open     high     low      close    volume
2005-09-06  2005-09-06 16:33:00  1297.25  1297.50  1297.25  1297.25  98
Comments (1)
You probably want to re-use existing facilities, specifically the POSIXct time types, as well as existing packages.
For example, look at the xts package: it already has a generic function to.period() as well as convenience wrappers to.minutes(), to.minutes3(), to.minutes10(), and so on.
Here is an example from the help page:
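Separately from the help-page example, here is a minimal sketch of how to.period() might be applied to data shaped like the row in the question. It assumes the xts package is installed. The sample prices and volumes, the deliberately missing 16:33 bar, and the UTC time zone are all made up for illustration.

```r
# Assumes the xts package is installed: install.packages("xts")
library(xts)

# Hypothetical 1-minute bars shaped like the row in the question.
# The 16:33 bar is deliberately left out to mimic a gap in the data.
times <- as.POSIXct("2005-09-06 16:30:00", tz = "UTC") + 60 * c(0:2, 4:9)
DAT <- data.frame(
  time   = times,
  open   = c(1297.00, 1297.25, 1297.50, 1297.25, 1297.00, 1296.75, 1297.00, 1297.25, 1297.50),
  high   = c(1297.50, 1297.75, 1297.75, 1297.50, 1297.25, 1297.25, 1297.50, 1297.75, 1298.00),
  low    = c(1296.75, 1297.00, 1297.25, 1297.00, 1296.50, 1296.50, 1296.75, 1297.00, 1297.25),
  close  = c(1297.25, 1297.50, 1297.25, 1297.00, 1296.75, 1297.00, 1297.25, 1297.50, 1297.75),
  volume = c(98, 120, 75, 60, 110, 90, 80, 130, 70)
)

# Build an xts object indexed by the POSIXct timestamps.
# Column order is open, high, low, close, volume.
x <- xts(as.matrix(DAT[, c("open", "high", "low", "close", "volume")]),
         order.by = DAT$time)

# Aggregate to 5-minute bars: first open, max high, min low,
# last close, and summed volume within each 5-minute bucket.
bars5 <- to.period(x, period = "minutes", k = 5)
print(bars5)
```

The buckets are aligned to the clock (16:30 to 16:34, 16:35 to 16:39), so a missing minute simply contributes no row to its bucket. Changing `k` gives other bar sizes, for instance `k = 15` or `k = 60`. Note that, unlike the function in the question, xts labels each bar with the timestamp of the last observation in the bucket by default.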