R time series, complex series
I am attempting to merge two different time-series in R with the following characteristics:
- Data must be between 08:30 and 15:00 on a daily basis.
- Data spans several weeks, not just one particular day.
- There are gaps in the data at random intervals.
- The two datasets will not necessarily have gaps at the same intervals.
I would like to merge the two datasets so that every time from 08:30 to 15:00 appears in sequence, and wherever either dataset has a gap, the previous value (or the following value) is carried over.
# I have verified that the csv files are imported correctly
# The first column contains dates, and the strptime
# function can convert strings into Date/Time objects.
#
sec1_dates <- strptime(sec1[,1], "%m/%d/%Y %H:%M:%S")
sec2_dates <- strptime(sec2[,1], "%m/%d/%Y %H:%M:%S")
# The second column contains the close.
# I use the zoo function to create zoo objects from that data.
# But for some reason this ends up creating duplicates PROBLEM 1
#
a <- zoo(sec1[,2], sec1_dates)
b <- zoo(sec2[,2], sec2_dates)
# I know that I need to use seq to fill in gaps but I am clueless as to how
# Once I have the proper seq I can just use na.locf to fill the appropriate values
# HOWEVER seq(start(sec1_dates), end(sec1_dates), "min") would end up returning
# every minute for each day, and I only want 08:30 to 15:00. PROBLEM 2
# The merge function can combine two zoo objects, in union
# Obviously this fails because the two index sizes don't match PROBLEM 3
#
t.zoo <- merge(a, b, all=TRUE)
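Problem 1 usually traces back to repeated timestamps in the input. Below is a minimal base-R sketch for checking for them and dropping them before building the zoo object. The data frame `sec`, its values, and the choice of `tz = "UTC"` are made up for illustration and are not from the original files.

```r
# Hypothetical toy data: first column timestamps, second column close prices.
sec <- data.frame(
  V1 = c("01/04/2010 08:30:00", "01/04/2010 08:31:00",
         "01/04/2010 08:31:00", "01/04/2010 08:33:00"),
  V2 = c(10.0, 10.1, 10.1, 10.3),
  stringsAsFactors = FALSE
)

# Parse the timestamps. tz = "UTC" is an assumption made here only to
# keep the example reproducible across machines.
dates <- as.POSIXct(sec[, 1], format = "%m/%d/%Y %H:%M:%S", tz = "UTC")

# anyDuplicated returns 0 when every timestamp is unique,
# otherwise the position of the first repeated value.
first_dup <- anyDuplicated(dates)

# Keep only the first row seen for each timestamp.
keep <- !duplicated(dates)
sec_unique <- sec[keep, ]
dates_unique <- dates[keep]
```

Keeping the first row per timestamp is only one possible policy; averaging or keeping the last row may suit other data better.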
James, you were right about Problem 1. Thank you. I verified that the CSV file was pulling the data in twice, and removing the duplicate data fixed the issue. I used your solution for Problem 2 as well, but I am not certain that it is the most efficient way to do what I am trying to do. Ultimately I may want to use this to run regressions, and at that point I might need a loop of some sort to pull in any number of datasets. Any optimizations would be greatly appreciated.
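For the "any number of datasets" case, an explicit loop may not be needed, since merge accepts more than two zoo objects. A minimal sketch, assuming the series are already held in a named list; the list `series` and its values are hypothetical stand-ins for the imported securities:

```r
library(zoo)

# Hypothetical toy series standing in for the imported securities.
t0 <- as.POSIXct("2010-01-04 08:30:00", tz = "UTC")
series <- list(
  sec1 = zoo(c(10, 11, 12), t0 + c(0, 60, 180)),
  sec2 = zoo(c(20, 21),     t0 + c(60, 120)),
  sec3 = zoo(c(30, 31, 32), t0 + c(0, 120, 240))
)

# merge takes any number of zoo objects, so do.call can pass the whole list.
# all = TRUE keeps the union of the indexes; the list names are used as
# column names.
merged <- do.call(merge, c(series, list(all = TRUE)))

# Carry the last observation forward. na.rm = FALSE keeps rows whose
# leading values are still NA instead of dropping them.
filled <- na.locf(merged, na.rm = FALSE)
```

The result has one column per series, which is a convenient shape for passing on to a regression.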
UPDATED SOLUTION
library(zoo)
library(tseries)
# Read the CSV files into data frames
sec1 <- read.csv("C:\\exportdata\\sec1.csv", stringsAsFactors=FALSE, header=FALSE)
sec2 <- read.csv("C:\\exportdata\\sec2.csv", stringsAsFactors=FALSE, header=FALSE)
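# The first column contains dates.
# I use strptime to tell it what format these appear in, and convert
# the result to POSIXct, which is better suited as a zoo index.
sec1_dates <- as.POSIXct(strptime(sec1[,1], "%m/%d/%Y %H:%M:%S"))
sec2_dates <- as.POSIXct(strptime(sec2[,1], "%m/%d/%Y %H:%M:%S"))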
# The second column contains the close prices for the securities.
# I use the zoo function to create zoo objects from that data.
# Input = a vector of data and a vector of dates.
a <- zoo(sec1[,2], sec1_dates)
b <- zoo(sec2[,2], sec2_dates)
# Create a zero-width series covering every minute of the sample,
# then keep only the 08:30 to 15:00 window (endpoints included),
# per tip from James.
template <- zoo(NULL, seq(sec1_dates[1], tail(sec1_dates, 1), "min"))
hhmm <- strftime(time(template), "%H:%M")
template <- template[which(hhmm >= "08:30" & hhmm <= "15:00")]
# The merge function is then used to merge
# 1) each security to the template (uses the discrete date/time range);
#    the template is a zero-width series, so it contributes only its
#    index and adds no data column that would need removing
# 2) each security to one another (this was the ultimate goal anyway).
a.zoo <- merge(a, template, all=TRUE)
b.zoo <- merge(b, template, all=TRUE)
t.zoo <- merge(a.zoo, b.zoo, all=TRUE)
# Fill each NA with the most recent non-NA value
# (last observation carried forward).
t <- na.locf(t.zoo)
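The question also mentions carrying the following value back instead of the previous one forward. A small sketch of both directions with na.locf, using made-up values; the object names are illustrative only:

```r
library(zoo)

# Hypothetical one-minute series with gaps at the start, middle, and end.
t0 <- as.POSIXct("2010-01-04 08:30:00", tz = "UTC")
x <- zoo(c(NA, 1, NA, NA, 4, NA), t0 + 60 * (0:5))

# Previous value carried forward; the leading NA has nothing before it.
forward <- na.locf(x, na.rm = FALSE)

# Following value carried backward; the trailing NA has nothing after it.
backward <- na.locf(x, na.rm = FALSE, fromLast = TRUE)

# Forward fill first, then backward fill whatever is still missing.
both <- na.locf(forward, na.rm = FALSE, fromLast = TRUE)
```

With na.rm = FALSE the series keeps its full length, so the filled result stays aligned with the template index.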
Comments (1)
PROBLEM 1
?zoo has details on how to deal with duplicates, but this is presumably because you have duplicates in your dates created by strptime.
PROBLEM 2
You can subset times using [, which and time with zoo objects, see ?zoo.
PROBLEM 3
Use c to combine: t.zoo <- c(a,b)
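It may help to see how c and merge differ on zoo objects, since the two give differently shaped results. A minimal sketch with made-up, non-overlapping series; as far as I know, c on zoo objects behaves like rbind and is expected to fail when the two indexes share timestamps, so merge is likely the better fit when both securities trade at the same times:

```r
library(zoo)

# Hypothetical series; b starts after a ends, so the indexes do not overlap.
t0 <- as.POSIXct("2010-01-04 08:30:00", tz = "UTC")
a <- zoo(c(1, 2, 3), t0 + c(0, 60, 120))
b <- zoo(c(7, 8),    t0 + c(180, 240))

# c() appends b to a along the time axis: one series, five observations.
stacked <- c(a, b)

# merge() aligns a and b by time: two columns, with NA wherever a series
# has no observation at that timestamp.
side_by_side <- merge(a, b, all = TRUE)
```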