R time series, complex sequence

Posted on 2024-10-23 21:42:20

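I am trying to merge two different time series in R with the following characteristics:

  1. Data must fall between 08:30 and 15:00 each day.
  2. Data spans several weeks, not just one particular day.
  3. There are gaps in the data at random intervals.
  4. The two datasets do not necessarily have gaps at the same intervals.

I would like to merge the two datasets so that every time from 08:30 to 15:00 appears in sequence, and wherever either dataset has a gap, the previous value (or the following value) is carried over.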

# I have verified that the csv files are imported correctly
# The first column contains dates. and the strptime
# function can convert strings into Date/Time objects.
#
sec1_dates <- strptime(sec1[,1], "%m/%d/%Y %H:%M:%S")
sec2_dates <- strptime(sec2[,1], "%m/%d/%Y %H:%M:%S")

# The second column contains the close.
# I use the zoo function to create zoo objects from that data.
# But for some reason this ends up creating duplicates PROBLEM 1
#
a <- zoo(sec1[,2], sec1_dates)
b <- zoo(sec2[,2], sec2_dates)

# I know that I need to use seq to fill in the gaps, but I am clueless as to how.
# Once I have the proper seq I can just use na.locf to fill the appropriate values
# HOWEVER seq(start(sec1_dates), end(sec1_dates), "min") would end up returning
# every minute for each day, and I only want 08:30 to 15:30. PROBLEM 2

# The merge function can combine two zoo objects, in union
# Obviously this fails because the two index sizes don't match PROBLEM 3
#
t.zoo <- merge(a, b, all=TRUE)

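James, you were right about Problem 1, thank you. I verified that the CSV file was pulling the data in twice, and removing the duplicate data fixed the issue. I used your solution for Problem 2 as well, but I am not certain it is the most efficient way to do what I am trying to do. Ultimately I may want to use this to run regressions, and at that point I might need a loop of some sort to pull in any number of datasets. Any suggestions for optimization would be greatly appreciated.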

UPDATED SOLUTION

library(zoo)
library(tseries)

# Read the CSV files into data frames
sec1 <- read.csv("C:\\exportdata\\sec1.csv", stringsAsFactors=FALSE, header=FALSE)
sec2 <- read.csv("C:\\exportdata\\sec2.csv", stringsAsFactors=FALSE, header=FALSE)

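# The first column contains dates.
# strptime parses them in the given format; as.POSIXct converts the result
# (a POSIXlt list) to the compact date-time class that suits a zoo index.
sec1_dates <- as.POSIXct(strptime(sec1[,1], "%m/%d/%Y %H:%M:%S"))
sec2_dates <- as.POSIXct(strptime(sec2[,1], "%m/%d/%Y %H:%M:%S"))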

# The second column contains the close prices for the securities.
# I use the zoo function to create zoo objects from that data.
# Input =  a vector of data and a vector of dates.
a <- zoo(sec1[,2], sec1_dates)
b <- zoo(sec2[,2], sec2_dates)

# Create a regular one-minute index covering the desired time frame,
# per tip from James, then keep only 08:30 to 15:00 (inclusive) each day.
template <- zoo(NULL, seq(sec1_dates[1], tail(sec1_dates, 1), "min"))
hhmm <- strftime(time(template), "%H:%M")
template <- template[which(hhmm >= "08:30" & hhmm <= "15:00")]

# The merge function is then used to merge
# 1) each security to the template (uses the discrete date/time range)
# 2) remove the column of data from template (used only for dates)
# 3) each security to one another (this was the ultimate goal anyway).
a.zoo <- merge(a, template, all=TRUE)
a.zoo$template <- NULL
b.zoo <- merge(b, template, all=TRUE)
b.zoo$template <- NULL
t.zoo <- merge(a.zoo, b.zoo, all=TRUE)

# Fill each NA with the most recent preceding non-NA value
# (last observation carried forward).
t <- na.locf(t.zoo)
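
The question also asks about pulling in any number of datasets. One possible approach, sketched here under assumptions, is to keep the series in a named list and hand the whole list to merge via do.call. The helper to_zoo and the list frames are hypothetical names with made-up data standing in for data frames returned by read.csv; this is an illustration, not the asker's final code.

```r
library(zoo)

# Hypothetical helper: turn a two-column data frame (timestamp, close)
# into a zoo series indexed by POSIXct. tz is fixed for reproducibility.
to_zoo <- function(df) {
  idx <- as.POSIXct(df[, 1], format = "%m/%d/%Y %H:%M:%S", tz = "UTC")
  zoo(df[, 2], idx)
}

# Made-up stand-ins for data frames read with read.csv().
frames <- list(
  sec1 = data.frame(V1 = c("01/04/2010 08:30:00", "01/04/2010 08:32:00"),
                    V2 = c(10, 12), stringsAsFactors = FALSE),
  sec2 = data.frame(V1 = c("01/04/2010 08:31:00", "01/04/2010 08:32:00"),
                    V2 = c(20, 21), stringsAsFactors = FALSE),
  sec3 = data.frame(V1 = c("01/04/2010 08:30:00", "01/04/2010 08:33:00"),
                    V2 = c(30, 33), stringsAsFactors = FALSE)
)

series <- lapply(frames, to_zoo)

# do.call() passes every element of the list to merge(), so the number
# of series does not need to be known in advance.
merged <- do.call(merge, series)

# Carry the last observation forward; na.rm = FALSE keeps rows whose
# leading values are still NA instead of dropping them.
filled <- na.locf(merged, na.rm = FALSE)
```

With this layout, adding another security only means adding another element to the list; the merge and fill steps stay the same.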


Comments (1)

风追烟花雨 2024-10-30 21:42:20

PROBLEM 1

?zoo has details on how to deal with duplicates, but this is presumably because the dates created by strptime contain duplicates.
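
A minimal sketch of how duplicate timestamps could be checked for before building the zoo object, using only base R. The data frame sec1 below is made-up illustration data, not the asker's file, and dropping the repeated rows assumes they are exact re-imports of the same record.

```r
# Hypothetical data: the second observation was imported twice.
sec1 <- data.frame(
  V1 = c("01/04/2010 08:30:00", "01/04/2010 08:31:00",
         "01/04/2010 08:31:00", "01/04/2010 08:33:00"),
  V2 = c(10.0, 10.5, 10.5, 11.0),
  stringsAsFactors = FALSE
)

# Parse the timestamps; tz is fixed so the result does not depend on
# the local time zone.
sec1_dates <- as.POSIXct(sec1[, 1], format = "%m/%d/%Y %H:%M:%S", tz = "UTC")

# TRUE for every timestamp that already appeared earlier in the vector.
is_dup <- duplicated(sec1_dates)
n_dup <- sum(is_dup)

# Keep only the first occurrence of each timestamp.
sec1_clean <- sec1[!is_dup, ]
clean_dates <- sec1_dates[!is_dup]
```

If the repeated timestamps carry different values, dropping rows is not appropriate and the values would need to be combined in some way first.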

PROBLEM 2

You can subset times using [, which and time with zoo objects, see ?zoo, e.g.:

t.zoo[which(strftime(time(t.zoo),"%H:%M")>"08:30" & strftime(time(t.zoo),"%H:%M")<"15:30")]
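
The same string comparison can also build the regular minute grid on its own, before any data is merged in. The sketch below uses only base R; the two-day range and the names first, last, all_minutes and grid are assumptions for illustration. It uses inclusive bounds, whereas the strict > and < in the snippet above leave out the boundary minutes themselves.

```r
# Hypothetical two-day range; tz is fixed to avoid daylight-saving surprises.
first <- as.POSIXct("2010-01-04 00:00:00", tz = "UTC")
last  <- as.POSIXct("2010-01-05 23:59:00", tz = "UTC")

# Every minute between the first and last timestamp.
all_minutes <- seq(first, last, by = "min")

# Zero-padded "HH:MM" strings sort the same way as clock times,
# so a plain string comparison selects the daily window.
hhmm <- format(all_minutes, "%H:%M", tz = "UTC")
grid <- all_minutes[hhmm >= "08:30" & hhmm <= "15:00"]
```

This filter looks only at the time of day, so weekends and holidays inside the range are still included and would need a separate filter if they should be dropped.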

PROBLEM 3

Use c to combine: t.zoo <- c(a,b)
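
Note that c stacks the two series end to end into a single series, and as far as I know it expects their indexes not to overlap. If the goal is one column per security aligned on a common index, merge followed by na.locf is an alternative. The following is a small sketch with made-up values; t0, grid, m and filled are hypothetical names.

```r
library(zoo)

# Hypothetical minute data with gaps in different places in each series.
t0 <- as.POSIXct("2010-01-04 08:30:00", tz = "UTC")
a <- zoo(c(10, 11, 12), t0 + 60 * c(0, 1, 3))   # 08:30, 08:31, 08:33
b <- zoo(c(20, 21),     t0 + 60 * c(1, 2))      # 08:31, 08:32

# Zero-width series that carries only the regular index (four minutes here).
grid <- zoo(, seq(t0, by = "min", length.out = 4))

# merge() aligns on the union of the indexes and pads gaps with NA;
# merging with the grid then adds any minute missing from both series.
m <- merge(merge(a, b, all = TRUE), grid, all = TRUE)

# Carry the last observation forward; na.rm = FALSE keeps leading NAs
# so that the second pass can fill them from the next observation.
filled <- na.locf(m, na.rm = FALSE)
filled <- na.locf(filled, na.rm = FALSE, fromLast = TRUE)
```

The second na.locf call covers the "or following value" case from the question: a series that starts later than the grid has no previous value to carry, so its first rows are filled backward instead.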

