R time series, complex sequence

Posted on 2024-10-23 21:42:20

I am attempting to merge two different time-series in R with the following characteristics:

Data must be between 08:30 and 15:00 on a daily basis.
Data spans several weeks, not just one particular day.
There are gaps in the data at random intervals.
The two datasets will not necessarily have gaps at the same intervals.

I would like to merge the two datasets so that every time from 08:30 to 15:00 appears in sequence, and wherever either dataset has a gap, the previous value (or the following value) is carried over.

# I have verified that the csv files are imported correctly.
# The first column contains dates, and the strptime
# function can convert strings into Date/Time objects.
#
sec1_dates <- strptime(sec1[,1], "%m/%d/%Y %H:%M:%S")
sec2_dates <- strptime(sec2[,1], "%m/%d/%Y %H:%M:%S")

# The second column contains the close.
# I use the zoo function to create zoo objects from that data.
# But for some reason this ends up creating duplicates PROBLEM 1
#
a <- zoo(sec1[,2], sec1_dates)
b <- zoo(sec2[,2], sec2_dates)

# I know that I need to use seq to fill in the gaps, but I am clueless as to how.
# Once I have the proper seq I can just use na.locf to fill in the appropriate values.
# HOWEVER seq(start(sec1_dates), end(sec1_dates), "min") would end up returning
# every minute for each day, and I only want 08:30 to 15:30. PROBLEM 2

# The merge function can combine two zoo objects, in union
# Obviously this fails because the two index sizes don't match PROBLEM 3
#
t.zoo <- merge(a, b, all=TRUE)

James, you were right about Problem 1. Thank you. I verified that the csv file was pulling the data in twice, and removing the duplicated data fixed the issue. I used your solution for Problem 2 as well, but I am not certain that this is the most efficient way of doing what I'm trying to do. Ultimately I may want to use this to run regressions, and at that point I might need a loop of some sort to pull in any number of datasets. Any suggestions for optimizing this would be greatly appreciated.
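
For an arbitrary number of securities, one option is to keep the series in a named list and merge them in a single call rather than pairwise. This is only a sketch on made-up data: the names sec1 to sec3, the values, and the integer index (standing in for minutes) are all hypothetical.

```r
library(zoo)

# Hypothetical series with gaps in different places (integer "minutes" as the index).
series <- list(
  sec1 = zoo(c(1.0, 1.1, 1.2), c(1, 2, 5)),
  sec2 = zoo(c(2.0, 2.1),      c(2, 4)),
  sec3 = zoo(c(3.0, 3.1, 3.2), c(1, 3, 5))
)

# merge() accepts any number of zoo objects, so do.call() can pass the whole list.
# The list names become the column names; gaps show up as NA.
merged <- do.call(merge, series)
```

In practice the list could be built with lapply() over a vector of file names, so that adding a security means adding one file name rather than another block of code.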

UPDATED SOLUTION

library(zoo)
library(tseries)

# Read the CSV files into data frames
sec1 <- read.csv("C:\\exportdata\\sec1.csv", stringsAsFactors=FALSE, header=FALSE)
sec2 <- read.csv("C:\\exportdata\\sec2.csv", stringsAsFactors=FALSE, header=FALSE)

# The first column contains dates.
# strptime parses them using the given format; as.POSIXct then converts the
# POSIXlt result to POSIXct, the more usual class for a zoo index.
sec1_dates <- as.POSIXct(strptime(sec1[,1], "%m/%d/%Y %H:%M:%S"))
sec2_dates <- as.POSIXct(strptime(sec2[,1], "%m/%d/%Y %H:%M:%S"))

# The second column contains the close prices for the securities.
# I use the zoo function to create zoo objects from that data.
# Input =  a vector of data and a vector of dates.
a <- zoo(sec1[,2], sec1_dates)
b <- zoo(sec2[,2], sec2_dates)

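# Create a regular one-minute index covering only the desired time frame,
# per the tip from James. The index is built and filtered first, then wrapped
# in a zoo object that has an index but no data.
idx <- seq(sec1_dates[1], tail(sec1_dates, 1), by = "min")
hhmm <- format(idx, "%H:%M")
# >= and <= keep the 08:30 and 15:00 minutes themselves.
template <- zoo(NULL, idx[hhmm >= "08:30" & hhmm <= "15:00"])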

# The merge function is then used to
# 1) align each security to the template's date/time index
#    (the template holds no data, so it adds times but no column to remove)
# 2) merge the securities with one another (this was the ultimate goal anyway).
a.zoo <- merge(a, template, all=TRUE)
b.zoo <- merge(b, template, all=TRUE)
t.zoo <- merge(a.zoo, b.zoo, all=TRUE)

# Fill each NA with the most recent earlier non-NA value
# (last observation carried forward).
t <- na.locf(t.zoo)
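
The question mentions carrying over either the previous or the following value. As far as I know, na.locf() covers both through its fromLast argument, and na.rm = FALSE keeps positions that cannot be filled instead of dropping them. A toy sketch with made-up prices (the timestamps and values are illustrative only):

```r
library(zoo)

# Made-up minute bars with gaps at 08:30, 08:32 and 08:33.
idx <- seq(as.POSIXct("2010-01-04 08:30:00", tz = "UTC"), by = "min", length.out = 5)
px  <- zoo(c(NA, 10.1, NA, NA, 10.4), idx)

# Previous value carried forward; the leading NA stays because nothing precedes it.
fwd <- na.locf(px, na.rm = FALSE)

# Following value carried backward instead.
bwd <- na.locf(px, fromLast = TRUE, na.rm = FALSE)

# Forward fill first, then backward fill whatever is still missing at the start.
both <- na.locf(fwd, fromLast = TRUE, na.rm = FALSE)
```

Which direction is appropriate depends on the use: for regressions on prices, filling backward uses information from the future, so forward filling is usually the more cautious choice.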

风追烟花雨 2024-10-30 21:42:20

PROBLEM 1

?zoo has details on how to deal with duplicates, but this is presumably because you have duplicates in your dates created by strptime.
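
One way to check for this before building the zoo object is to look for repeated timestamps in the parsed dates. The sketch below uses a small made-up data frame (sec1_demo and its values are hypothetical, not the asker's data) and keeps the last row for each timestamp; whether the first row, the last row, or an average is the right choice depends on the data.

```r
# Hypothetical stand-in for the imported CSV: column 1 = timestamp, column 2 = close.
sec1_demo <- data.frame(
  V1 = c("01/04/2010 08:30:00", "01/04/2010 08:31:00",
         "01/04/2010 08:31:00", "01/04/2010 08:33:00"),
  V2 = c(10.0, 10.1, 10.2, 10.3),
  stringsAsFactors = FALSE
)

# Parse to POSIXct; a fixed time zone avoids daylight-saving surprises.
dates <- as.POSIXct(sec1_demo$V1, format = "%m/%d/%Y %H:%M:%S", tz = "UTC")

# How many timestamps repeat an earlier one?
n_dup <- sum(duplicated(dates))

# Keep only the last row for each timestamp.
keep <- !duplicated(dates, fromLast = TRUE)
sec1_clean  <- sec1_demo[keep, ]
dates_clean <- dates[keep]
```

If n_dup is greater than zero on the real data, the duplicates come from the input rather than from zoo itself.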

PROBLEM 2

You can subset by time of day using [, which() and time() on zoo objects (see ?zoo), e.g.:

t.zoo[which(strftime(time(t.zoo),"%H:%M")>"08:30" & strftime(time(t.zoo),"%H:%M")<"15:30")]
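
The string comparison works because "%H:%M" produces zero-padded 24-hour times, which sort the same way as text as they do as clock times. A small sketch on a generated minute sequence (the dates are made up) shows the effect. It uses >= and <= so that the boundary minutes are kept, whereas the strict > and < in the line above drop 08:30 and 15:30 themselves.

```r
# Made-up range: every minute across two days, in UTC to avoid DST effects.
idx <- seq(as.POSIXct("2010-01-04 00:00:00", tz = "UTC"),
           as.POSIXct("2010-01-05 23:59:00", tz = "UTC"),
           by = "min")

# Zero-padded "HH:MM" strings compare correctly as text.
hhmm <- format(idx, "%H:%M")

# Keep 08:30 through 15:30 inclusive on every day.
session <- idx[hhmm >= "08:30" & hhmm <= "15:30"]

# First and last time of day that survived the filter.
session_range <- range(format(session, "%H:%M"))
```

Computing the formatted strings once and reusing them also avoids calling strftime() twice on the full index.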

PROBLEM 3

Use c to combine: t.zoo <- c(a,b)
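
It may help to compare the two functions on toy data, since they do different things. For zoo objects, merge() aligns series side by side on the union of their indexes, while c() (which, as far as I know, behaves like rbind() for zoo objects) stacks observations along time and therefore expects the indexes not to overlap. The series below are made up and use plain numeric indexes for brevity.

```r
library(zoo)

a <- zoo(c(1, 2, 3), c(1, 2, 4))   # observed at times 1, 2, 4
b <- zoo(c(10, 20),  c(3, 5))      # observed at times 3, 5 (no overlap with a)

# Side by side: one row per time in either series, NA where a series has a gap.
m <- merge(a, b, all = TRUE)

# End to end: a single series ordered by time.
s <- c(a, b)
```

For two securities sampled over the same days, where timestamps do overlap, the side-by-side result from merge() is probably the one wanted.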

About the author

再见回来
