Finding the latest observation earlier than a given timestamp with xts

Posted on 2024-12-16 16:47:32

I have an xts object that looks like this:

> q.xts
                                  val
2011-08-31 09:30:00.002357 -1.0135222
2011-08-31 09:30:00.003443 -0.2182679
2011-08-31 09:30:00.005075 -0.5317191
2011-08-31 09:30:00.009515 -1.0639535
2011-08-31 09:30:00.011569 -1.2470759
2011-08-31 09:30:00.012144  0.7678103
2011-08-31 09:30:00.023813 -0.6303432
2011-08-31 09:30:00.024107 -0.5105943

I calculate a fixed offset from timestamps in another data frame, r. The number of rows in r is significantly fewer than the number of rows in q.xts.

> r
                        time               predict.time
1 2011-08-31 09:30:00.003443 2011-08-31 09:30:00.002443
2 2011-08-31 09:30:00.009515 2011-08-31 09:30:00.008515
3 2011-08-31 09:30:00.024107 2011-08-31 09:30:00.023108

The time column corresponds to an observation from q.xts, while the predict.time column is 1 millisecond earlier than time (apart from floating-point rounding in the printed values).

What I would like to do is, for each value of predict.time, find the last observation in q.xts whose timestamp is equal to or earlier than that value. For the three observations in r above I would expect the following matches:

                        time               predict.time     (time from q.xts)
1 2011-08-31 09:30:00.003443 2011-08-31 09:30:00.002443  --> 09:30:00.002357
2 2011-08-31 09:30:00.009515 2011-08-31 09:30:00.008515  --> 09:30:00.005075
3 2011-08-31 09:30:00.024107 2011-08-31 09:30:00.023108  --> 09:30:00.012144

I had approached this by looping over each row in r and performing an xts subset. So, for row 1 of r I would do:

> last(index(q.xts[paste('/', r[1,]$predict.time, sep='')]))
[1] "2011-08-31 09:30:00.002357 CDT"

QUESTION: Doing this with a loop seems clunky and awkward. Is there a better way? I would like to end up with another column in r that provides the exact time or row number for the corresponding value in q.xts.


NOTE: Use this to build the data I've used for this example:

library(xts)  # needed for xts(), index() and last()

q <- read.csv(tc <- textConnection("
       2011-08-31 09:30:00.002358, -1.01352216
       2011-08-31 09:30:00.003443, -0.21826793
       2011-08-31 09:30:00.005076, -0.53171913
       2011-08-31 09:30:00.009515, -1.06395353
       2011-08-31 09:30:00.011570, -1.24707591
       2011-08-31 09:30:00.012144,  0.76781028
       2011-08-31 09:30:00.023814, -0.63034317
       2011-08-31 09:30:00.024108, -0.51059425"),
     header=FALSE); close(tc)
colnames(q) <- c('datetime', 'val')
q.xts <- xts(q[-1], as.POSIXct(q$datetime))

r <- read.csv(tc <- textConnection("
       2011-08-31 09:30:00.003443
       2011-08-31 09:30:00.009515
       2011-08-31 09:30:00.024108"),
     header=FALSE); close(tc)
colnames(r) <- c('time')
r$time <- as.POSIXct(strptime(r$time, '%Y-%m-%d %H:%M:%OS'))
r$predict.time <- r$time - 0.001


Comments (2)

晨光如昨 2024-12-23 16:47:32

There may be a better way to do this, but this is the best I can come up with at the moment.

# create an empty xts object based on r$predict.time
r.xts <- xts(,r$predict.time)
# merge q.xts and r.xts. This will insert NAs at the times in r.xts.
tmp <- merge(q.xts,r.xts)
# Here's the magic:
# lag tmp *backwards* one period, so the NAs appear at the times
# right before the times in r.xts. Then grab the index for the NA periods
tmp.index <- index(tmp[is.na(lag(tmp,-1,na.pad=FALSE))])
# get the rows in q.xts for the times in tmp.index
out <- q.xts[tmp.index]
#                                   val
# 2011-08-31 09:30:00.002357 -1.0135222
# 2011-08-31 09:30:00.005075 -0.5317191
# 2011-08-31 09:30:00.012144  0.7678103

人│生佛魔见 2024-12-23 16:47:32

I'd use findInterval:

findInterval(r$predict.time, index(q.xts))

> q.xts[findInterval(r$predict.time, index(q.xts)),]
                           val
2011-08-31 09:30:00 -1.0135222
2011-08-31 09:30:00 -0.5317191
2011-08-31 09:30:00  0.7678103

Your times are POSIXct so this should be fairly robust.
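
To get the extra column in r that the question asks for, the findInterval result can be stored directly. The following is only a minimal sketch in base R, not part of the original answer. It rebuilds the example timestamps as a plain POSIXct vector (q.times stands in for index(q.xts)), and the column names q.row and q.time are made up for illustration. It also assumes you want NA when a predict.time falls before the first observation, since findInterval returns 0 in that case.

```r
# Rebuild the example timestamps without xts; q.times plays the role of index(q.xts)
base <- as.POSIXct("2011-08-31 09:30:00", tz = "UTC")
q.times <- base + c(0.002358, 0.003443, 0.005076, 0.009515,
                    0.011570, 0.012144, 0.023814, 0.024108)

r <- data.frame(time = base + c(0.003443, 0.009515, 0.024108))
r$predict.time <- r$time - 0.001

# For each predict.time, position of the last q.times value that is <= it.
# q.times must be sorted ascending (an xts index always is).
idx <- findInterval(as.numeric(r$predict.time), as.numeric(q.times))

# findInterval gives 0 when predict.time precedes every observation;
# turn that into NA so it is not silently dropped when subsetting (assumption).
idx[idx == 0] <- NA

r$q.row  <- idx           # hypothetical column: row number in q.xts
r$q.time <- q.times[idx]  # hypothetical column: matching timestamp
```

With the real objects, replacing q.times by index(q.xts) should give the same row numbers, and q.xts[r$q.row[!is.na(r$q.row)], ] would pull the matching values.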
