Tags: r, aggregation, zoo

Aggregating timestamped zoo objects by clock time (i.e. not solely by time in the zoo object)

Posted 2025-01-03 11:15:44

I have a zoo object which consists of a timestamped (to the second) timeseries. The timeseries is irregular in that the time intervals between the values are not regularly spaced.

I would like to transform the irregularly spaced timeseries object into a regularly spaced one, where the time interval between values is constant (say 15 minutes) and the timestamps fall on "real world" clock times.

Some sample data may help illustrate further:

# Sample data
2011-05-05 09:30:04 101.32
2011-05-05 09:30:14 100.09
2011-05-05 09:30:19 99.89
2011-05-05 09:30:35 89.66
2011-05-05 09:30:45 95.16
2011-05-05 09:31:12 100.28
2011-05-05 09:31:50 100.28
2011-05-05 09:32:10 98.28

I'd like to aggregate them (using my custom function) over a specified time period (e.g. 30-second buckets) such that the output looks like the table presented below.

The key is that I want to aggregate every 30 seconds by clock time, NOT in 30-second windows starting from my first observation time. Naturally, the first time bucket would be the first bucket for which I have a recorded observation (i.e. row) in the data to be aggregated.

2011-05-05 09:30:00   101.32
2011-05-05 09:30:30   89.66
2011-05-05 09:31:00   100.28

In the example given, my custom aggregate function simply returns the first value in the 'set' of 'selected rows' to aggregate over.

时间海 2025-01-10 11:15:44

Read in the data and then aggregate it by minute, here taking the mean of each bucket:

Lines <- "2011-05-05 09:30:04 101.32
2011-05-05 09:30:14 100.09
2011-05-05 09:30:19 99.89
2011-05-05 09:30:35 89.66
2011-05-05 09:30:45 95.16
2011-05-05 09:31:12 100.28
2011-05-05 09:31:50 100.28
2011-05-05 09:32:10 98.28"

library(zoo)
library(chron)
toChron <- function(d, t) as.chron(paste(d, t))
z <- read.zoo(text = Lines, index = 1:2, FUN = toChron)
aggregate(z, trunc(time(z), "00:01:00"), mean)

The result is:

(05/05/11 09:30:00) (05/05/11 09:31:00) (05/05/11 09:32:00) 
             97.224             100.280              98.280
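
For the 30-second, first-value case asked about in the question, the same aggregate() call can be given a bucketing function in place of the minute truncation. The following is only a sketch under assumptions, not the answerer's code: it uses a POSIXct index in UTC instead of chron, and to_bucket, stamps and values are hypothetical names introduced for illustration.

```r
library(zoo)

# Sample data from the question; the UTC time zone is an assumption.
stamps <- as.POSIXct(c("2011-05-05 09:30:04", "2011-05-05 09:30:14",
                       "2011-05-05 09:30:19", "2011-05-05 09:30:35",
                       "2011-05-05 09:30:45", "2011-05-05 09:31:12",
                       "2011-05-05 09:31:50", "2011-05-05 09:32:10"), tz = "UTC")
values <- c(101.32, 100.09, 99.89, 89.66, 95.16, 100.28, 100.28, 98.28)
z <- zoo(values, order.by = stamps)

# Hypothetical helper: floor a POSIXct vector to the start of its
# clock-aligned bucket (width in seconds).
to_bucket <- function(x, width = 30) x - as.numeric(x) %% width

# When `by` is a function, aggregate() applies it to the index of z;
# head(, 1) keeps the first value in each bucket.
z30 <- aggregate(z, to_bucket, head, 1)
print(z30)
```

Any other summary function can be passed in place of head, so a custom aggregation function slots in the same way.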

想你只要分分秒秒 2025-01-10 11:15:44

I hope we can assume this is in a zoo or xts object. If so then try this:

  # First get the start of the first interval; you need to use your own tz
beg <- as.POSIXct(format(index(dat[1,]), "%Y-%m-%d %H:%M", tz="EST5EDT"))
  # Then create a sequence of 30-second breakpoints
tseq <- beg + seq(0, 4*30, by=30)
  # This creates a grouping vector that you can use for your aggregation function
findInterval(index(dat), tseq)
  #[1] 1 1 1 2 2 3 4 5
  # To take the first row in each subset of rows from tapply, try "[" with 1
tapply(dat, findInterval(index(dat), tseq), "[", 1)
  #     1      2      3      4      5 
  #101.32  89.66 100.28 100.28  98.28
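
The 4*30 above is sized by hand for this sample. A sketch of the same findInterval idea that derives the breakpoints from the data is shown here in base R on plain vectors; stamps, values and first_vals are illustrative names, and the UTC time zone is an assumption.

```r
# Sample data from the question; the UTC time zone is an assumption.
stamps <- as.POSIXct(c("2011-05-05 09:30:04", "2011-05-05 09:30:14",
                       "2011-05-05 09:30:19", "2011-05-05 09:30:35",
                       "2011-05-05 09:30:45", "2011-05-05 09:31:12",
                       "2011-05-05 09:31:50", "2011-05-05 09:32:10"), tz = "UTC")
values <- c(101.32, 100.09, 99.89, 89.66, 95.16, 100.28, 100.28, 98.28)

# Floor the first timestamp to a 30-second clock boundary.
beg <- stamps[1] - as.numeric(stamps[1]) %% 30

# Breakpoints every 30 seconds, up to the last observation.
tseq <- seq(beg, max(stamps), by = 30)

# Interval number of each observation, then the first value per interval.
idx <- findInterval(stamps, tseq)
first_vals <- tapply(values, idx, `[`, 1)

# Label each group with the start time of its bucket.
names(first_vals) <- format(tseq[as.integer(names(first_vals))],
                            "%H:%M:%S", tz = "UTC")
print(first_vals)
```

Buckets with no observations simply do not appear in the result, since tapply only sees the interval numbers that actually occur.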

绝影如岚 2025-01-10 11:15:44

I would simply truncate the times down to your interval, so assuming t is the time (use as.POSIXct if it's not):

bucket = t - as.numeric(t) %% 30

Then you can aggregate over bucket, e.g. aggregate(value, list(bucket), sum)

(I don't use zoo so this is with pure R)
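
As a worked illustration of this truncation on the question's sample data, here is a minimal base R sketch. It keeps the names t and value from the answer, assumes UTC timestamps, and swaps sum for a first-value function to match what the question asks for.

```r
# Sample data from the question; the UTC time zone is an assumption.
t <- as.POSIXct(c("2011-05-05 09:30:04", "2011-05-05 09:30:14",
                  "2011-05-05 09:30:19", "2011-05-05 09:30:35",
                  "2011-05-05 09:30:45", "2011-05-05 09:31:12",
                  "2011-05-05 09:31:50", "2011-05-05 09:32:10"), tz = "UTC")
value <- c(101.32, 100.09, 99.89, 89.66, 95.16, 100.28, 100.28, 98.28)

# Seconds since the epoch modulo 30 is the offset into the current
# 30-second clock bucket; subtracting it floors each timestamp.
bucket <- t - as.numeric(t) %% 30

# One row per bucket; function(v) v[1] keeps the first observation.
res <- aggregate(value, list(bucket = bucket), function(v) v[1])
print(res)
```

The first three rows correspond to the 09:30:00, 09:30:30 and 09:31:00 rows in the question's expected table; the sample data also produce buckets at 09:31:30 and 09:32:00.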

凉栀 2025-01-10 11:15:44

You should look at align.time in xts. It does something very close to what you want to achieve.

my.data <- read.table(text="date,x
2011-05-05 09:30:04,101.32
2011-05-05 09:30:14,100.09
2011-05-05 09:30:19,99.89
2011-05-05 09:30:35,89.66
2011-05-05 09:30:45,95.16
2011-05-05 09:31:12,100.28
2011-05-05 09:31:50,100.28
2011-05-05 09:32:10,98.28", header=TRUE, as.is=TRUE,sep = ",")

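library(xts)
my.data <- xts(my.data[,2], as.POSIXlt(my.data[,1], format="%Y-%m-%d %H:%M:%S"))

# Round each timestamp up to the next 30-second boundary,
# then keep the first row for each boundary
res <- align.time(my.data, 30)
res[!duplicated(index(res)),]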

                      [,1]
2011-05-05 09:30:30 101.32
2011-05-05 09:31:00  89.66
2011-05-05 09:31:30 100.28
2011-05-05 09:32:00 100.28
2011-05-05 09:32:30  98.28

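Note that align.time rounds each timestamp up to the next 30-second boundary, so each label marks the end of its bucket. You can lag the time series by 30 seconds if labelling buckets by their start makes the interpretation clearer.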
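
One way to do that relabelling, sketched under assumptions rather than taken from the answer, is to subtract 30 seconds from the index after aligning. The names stamps, values and x are illustrative, and the UTC time zone is assumed.

```r
library(xts)

# Sample data from the question; the UTC time zone is an assumption.
stamps <- as.POSIXct(c("2011-05-05 09:30:04", "2011-05-05 09:30:14",
                       "2011-05-05 09:30:19", "2011-05-05 09:30:35",
                       "2011-05-05 09:30:45", "2011-05-05 09:31:12",
                       "2011-05-05 09:31:50", "2011-05-05 09:32:10"), tz = "UTC")
values <- c(101.32, 100.09, 99.89, 89.66, 95.16, 100.28, 100.28, 98.28)
x <- xts(values, order.by = stamps, tzone = "UTC")

# Round timestamps up to the next 30-second boundary and keep the
# first observation per boundary.
res <- align.time(x, 30)
res <- res[!duplicated(index(res)), ]

# Shift the labels back by 30 seconds so each row is stamped with the
# start of its bucket instead of the end.
index(res) <- index(res) - 30
print(res)
```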
