R/zoo: Handling non-unique index entries without losing data?

Posted 2024-12-22 04:38:44

I have a CSV file of data points (e.g. financial ticks, experiment recordings, etc.), and my data has duplicate timestamps. Here is code demonstrating the problem:

```r
library(zoo)
library(xts)

csv = "2011-11-01,50
2011-11-02,49
2011-11-02,48
2011-11-03,47
2011-11-03,46
2011-11-03,45
2011-11-04,44
2011-11-04,43
2011-11-04,42
2011-11-04,41
"

# Keep the duplicate dates: zoo warns about non-unique index entries
z1 = read.zoo(textConnection(csv), sep = ',')
w1 = to.weekly(z1)
ep = endpoints(z1, "weeks", 1)
w1$Volume = period.apply(z1, ep, length)

# Collapse duplicate dates (by their mean) while reading: no warning
z2 = read.zoo(textConnection(csv), sep = ',', aggregate = TRUE)
w2 = to.weekly(z2)
ep = endpoints(z2, "weeks", 1)
w2$Volume = period.apply(z2, ep, length)
```

Entry 1 of vignette('zoo-faq') tells me that aggregate=TRUE gets rid of zoo's annoying warning about duplicate index entries. But then the results change:

```
> w1
           z1.Open z1.High z1.Low z1.Close Volume
2011-11-04      50      50     41       41     10
> w2
           z2.Open z2.High z2.Low z2.Close Volume
2011-11-04      50      50   42.5     42.5      4
```
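The difference is what you would expect if `aggregate = TRUE` replaces each group of duplicate timestamps with its mean (mean is the function read.zoo uses when `aggregate = TRUE`) before `to.weekly()` ever sees the data. A minimal base-R sketch of that collapsing step, using the same data and no zoo at all, shows where the 42.5 and the Volume of 4 come from:

```r
# Same tick data as in the question.
csv <- "2011-11-01,50
2011-11-02,49
2011-11-02,48
2011-11-03,47
2011-11-03,46
2011-11-03,45
2011-11-04,44
2011-11-04,43
2011-11-04,42
2011-11-04,41
"
ticks <- read.csv(textConnection(csv), header = FALSE,
                  col.names = c("Date", "Price"))

# Collapse duplicate dates to their mean, mimicking aggregate = TRUE.
daily_mean <- tapply(ticks$Price, ticks$Date, mean)
print(daily_mean)

# A weekly summary of the collapsed series sees 4 points instead of 10 ticks,
# and its lowest value is a mean (42.5) rather than a real tick (41).
print(c(Low    = min(daily_mean),
        Close  = daily_mean[[length(daily_mean)]],
        Volume = length(daily_mean)))
```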

Is there another way to get rid of the warning message and still get the same results as w1? (Yes, I know about suppressWarnings(), which is what I was using before, but I hate the idea.)
(I was wondering about passing a custom aggregate function to read.zoo that would return OHLCV data for each day, but I couldn't work out whether that was even possible.)
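The "OHLCV per day" idea can at least be sketched outside read.zoo. This is not a claim about what read.zoo's `aggregate` argument accepts. The approach summarises each day into one OHLCV row, which gives a unique daily index, and then rolls the daily rows up into weekly bars. The sketch makes several assumptions. `ohlcv()` is a hypothetical helper. The file is assumed to be sorted by time. Weeks are keyed with `format(..., "%Y-%W")` (Monday-start weeks), which may not match xts's `endpoints()` in every edge case.

```r
csv <- "2011-11-01,50
2011-11-02,49
2011-11-02,48
2011-11-03,47
2011-11-03,46
2011-11-03,45
2011-11-04,44
2011-11-04,43
2011-11-04,42
2011-11-04,41
"
ticks <- read.csv(textConnection(csv), header = FALSE,
                  col.names = c("Date", "Price"))
ticks$Date <- as.Date(ticks$Date)

# Hypothetical helper: one OHLCV row from prices that are already in time order.
ohlcv <- function(p) {
  c(Open = p[1], High = max(p), Low = min(p), Close = p[length(p)],
    Volume = length(p))
}

# One row per day, so these row names (dates) contain no duplicates.
daily <- t(sapply(split(ticks$Price, ticks$Date), ohlcv))
print(daily)

# Roll the daily bars up into weekly bars (Monday-based weeks via "%W").
week <- format(as.Date(rownames(daily)), "%Y-%W")
weekly <- t(sapply(split(seq_len(nrow(daily)), week), function(i) {
  c(Open   = daily[i[1], "Open"],
    High   = max(daily[i, "High"]),
    Low    = min(daily[i, "Low"]),
    Close  = daily[i[length(i)], "Close"],
    Volume = sum(daily[i, "Volume"]))
}))
print(weekly)
```

Because `daily` has one row per date, it could be handed to `zoo()` or `xts()` with `as.Date(rownames(daily))` as a unique index, so no duplicate-index warning arises.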

Answers (2)

家住魔仙堡 2024-12-29 04:38:44

You need a function that pads the timestamps with an "epsilon" increment to make them distinct.

I have also written one or two Rcpp-based functions to do that. Times are, after all, most often POSIXct, which is really a double (as as.numeric() shows). So just loop over the timestamps. Whenever one equals the previous one, keep adding a small delta of 1.0e-7, which is smaller than what POSIXct itself can represent. Reset the cumulative delta each time there is an actual break.
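Here is a plain-R sketch of the loop just described. It is not the Rcpp code mentioned above, which is not shown, and `pad_duplicate_times()` is a hypothetical name. One deliberate deviation is an assumption on my part. The default step here is 1e-6 seconds rather than 1.0e-7. For present-day timestamps (about 1.3e9 seconds since the epoch), adjacent doubles are roughly 2.4e-7 seconds apart, so a single 1.0e-7 step rounds back to the original value. 1e-6 also matches the microsecond steps printed by make.time.unique() in the edit.

```r
# Hypothetical helper (not from any package): pad tied POSIXct timestamps
# with a cumulative epsilon so that every entry becomes distinct.
pad_duplicate_times <- function(times, eps = 1e-6) {
  stopifnot(inherits(times, "POSIXct"))
  secs <- as.numeric(times)      # seconds since the epoch, stored as a double
  out <- secs
  delta <- 0                     # cumulative offset within the current run of ties
  for (i in seq_along(secs)[-1]) {
    if (secs[i] == secs[i - 1]) {
      delta <- delta + eps       # same stamp as the previous one: push it further
    } else {
      delta <- 0                 # an actual break in time: reset the offset
    }
    out[i] <- secs[i] + delta
  }
  .POSIXct(out, tz = attr(times, "tzone"))
}

t0 <- as.POSIXct("2011-11-03 10:00:00", tz = "UTC")
stamps <- c(t0, t0, t0, t0 + 1)
padded <- pad_duplicate_times(stamps)
print(as.numeric(padded) - as.numeric(t0))   # offsets from t0, in seconds
```

The offset only grows inside a run of ties. The padded values therefore stay in order as long as no run is longer than the gap to the next distinct timestamp divided by `eps`.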

Edit: Try the make.index.unique() and make.time.unique() functions in the xts package:

```
R> sametime <- rep(Sys.time(), 3)
R> xts(1:3, order.by=make.time.unique(sametime))
                           [,1]
2011-12-20 06:52:37.547299    1
2011-12-20 06:52:37.547300    2
2011-12-20 06:52:37.547301    3
R> 
```

Edit 2: Here is another example for Date indexed objects:

```
R> samedate <- rep(Sys.Date(), 5)   # identical dates
R> xts(1:5, order.by=make.time.unique(as.POSIXct(samedate)))
                           [,1]
2011-12-19 18:00:00.000000    1
2011-12-19 18:00:00.000000    2
2011-12-19 18:00:00.000001    3
2011-12-19 18:00:00.000002    4
2011-12-19 18:00:00.000003    5
R> xts(1:5, order.by=as.Date(make.index.unique(as.POSIXct(samedate))))
           [,1]
2011-12-20    1
2011-12-20    2
2011-12-20    3
2011-12-20    4
2011-12-20    5
R> 
```

The first solution switches to POSIXct, which ends up six hours before midnight because my local time zone is GMT minus six hours. The second example converts away to POSIXct, makes the index unique there, and then converts back to Date.
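The six-hour shift in the first output comes from the Date-to-POSIXct step itself, not from make.time.unique(). This base-R sketch uses the fixed-offset zone "Etc/GMT+6" as a stand-in for the answerer's GMT-6 local zone. That choice is an assumption, since the answer does not name the zone.

```r
d <- as.Date("2011-12-20")

# Converting a Date to POSIXct gives midnight UTC of that calendar day.
midnight_utc <- as.POSIXct(d)

# Displayed in a fixed GMT-6 zone, that instant falls on the previous evening.
# (The POSIX name "Etc/GMT+6" means six hours behind UTC; the sign is inverted.)
print(format(midnight_utc, "%Y-%m-%d %H:%M", tz = "UTC"))
print(format(midnight_utc, "%Y-%m-%d %H:%M", tz = "Etc/GMT+6"))

# Converting back to Date in UTC recovers the original calendar day.
print(as.Date(midnight_utc, tz = "UTC"))
```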

迷途知返 2024-12-29 04:38:44

As a simple variant of Dirk's suggestion, this should work:

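```r
library(zoo)

z0 = read.csv( textConnection(csv), sep=',', header=FALSE )
# Shift each row by a tiny, increasing fraction of a day so the Date index is unique
z1 = zoo( z0$V2, as.Date(z0$V1) + (1:nrow(z0))*10^-10 )
```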
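This trick works because adding a plain number to a Date keeps the fractional part, so row k is shifted by k * 1e-10 days (about k * 8.6 microseconds). A base-R check on the question's data, without zoo, shows that the index becomes unique while the whole-day part is unchanged:

```r
csv <- "2011-11-01,50
2011-11-02,49
2011-11-02,48
2011-11-03,47
2011-11-03,46
2011-11-03,45
2011-11-04,44
2011-11-04,43
2011-11-04,42
2011-11-04,41
"
z0 <- read.csv(textConnection(csv), header = FALSE)

base_days <- as.Date(z0$V1)
idx <- base_days + seq_len(nrow(z0)) * 1e-10   # same offsets as (1:nrow(z0))*10^-10

print(anyDuplicated(base_days))   # 3: row 3 repeats row 2's date
print(anyDuplicated(idx))         # 0: no duplicates after the offset
print(unclass(idx) - unclass(base_days))   # the tiny per-row offsets, in days
```

Unlike the per-run reset in Dirk's description, this offset keeps growing across the whole file. That is harmless for ordering, and with 1e-10 days per row it stays well under one day unless the file has on the order of 10^10 rows.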