Getting trading days from rollapply in R
I have the following simulated dataset with a column y observed on the trading days of 2018 (roughly 250 of them).
data
# A tibble: 249 × 2
Date y
<dttm> <dbl>
1 2018-01-02 00:00:00 0.409
2 2018-01-03 00:00:00 -1.90
3 2018-01-04 00:00:00 0.131
4 2018-01-05 00:00:00 -0.619
5 2018-01-08 00:00:00 0.449
6 2018-01-09 00:00:00 0.448
7 2018-01-10 00:00:00 0.124
8 2018-01-11 00:00:00 -0.346
9 2018-01-12 00:00:00 0.775
10 2018-01-15 00:00:00 -0.948
# … with 239 more rows
Its last rows are:
> tail(data,n=10)
# A tibble: 10 × 2
Date y
<dttm> <dbl>
1 2018-12-13 00:00:00 -0.00736
2 2018-12-14 00:00:00 -1.30
3 2018-12-17 00:00:00 0.227
4 2018-12-18 00:00:00 -0.671
5 2018-12-19 00:00:00 -0.750
6 2018-12-20 00:00:00 -0.906
7 2018-12-21 00:00:00 -1.74
8 2018-12-27 00:00:00 0.331
9 2018-12-28 00:00:00 -0.768
10 2018-12-31 00:00:00 0.649
I want to calculate the rolling sd of column y with a window of 60 and then measure the window in exact trading days rather than ordinary calendar days (perhaps this can be done from the row index? I don't know).
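library(dplyr)

# Convert the datetime column to Date, then drop the original Date column
data2 = data %>%
  mutate(date = as.Date(Date))
data3 = data2[, -1]
head(data3)

roll_win = 60
# Right-aligned rolling sd, padded with NA for the first roll_win - 1 rows
data3$a = c(rep(NA_real_, roll_win - 1), zoo::rollapply(data3$y, roll_win, sd))
dat = subset(data3, !is.na(a))
# Row whose 60-row window has the largest sd
dat_max = dat[dat$a == max(dat$a, na.rm = TRUE), ]
# Note: this subtracts calendar days, not trading days
dat_max$date_start = dat_max$date - (roll_win - 1)
dat_max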
It turns out that the period of highest volatility is:
dat_max
# A tibble: 1 × 4
y date a date_start
<dbl> <date> <dbl> <date>
1 0.931 2018-04-24 1.18 2018-02-24
Now if I subtract the two dates I get:
> dat_max$date - dat_max$date_start
Time difference of 59 days
This is correct in calendar days, but these are not trading days.
I have asked a similar question here, but it did not solve the problem. The question asked there was how to obtain the days of high volatility.
How can I obtain the window in trading days? Thanks in advance.
EDIT
The full data can be loaded with:
library(gsheet)
data= gsheet2tbl("https://docs.google.com/spreadsheets/d/1PdZDb3OgqSaO6znUWsAh7p_MVLHgNbQM/edit?usp=sharing&ouid=109626011108852110510&rtpof=true&sd=true")
data
Comments (1)
Start date for each time window
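If the question is how to calculate the start date for each window, then step back by rows rather than by calendar days: with the data in the Note at the end and a window of 3, the start date of a window is the date 2 rows before its end date.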
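A minimal sketch of that row-based calculation, assuming a small made-up data frame `dat` in place of the data from the Note (the values and the column names `roll_sd` and `date_start` are illustrative only, not the answer's code):

```r
library(zoo)

# Hypothetical stand-in data: 8 trading days spanning one weekend
dat <- data.frame(
  date = as.Date(c("2018-01-02", "2018-01-03", "2018-01-04", "2018-01-05",
                   "2018-01-08", "2018-01-09", "2018-01-10", "2018-01-11")),
  y = c(0.409, -1.90, 0.131, -0.619, 0.449, 0.448, 0.124, -0.346)
)
w <- 3
n <- nrow(dat)

# Right-aligned rolling sd; fill = NA pads the first w - 1 rows
dat$roll_sd <- rollapplyr(dat$y, w, sd, fill = NA)

# Start of each window: the date w - 1 rows earlier,
# not w - 1 calendar days earlier
dat$date_start <- as.Date(NA)
dat$date_start[w:n] <- dat$date[1:(n - w + 1)]
dat
```

For the window ending on 2018-01-08, this gives a start date of 2018-01-04, whereas subtracting 2 calendar days would land on 2018-01-06, a Saturday that is not in the data.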
Largest sd's with their start and end dates
The largest 4 sd's can be reported together with the start and end dates of their windows.
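One way this could look, as a sketch under assumptions: the data below are simulated weekdays rather than the question's dataset, the window of 10 is arbitrary, and the names `res`, `roll_sd` and `top4` are made up for illustration.

```r
library(zoo)

set.seed(1)  # hypothetical simulated data, not the original dataset
days <- seq(as.Date("2018-01-01"), as.Date("2018-03-31"), by = "day")
days <- days[!as.POSIXlt(days)$wday %in% c(0, 6)]  # drop Sundays (0) and Saturdays (6)
dat <- data.frame(date = days, y = rnorm(length(days)))

w <- 10
n <- nrow(dat)

# One row per complete window: first date, last date, and the sd over its w rows.
# Without fill, rollapplyr returns n - w + 1 values, one per complete window.
res <- data.frame(
  date_start = dat$date[1:(n - w + 1)],
  date_end   = dat$date[w:n],
  roll_sd    = rollapplyr(dat$y, w, sd)
)

# Sort by sd in decreasing order and keep the first 4 windows
top4 <- head(res[order(res$roll_sd, decreasing = TRUE), ], 4)
top4
```

Because `date_start` and `date_end` are taken from row positions, every window in `top4` covers exactly `w` trading days, however many weekends fall inside it.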
Rows between two dates
If the question is how many rows lie between two dates that appear in data, counting both endpoints, then the answer is the difference of their row positions plus one.
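A small sketch of that count in base R, assuming a made-up trading calendar `dat` and two hypothetical dates `d1` and `d2` that both appear in it:

```r
# Hypothetical trading calendar: 6 trading days spanning one weekend
dat <- data.frame(
  date = as.Date(c("2018-02-22", "2018-02-23", "2018-02-26",
                   "2018-02-27", "2018-02-28", "2018-03-01"))
)
d1 <- as.Date("2018-02-23")
d2 <- as.Date("2018-03-01")

# Row positions of the two dates; their difference plus 1 counts both endpoints
n_rows <- diff(match(c(d1, d2), dat$date)) + 1
n_rows               # 5 trading days

# Calendar-day difference, for comparison
as.numeric(d2 - d1)  # 6
```

`match()` returns NA if a date is not present in `dat$date`, so this only works for dates that are actual rows; for arbitrary dates, `sum(dat$date >= d1 & dat$date <= d2)` counts the rows in the closed interval instead.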
Note