Getting trading days from rollapply in R

Posted on 2025-01-09 11:26:49

I have the following simulated dataset with a column y, covering the fixed trading days of 2018 (roughly 250).

data
# A tibble: 249 × 2
   Date                     y
   <dttm>               <dbl>
 1 2018-01-02 00:00:00  0.409
 2 2018-01-03 00:00:00 -1.90 
 3 2018-01-04 00:00:00  0.131
 4 2018-01-05 00:00:00 -0.619
 5 2018-01-08 00:00:00  0.449
 6 2018-01-09 00:00:00  0.448
 7 2018-01-10 00:00:00  0.124
 8 2018-01-11 00:00:00 -0.346
 9 2018-01-12 00:00:00  0.775
10 2018-01-15 00:00:00 -0.948
# … with 239 more rows

The tail of the data:

> tail(data,n=10)
# A tibble: 10 × 2
   Date                       y
   <dttm>                 <dbl>
 1 2018-12-13 00:00:00 -0.00736
 2 2018-12-14 00:00:00 -1.30   
 3 2018-12-17 00:00:00  0.227  
 4 2018-12-18 00:00:00 -0.671  
 5 2018-12-19 00:00:00 -0.750  
 6 2018-12-20 00:00:00 -0.906  
 7 2018-12-21 00:00:00 -1.74   
 8 2018-12-27 00:00:00  0.331  
 9 2018-12-28 00:00:00 -0.768  
10 2018-12-31 00:00:00  0.649

I want to calculate the rolling sd of column y with a window of 60, and then find the window's start date counted in trading days rather than ordinary calendar days. (Perhaps this can be done from the row index? I don't know.)

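library(dplyr)

# Add a plain Date column and drop the original date-time column
data2 <- data %>%
  mutate(date = as.Date(Date))
data3 <- data2[, -1]
head(data3)
roll_win <- 60
# Rolling sd, padded so each value sits on the last row of its window
data3$a <- c(rep(NA_real_, roll_win - 1), zoo::rollapply(data3$y, roll_win, sd))
dat <- subset(data3, !is.na(a))
dat_max <- dat[dat$a == max(dat$a, na.rm = TRUE), ]
# Note: this subtracts calendar days, not trading days
dat_max$date_start <- dat_max$date - (roll_win - 1)
dat_max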

It turns out that the period of highest volatility is:

dat_max
# A tibble: 1 × 4
      y date           a date_start
  <dbl> <date>     <dbl> <date>    
1 0.931 2018-04-24  1.18 2018-02-24

Now if I subtract the two dates I get:

> dat_max$date - dat_max$date_start
Time difference of 59 days

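That is arithmetically correct, but these are calendar days, not trading days: date_start is simply 59 calendar days before the end date (2018-02-24 is a Saturday), not the first of the 60 trading days in the window.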

I asked a similar question here, but it did not solve this problem; that question was about how to obtain the days of high volatility.

How can I obtain the start date in trading days? Thanks in advance.

EDIT

The full data can be loaded with:

library(gsheet)
data <- gsheet2tbl("https://docs.google.com/spreadsheets/d/1PdZDb3OgqSaO6znUWsAh7p_MVLHgNbQM/edit?usp=sharing&ouid=109626011108852110510&rtpof=true&sd=true")
data

草莓酥 2025-01-16 11:26:49

Start date for each time window

If the question is how to calculate the start date of each window, then, using the data in the Note at the end and a window of 3 rows:

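library(dplyr)

w <- 3
out <- mutate(data,
  # right-aligned rolling sd: each value sits on the last row of its window
  sd = zoo::rollapplyr(y, w, sd, fill = NA),
  # the window's first row is w - 1 rows (trading days) earlier
  start = dplyr::lag(Date, w - 1)
)
out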

giving:

         Date        y        sd      start
1  2018-12-13 -0.00736        NA       <NA>
2  2018-12-14 -1.30000        NA       <NA>
3  2018-12-17  0.22700 0.8223515 2018-12-13
4  2018-12-18 -0.67100 0.7674388 2018-12-14
5  2018-12-19 -0.75000 0.5427053 2018-12-17
6  2018-12-20 -0.90600 0.1195840 2018-12-18
7  2018-12-21 -1.74000 0.5322894 2018-12-19
8  2018-12-27  0.33100 1.0420146 2018-12-20
9  2018-12-28 -0.76800 1.0361488 2018-12-21
10 2018-12-31  0.64900 0.7435068 2018-12-27
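
The same start dates can also be derived from row positions alone: a window of w rows that ends at row i starts at row i - w + 1, whatever the calendar gap between them. The following is a minimal base R sketch of that idea, assuming the rows are sorted by date with exactly one row per trading day. The names `end_idx`, `start_idx`, and `windows` are illustrative and do not come from the answer.

```r
# Sample data: the last ten trading days shown in the question
data <- data.frame(
  Date = as.Date(c("2018-12-13", "2018-12-14", "2018-12-17", "2018-12-18",
                   "2018-12-19", "2018-12-20", "2018-12-21", "2018-12-27",
                   "2018-12-28", "2018-12-31")),
  y = c(-0.00736, -1.30, 0.227, -0.671, -0.750,
        -0.906, -1.74, 0.331, -0.768, 0.649)
)

w <- 3                         # window length in rows, i.e. trading days
end_idx <- w:nrow(data)        # last row of every complete window
start_idx <- end_idx - w + 1   # first row of the same window

windows <- data.frame(
  start = data$Date[start_idx],
  end   = data$Date[end_idx],
  # sd of y over rows start..end of each window
  sd    = mapply(function(i, j) sd(data$y[i:j]), start_idx, end_idx)
)
windows
```

Because the lookup is by position, holidays and weekends never enter the calculation; only dates that actually appear in the data can be returned as a start.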

Largest sd's with their start and end dates

The four largest sd values, with the end date (Date) and start date (start) of each window, are:

head(dplyr::arrange(out, -sd), 4)

giving:

        Date      y        sd      start
8 2018-12-27  0.331 1.0420146 2018-12-20
9 2018-12-28 -0.768 1.0361488 2018-12-21
3 2018-12-17  0.227 0.8223515 2018-12-13
4 2018-12-18 -0.671 0.7674388 2018-12-14
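
To pick out only the single most volatile window and see how its length in trading days differs from its length in calendar days, something like the following base R sketch may help. It assumes the same ten-row sample data and a right-aligned window; `roll_sd`, `i_max`, and `best` are hypothetical names introduced for illustration.

```r
# Sample data: the last ten trading days shown in the question
data <- data.frame(
  Date = as.Date(c("2018-12-13", "2018-12-14", "2018-12-17", "2018-12-18",
                   "2018-12-19", "2018-12-20", "2018-12-21", "2018-12-27",
                   "2018-12-28", "2018-12-31")),
  y = c(-0.00736, -1.30, 0.227, -0.671, -0.750,
        -0.906, -1.74, 0.331, -0.768, 0.649)
)

w <- 3
# Right-aligned rolling sd; NA until a full window of w rows is available
roll_sd <- sapply(seq_len(nrow(data)), function(i) {
  if (i < w) NA_real_ else sd(data$y[(i - w + 1):i])
})

i_max <- which.max(roll_sd)   # which.max() skips the leading NAs
best <- data.frame(
  start         = data$Date[i_max - w + 1],
  end           = data$Date[i_max],
  sd            = roll_sd[i_max],
  trading_days  = w,
  # inclusive calendar span, for comparison with the trading-day count
  calendar_days = as.integer(data$Date[i_max] - data$Date[i_max - w + 1]) + 1L
)
best
```

In this sample the most volatile window runs from 2018-12-20 to 2018-12-27: three trading days spread over eight calendar days, because of the gap between 2018-12-21 and 2018-12-27 in the data.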

Rows between two dates

If the question is how many rows lie between two dates, inclusive, where both dates appear in data, then:

 d1 <- as.Date("2018-12-14")
 d2 <- as.Date("2018-12-20")
 diff(match(c(d1, d2), data$Date)) + 1
 ## [1] 5
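
The `match` approach returns NA when either date is not a row in the data, which is what happens with a calendar-derived date that falls on a weekend or holiday. A hedged alternative is to count the rows whose dates fall inside the range. This is a sketch only, and `count_trading_days` is a hypothetical helper, not a function from any package.

```r
# Sample data: the last ten trading days shown in the question
data <- data.frame(
  Date = as.Date(c("2018-12-13", "2018-12-14", "2018-12-17", "2018-12-18",
                   "2018-12-19", "2018-12-20", "2018-12-21", "2018-12-27",
                   "2018-12-28", "2018-12-31")),
  y = c(-0.00736, -1.30, 0.227, -0.671, -0.750,
        -0.906, -1.74, 0.331, -0.768, 0.649)
)

# Count the rows (trading days) with from <= date <= to.
# Works even when from or to is not itself a trading day.
count_trading_days <- function(dates, from, to) {
  sum(dates >= from & dates <= to)
}

# Both endpoints are trading days: same result as the match() approach
n_both <- count_trading_days(data$Date, as.Date("2018-12-14"), as.Date("2018-12-20"))

# 2018-12-22 is a Saturday and not in the data; match() would give NA here
n_weekend <- count_trading_days(data$Date, as.Date("2018-12-22"), as.Date("2018-12-28"))

c(n_both, n_weekend)
```

Here `n_both` is 5 and `n_weekend` is 2 (only 2018-12-27 and 2018-12-28 fall inside the second range).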

Note

Lines <- "   Date                       y
 1 2018-12-13T00:00:00 -0.00736
 2 2018-12-14T00:00:00 -1.30   
 3 2018-12-17T00:00:00  0.227  
 4 2018-12-18T00:00:00 -0.671  
 5 2018-12-19T00:00:00 -0.750  
 6 2018-12-20T00:00:00 -0.906  
 7 2018-12-21T00:00:00 -1.74   
 8 2018-12-27T00:00:00  0.331  
 9 2018-12-28T00:00:00 -0.768  
10 2018-12-31T00:00:00  0.649"
data <- read.table(text = Lines)
data$Date <- as.Date(data$Date)
