R: Creating portfolio returns of historical trades

Posted 2025-01-20 07:43:19 | 2337 characters | 3 views | 0 comments

I created this thread because there are currently no threads on the portfolio calendar approach in event studies. Although this approach comes from finance, this question is about the code for its first step: calculating the average return of all trades over the period.

I want to create equally weighted portfolio returns for a set of pre-specified trades (n ≈ 100,000) over a sample period (in this example, 2000-01-01 to 2010-01-01), and I already have the actual daily return of each trade. Because this is an event study, only the first x days of each trade are taken into account (21 days in this example, with t=0 being the starting day of the period and t=20 being the final day of the trade).

The data is structured as follows:

For every transaction, the ID is unique, and the actual return on each day following the event is known. For example, Ret.t0 is the return made on the day the event took place (given in the "Date" column, e.g. 2000-01-01), and Ret.t1 is the return made one day after the event (e.g. 2000-01-02).
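To make the column layout concrete, here is a minimal sketch of how a Ret.tk column maps to a calendar date: the offset k is parsed from the column name and added to the event date. The helper name `ret_col_to_date` is hypothetical, and the sketch assumes returns accrue on consecutive calendar days (no weekend or holiday adjustment), as the description above implies.

```r
# Hypothetical helper: calendar date on which a given Ret.tk return is earned.
# Assumes consecutive calendar days (no trading-calendar adjustment).
ret_col_to_date <- function(event_date, col_name) {
  k <- as.integer(sub("^Ret\\.t", "", col_name))  # "Ret.t6" -> 6
  as.Date(event_date) + k
}

ret_col_to_date("2000-01-01", "Ret.t0")  # the event date itself
ret_col_to_date("2000-01-01", "Ret.t1")  # one day after the event
```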

A reproducible sample in R:

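set.seed(1)  # assumed seed, added so the sample is actually reproducible
size <- 1e5

df <- data.frame(
  ID = seq.int(size),
  FirmID = sample(1:1000),  # 1000 shuffled firm IDs, recycled to `size` rows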
  Date = sample(seq(
    as.Date('2000/01/01'), as.Date('2010/01/01'), by = "day"
  ), size, replace = TRUE),
  Ret.t0 = sample(-2000:2000, size, replace = TRUE)/100000,
  Ret.t1 = sample(-2000:2000, size, replace = TRUE)/100000,
  Ret.t2 = sample(-2000:2000, size, replace = TRUE)/100000,
  Ret.t3 = sample(-2000:2000, size, replace = TRUE)/100000,
  Ret.t4 = sample(-2000:2000, size, replace = TRUE)/100000,
  Ret.t5 = sample(-2000:2000, size, replace = TRUE)/100000,
  Ret.t6 = sample(-2000:2000, size, replace = TRUE)/100000,
  Ret.t7 = sample(-2000:2000, size, replace = TRUE)/100000,
  Ret.t8 = sample(-2000:2000, size, replace = TRUE)/100000,
  Ret.t9 = sample(-2000:2000, size, replace = TRUE)/100000,
  Ret.t10 = sample(-2000:2000, size, replace = TRUE)/100000,
  Ret.t11 = sample(-2000:2000, size, replace = TRUE)/100000,
  Ret.t12 = sample(-2000:2000, size, replace = TRUE)/100000,
  Ret.t13 = sample(-2000:2000, size, replace = TRUE)/100000,
  Ret.t14 = sample(-2000:2000, size, replace = TRUE)/100000,
  Ret.t15 = sample(-2000:2000, size, replace = TRUE)/100000,
  Ret.t16 = sample(-2000:2000, size, replace = TRUE)/100000,
  Ret.t17 = sample(-2000:2000, size, replace = TRUE)/100000,
  Ret.t18 = sample(-2000:2000, size, replace = TRUE)/100000,
  Ret.t19 = sample(-2000:2000, size, replace = TRUE)/100000,
  Ret.t20 = sample(-2000:2000, size, replace = TRUE)/100000)

The main problem is alignment in calendar time. For example, the t=6 return of a trade made on 2000-01-01 falls on 2000-01-07, so it has to be averaged together with the t=0 return of a trade made on 2000-01-07. Every trade that has a return on a given day must be included in that day's average return, which is the portfolio return. The number of trades active on any given day is not fixed either.
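The alignment can be checked by hand on a toy case. The sketch below is not part of the original data: it uses two made-up trades and a 7-day window (t = 0..6) instead of 21, and assumes calendar-day offsets. It stacks the returns, shifts each one to its calendar date, and averages per date in base R.

```r
# Two toy trades; all values are made up for illustration.
toy <- data.frame(
  ID   = 1:2,
  Date = as.Date(c("2000-01-01", "2000-01-07"))
)
rets <- rbind(
  c(0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07),  # trade 1, t = 0..6
  c(0.03, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00)   # trade 2, t = 0..6
)
n_t <- ncol(rets)

# as.vector() reads the matrix column by column: both trades at t = 0,
# then both at t = 1, and so on, so dates and offsets are built to match.
long <- data.frame(
  Date   = rep(toy$Date, times = n_t) + rep(0:(n_t - 1), each = nrow(toy)),
  Return = as.vector(rets)
)

# Equal-weighted portfolio return: mean of all returns falling on the same date.
portfolio <- aggregate(Return ~ Date, data = long, FUN = mean)
portfolio
```

On 2000-01-07 the result averages trade 1's t=6 return (0.07) with trade 2's t=0 return (0.03); on every other date only one trade is active, so the portfolio return equals that trade's return.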

The portfolio return output needs to look like this:

Date       Return
2000-01-01 0.01205
2000-01-02 0.0089
2000-01-03 0.0012
….
2010-01-21 0.0302

柠檬心 2025-01-27 07:43:19

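Something like this? Reshape the returns to long format, turn each t offset into a calendar date, then average per date: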

library(dplyr)
library(tidyr)

df %>%
    ## ID not needed:
    select(-ID) %>%
    ## stack return timepoints per FirmID and Date
    pivot_longer(cols = starts_with('Ret'),
                 names_to = 'return_code',
                 values_to = 'return') %>%
    arrange(FirmID, Date) %>%
    rename('start_date' = 'Date') %>%
    ## extract timepoint t = 0, 1, ... from return code Ret.t0 ...
    ## and cast it to integer to add it to start date:
    mutate(t = gsub('.*t', '', return_code) %>% as.integer,
           date = start_date + t) %>%
    ## group by actual date and summarise:
    group_by(date) %>%
    summarise(avg_return = mean(return, na.rm = TRUE))
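As a quick sanity check, the same idea can be wrapped in a function and run on a tiny data set whose result is easy to verify by hand. This is only a sketch: the function name `portfolio_returns` and the extra `n_trades` column are hypothetical additions, the output columns are renamed to match the `Date` / `Return` layout the question asks for, and calendar-day offsets are assumed.

```r
library(dplyr)
library(tidyr)

# Hypothetical wrapper around the pipeline above; also counts the trades
# that contribute to each day's average.
portfolio_returns <- function(df) {
  df %>%
    pivot_longer(cols = starts_with("Ret.t"),
                 names_to = "return_code",
                 values_to = "ret") %>%
    # "Ret.t6" -> 6, then shift the event date by that many days
    mutate(t = as.integer(sub("^Ret\\.t", "", return_code)),
           Date = Date + t) %>%
    group_by(Date) %>%
    summarise(Return = mean(ret, na.rm = TRUE),
              n_trades = sum(!is.na(ret)),
              .groups = "drop")
}

# Toy input: two trades, two return days each (values are made up).
toy <- data.frame(
  ID = 1:2,
  FirmID = c(10L, 20L),
  Date = as.Date(c("2000-01-01", "2000-01-02")),
  Ret.t0 = c(0.01, 0.03),
  Ret.t1 = c(0.02, 0.04)
)
res <- portfolio_returns(toy)
res
```

On 2000-01-02 both trades are active (trade 1 at t=1, trade 2 at t=0), so that day's return is the mean of 0.02 and 0.03 and `n_trades` is 2. The other two dates each have a single active trade.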
