r中滞后的多次回归

发布于 2025-02-04 18:39:48 字数 1119 浏览 3 评论 0原文

我想对不同变量执行多个滚动回归。我的数据结构如下：

Prices: 
Date Stock1 Stock2 Stock3 Stocki
1990
...
2000 10000  1200   13000  900
2001 90000  1300   12800  920
2002 98000  1280   11900  932
2003 ...
2004 ...
...
2020

第二个数据框架具有我要回归的变量具有相同的结构：

Expenses:
Date Stock1 Stock2 Stock3 Stocki
1990
...
2000 5      12       8      18 
2001 3      13       10     20
2002 2      12       19     32
2003 ...
2004 ...
...
2020

我想以8年的回风为滚动的每个股票价格。例如，计算的代码例如2010年的系数将是：

lm(prices$stock1(2003:2010) ~ expenses$stock1(2002:2009)

此外，我的总体目的是为每种股票5滚动回归效果，从1到5的不同。以下内容：

lag1：

lm(prices$stock1(2003:2010) ~ expenses$stock1(2001:2008)

lag2：

lm(prices$stock1(2003:2010) ~ expenses$stock1(2000:2007)

等等...

最后，我想计算每年每年使用不同滞后的五种股票的五个回归系数的简单平均值。

我期望的输出每年都如下：

date   stock1                                         stock2
2010   (average of coefficients of different lags)    (average of coefficients of different lags)

有人想以有效的方式执行此操作吗？

提前致谢！

原文

I would like to perform multiple rolling regression for different variables. My data is structured as follows:

Prices: 
Date Stock1 Stock2 Stock3 Stocki
1990
...
2000 10000  1200   13000  900
2001 90000  1300   12800  920
2002 98000  1280   11900  932
2003 ...
2004 ...
...
2020

The second dataframe with the variable I want to regress has the same structure:

Expenses:
Date Stock1 Stock2 Stock3 Stocki
1990
...
2000 5      12       8      18 
2001 3      13       10     20
2002 2      12       19     32
2003 ...
2004 ...
...
2020

I want to regress each Stock price on a rolling basis with a backwindow of 8 years. The code to calculate for example the coefficient of year 2010 would be:

lm(prices$stock1(2003:2010) ~ expenses$stock1(2002:2009)

Furthermore, my overall aim is to perform for each stock 5 rolling regression with different lags from 1 to 5. Again the code for the coefficient with lag in year 2010 would look as follows:

lag1:

lm(prices$stock1(2003:2010) ~ expenses$stock1(2001:2008)

lag2:

lm(prices$stock1(2003:2010) ~ expenses$stock1(2000:2007)

and so on...

Lastly, I want to calculate the simple average of the coefficients of the five regression performed for each stock in each year with the different lags.

My desired output would look as follows for each year:

date   stock1                                         stock2
2010   (average of coefficients of different lags)    (average of coefficients of different lags)

Has someone an idea to perform this in an effective way?

Thanks in advance!

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

酷遇一生 2025-02-11 18:39:48

您可以将数据框架设置为数据。

library(data.table)

setDT(prices)
setDT(expenses)

dt = melt(prices,id = "Date",variable.name = "stock", value.name = "price")[
  melt(expenses,id="Date", variable.name="stock",value.name="expense"),
  on=.(Date,stock)]

rbindlist(lapply(0:5, \(i) {
    dt[, lm(price~shift(expense,n=i, type="lead"))$coef[2], by=stock]
}))[, .(avg_over_leads = mean(V1)), by=stock]

输出：

    stock avg_over_leads
1: Stock1     0.13232853
2: Stock2     0.03691069
3: Stock3     0.13296958

输入：

set.seed(123)
prices = data.frame(
  Date = 2000:2025,
  Stock1 = rnorm(26),
  Stock2 = rnorm(26),
  Stock3 = rnorm(26)
)
expenses = data.frame(
  Date = 2000:2025,
  Stock1 = rnorm(26),
  Stock2 = rnorm(26),
  Stock3 = rnorm(26)
)

您也可以使用整洁的方法进行相同的操作：

dt = inner_join(
  pivot_longer(prices,-Date,values_to = "price"),
  pivot_longer(expenses,-Date,values_to="expense"),
  by=c("Date","name")
)
bind_rows(
  lapply(0:5, \(i) {
  dt %>% group_by(name) %>% 
    summarize(V1 = lm(price~lead(expense,n=i))$coef[2])
})) %>% 
  group_by(name) %>% 
  summarize(avg_over_leads = mean(V1))

You can set your dataframes to data.table, melt them both into long format, join them together, and run all the regressions (lead=0 through lead=5) using lapply.

library(data.table)

setDT(prices)
setDT(expenses)

dt = melt(prices,id = "Date",variable.name = "stock", value.name = "price")[
  melt(expenses,id="Date", variable.name="stock",value.name="expense"),
  on=.(Date,stock)]

rbindlist(lapply(0:5, \(i) {
    dt[, lm(price~shift(expense,n=i, type="lead"))$coef[2], by=stock]
}))[, .(avg_over_leads = mean(V1)), by=stock]

Output:

    stock avg_over_leads
1: Stock1     0.13232853
2: Stock2     0.03691069
3: Stock3     0.13296958

Input:

set.seed(123)
prices = data.frame(
  Date = 2000:2025,
  Stock1 = rnorm(26),
  Stock2 = rnorm(26),
  Stock3 = rnorm(26)
)
expenses = data.frame(
  Date = 2000:2025,
  Stock1 = rnorm(26),
  Stock2 = rnorm(26),
  Stock3 = rnorm(26)
)

You can also do the same, using tidy approach:

dt = inner_join(
  pivot_longer(prices,-Date,values_to = "price"),
  pivot_longer(expenses,-Date,values_to="expense"),
  by=c("Date","name")
)
bind_rows(
  lapply(0:5, \(i) {
  dt %>% group_by(name) %>% 
    summarize(V1 = lm(price~lead(expense,n=i))$coef[2])
})) %>% 
  group_by(name) %>% 
  summarize(avg_over_leads = mean(V1))

回复收藏 0 原文

~没有更多了~