对 x 和 y 变量列表进行循环面板数据回归

发布于 2025-01-10 15:59:10 字数 1689 浏览 1 评论 0原文

我正在使用面板数据，并希望从变量列表中进行一系列双变量固定效应回归。

我的数据集的一小部分看起来像这样：

library(plm)
library(dplyr)

structure(list(v2x_libdem = c(0.876, 0.88, 0.763, 0.779), v2x_partipdem = c(0.679, 
0.68, 0.647, 0.652), v2x_frassoc_thick = c(0.937, 0.937, 0.9, 
0.899), country_name = c("Sweden", "Sweden", "Hungary", "Hungary"
), year = c(2000, 2008, 2000, 2008)), row.names = c(NA, -4L), class = c("tbl_df", 
"tbl", "data.frame"))

# A tibble: 4 × 5
  v2x_libdem v2x_partipdem v2x_frassoc_thick country_name  year
       <dbl>         <dbl>             <dbl> <chr>        <dbl>
1      0.876         0.679             0.937 Sweden        2000
2      0.88          0.68              0.937 Sweden        2008
3      0.763         0.647             0.9   Hungary       2000
4      0.779         0.652             0.899 Hungary       2008

我的变量列表看起来像这样：

vars <- c("v2x_libdem", "v2x_partipdem", "v2x_frassoc_thick")

这些变量应该是 x 和 y，因此 x 和 y 等变量的每个组合都应该处于回归中。

我编写了一个进行面板数据建模的函数：

paneldata_function <- function (y, x) {     #this function will do a PLM 
  plm(paste(y, "~", x), 
      data = Vdem_west,
      model = "within",
      index = c("year", "country_name"))   #for a given x & y variable
}

我需要循环遍历 x 和 y 的每个组合。最好，我希望输出是四个值向量，我可以将它们转换成数据集；一项用于系数，一项用于 x 变量，一项用于 y 变量，一项用于 p 值。

如果我做一个模型，我可以轻松访问这些：

e <- paneldata_function("v2x_libdem", "v2x_partipdem")

p_value <- e[["fstatistic"]][["p.value"]]

x_var <- names(e[["model"]])[2]

y_var <- names(e[["model"]])[1]

我如何编写循环以便将这些向量作为输出？

非常感谢任何帮助，我希望我尽可能清楚地表达我的问题。

原文

I am working with paneldata and wish to conduct a series of bivariate fixed effects regressions from a list of variables.

A small part of my dataset looks like this:

library(plm)
library(dplyr)

structure(list(v2x_libdem = c(0.876, 0.88, 0.763, 0.779), v2x_partipdem = c(0.679, 
0.68, 0.647, 0.652), v2x_frassoc_thick = c(0.937, 0.937, 0.9, 
0.899), country_name = c("Sweden", "Sweden", "Hungary", "Hungary"
), year = c(2000, 2008, 2000, 2008)), row.names = c(NA, -4L), class = c("tbl_df", 
"tbl", "data.frame"))

# A tibble: 4 × 5
  v2x_libdem v2x_partipdem v2x_frassoc_thick country_name  year
       <dbl>         <dbl>             <dbl> <chr>        <dbl>
1      0.876         0.679             0.937 Sweden        2000
2      0.88          0.68              0.937 Sweden        2008
3      0.763         0.647             0.9   Hungary       2000
4      0.779         0.652             0.899 Hungary       2008

My list of variables looks something like this:

vars <- c("v2x_libdem", "v2x_partipdem", "v2x_frassoc_thick")

These variables ought to be both x and y, and thus every combination of the variables as x and y should be in a regression.

I have written a function that conducts the paneldata modelling:

paneldata_function <- function (y, x) {     #this function will do a PLM 
  plm(paste(y, "~", x), 
      data = Vdem_west,
      model = "within",
      index = c("year", "country_name"))   #for a given x & y variable
}

It is this function I need to loop over with each combination of x and y. Preferably, I would want the output to a be four vectors of values which I might turn into a dataset; one for the coefficient, one for the x variable, one for the y variable and one for the p-value.

If I do one single model, I can access these easily:

e <- paneldata_function("v2x_libdem", "v2x_partipdem")

p_value <- e[["fstatistic"]][["p.value"]]

x_var <- names(e[["model"]])[2]

y_var <- names(e[["model"]])[1]

How might I write the loop so that I get these vectors as an output?

Any help is very appreciated, and I hope I phrased my question as clearly as possible.

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

英雄似剑 2025-01-17 15:59:10

要循环遍历可能的字段，首先，您需要能够提供该信息的东西。无论您的数据集大小如何，此函数都将起作用，但您必须调整要保留的字段。（例如，这里需要排除最后两个字段，所以我选择1:3。制作这个列表的方法有很多，但很多会重复（或者两个字段相同，但一次是左右下一个是从右到左）。此方法使用RcppAlgos 包。2 表示您想要的变量数量。 code> 仅渲染返回唯一对

ndf = names(Vdem_west)   
cbo = RcppAlgos::comboGeneral(ndf[1:3], 2, F) # make a list of possible combinations
#      [,1]            [,2]               
# [1,] "v2x_libdem"    "v2x_partipdem"    
# [2,] "v2x_libdem"    "v2x_frassoc_thick"
# [3,] "v2x_partipdem" "v2x_frassoc_thick"

。使用您现在的函数，我在 map 调用中调用它，它位于 tidyverse 中

# run all the models; store results in list of lists
res = map(1:nrow(cbo),
          ~paneldata_function(cbo[.x, 1], cbo[.x, 2]))

# name the models
names(res) <- paste0(cbo[, 1], "_", cbo[, 2])
names(res) # check it
# [1] "v2x_libdem_v2x_partipdem"        "v2x_libdem_v2x_frassoc_thick"   
# [3] "v2x_partipdem_v2x_frassoc_thick"

。您没有询问模型名称，但我还添加了

map_dfr(1:length(res),
        .f = function(x){
          estP = summary(res[[x]])$coefficients[c(1, 4)]
          coef = dimnames(summary(res[[x]])$coefficients)[1]
          model = names(res)[x]
          c(Coefficient = coef, Est = round(estP[1], 4), 
            PV = round(estP[2], 4), Model = model)
        })
# # A tibble: 3 × 4
#   Coefficient         Est     PV Model                          
#   <chr>             <dbl>  <dbl> <chr>                          
# 1 v2x_partipdem     3.56  0.0067 v2x_libdem_v2x_partipdem       
# 2 v2x_frassoc_thick 2.85  0.0441 v2x_libdem_v2x_frassoc_thick   
# 3 v2x_frassoc_thick 0.799 0.0509 v2x_partipdem_v2x_frassoc_thick

一个调用，以便您也可以查看模型性能。打电话，但你没有要求这部分（实际上我先做了这部分。并意识到我没有回答您的问题。）

map_dfr(1:length(res),
        .f = function(x){
          R2 = summary(res[[x]])[["r.squared"]][["rsq"]] %>% round(4)
          Adj.R2 = summary(res[[x]])[["r.squared"]][["adjrsq"]] %>% round(4)
          PV = summary(res[[x]])[["fstatistic"]][["p.value"]] %>% round(4)
          Model = names(res)[x]
          c(R2 = R2, Adj.R2 = Adj.R2, PV = PV, Model = Model)
        })
# # A tibble: 3 × 4
#   R2     Adj.R2 PV.F   Model                          
#   <chr>  <chr>  <chr>  <chr>                          
# 1 0.9999 0.9997 0.0067 v2x_libdem_v2x_partipdem       
# 2 0.9952 0.9856 0.0441 v2x_libdem_v2x_frassoc_thick   
# 3 0.9936 0.9809 0.0509 v2x_partipdem_v2x_frassoc_thick

您可以使用相同的方法来测试您的模型并查看假设。

To loop through the possible fields, first, you need something that gives you that information. This function will work regardless of your dataset size, you'll have to adjust what fields to keep, though. (For example, here you needed to exclude the last two fields, so I select 1:3. There are a lot of ways to make this list, but many make duplicates (or the same two fields, but one time is left-right and the next is right-left). This method uses the package RcppAlgos. The 2 is for how many variables you would like. The F or FALSE renders the return unique pairs only.

ndf = names(Vdem_west)   
cbo = RcppAlgos::comboGeneral(ndf[1:3], 2, F) # make a list of possible combinations
#      [,1]            [,2]               
# [1,] "v2x_libdem"    "v2x_partipdem"    
# [2,] "v2x_libdem"    "v2x_frassoc_thick"
# [3,] "v2x_partipdem" "v2x_frassoc_thick"

Then using your function as it is now, I call it in this map call. map is in tidyverse.

# run all the models; store results in list of lists
res = map(1:nrow(cbo),
          ~paneldata_function(cbo[.x, 1], cbo[.x, 2]))

# name the models
names(res) <- paste0(cbo[, 1], "_", cbo[, 2])
names(res) # check it
# [1] "v2x_libdem_v2x_partipdem"        "v2x_libdem_v2x_frassoc_thick"   
# [3] "v2x_partipdem_v2x_frassoc_thick"

For the coefficient data, I wrote the output to a data frame. You didn't ask for the model name, but I added that as well.

map_dfr(1:length(res),
        .f = function(x){
          estP = summary(res[[x]])$coefficients[c(1, 4)]
          coef = dimnames(summary(res[[x]])$coefficients)[1]
          model = names(res)[x]
          c(Coefficient = coef, Est = round(estP[1], 4), 
            PV = round(estP[2], 4), Model = model)
        })
# # A tibble: 3 × 4
#   Coefficient         Est     PV Model                          
#   <chr>             <dbl>  <dbl> <chr>                          
# 1 v2x_partipdem     3.56  0.0067 v2x_libdem_v2x_partipdem       
# 2 v2x_frassoc_thick 2.85  0.0441 v2x_libdem_v2x_frassoc_thick   
# 3 v2x_frassoc_thick 0.799 0.0509 v2x_partipdem_v2x_frassoc_thick

I also put together a call, so that you could look at the model performance, as well. This could be combined with the previous call, but you didn't ask for this part. (I actually did this part first and realized I wasn't answering your question.)

map_dfr(1:length(res),
        .f = function(x){
          R2 = summary(res[[x]])[["r.squared"]][["rsq"]] %>% round(4)
          Adj.R2 = summary(res[[x]])[["r.squared"]][["adjrsq"]] %>% round(4)
          PV = summary(res[[x]])[["fstatistic"]][["p.value"]] %>% round(4)
          Model = names(res)[x]
          c(R2 = R2, Adj.R2 = Adj.R2, PV = PV, Model = Model)
        })
# # A tibble: 3 × 4
#   R2     Adj.R2 PV.F   Model                          
#   <chr>  <chr>  <chr>  <chr>                          
# 1 0.9999 0.9997 0.0067 v2x_libdem_v2x_partipdem       
# 2 0.9952 0.9856 0.0441 v2x_libdem_v2x_frassoc_thick   
# 3 0.9936 0.9809 0.0509 v2x_partipdem_v2x_frassoc_thick

You could use this same approach to test your model and look at the assumptions.

回复收藏 0 原文

~没有更多了~