R: What is the difference between lasso for variable selection in the glmnet and hdm packages?

Posted on 2025-02-02 15:26:34

For my PhD, I am using the lasso in R for variable selection, and I have tried both the glmnet and hdm packages. What is the difference between the basic lasso estimators for logistic regression in these two packages? I have read the documentation and searched extensively, but the only hint I found was this one, which was not very helpful for my exact purpose.

I am asking because my models converge when I use glmnet but sometimes fail to converge when I use hdm. That is why I assume the difference lies in the optimization routine. Here is a minimal example:

# Delete environment
rm(list = ls())

# Packages
library(glmnet)
#> Loading required package: Matrix
#> Loaded glmnet 4.1-4
library(hdm)

# get data
data = read.table("https://pastebin.com/raw/gmXk0h2P", sep = ",", header = TRUE)

# do the lasso
lasso_hdm = rlassologit(dep ~ ., data = data)
#> Warning: from glmnet C++ code (error code -1); Convergence for 1th lambda value
#> not reached after maxit=100000 iterations; solutions for larger lambdas returned
#> Warning in getcoef(fit, nvars, nx, vnames): an empty model has been returned;
#> probably a convergence issue
lasso_glm = glmnet(as.matrix(data[,!(names(data) %in% c("dep"))]), data$dep, family = "binomial")

Created on 2022-05-31 by the reprex package (v2.0.1)
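
The warning above is raised "from glmnet C++ code" and refers to the "1th lambda value", which suggests that hdm calls glmnet internally, possibly with a single penalty value rather than a whole path. The glmnet documentation advises against supplying a single lambda because the solver relies on warm starts along a decreasing sequence. The following is only a minimal sketch of how the two ways of calling glmnet could be compared. It uses simulated data, and the penalty level `lam` and all variable names are assumptions made for illustration; it does not reproduce what hdm actually does.

```r
library(glmnet)

# Simulated data (an assumption; not the data set from the example above)
set.seed(1)
n <- 200
p <- 20
x <- matrix(rnorm(n * p), n, p)
y <- rbinom(n, 1, plogis(x[, 1] - x[, 2]))

# Hypothetical fixed penalty level, chosen only for illustration
lam <- 0.05

# (a) Fit the whole regularization path (warm starts), then read off
#     the coefficients at lam; glmnet interpolates between path values
fit_path <- glmnet(x, y, family = "binomial")
b_path <- as.matrix(coef(fit_path, s = lam))[, 1]

# (b) Supply the single lambda directly, so there are no warm starts
fit_single <- glmnet(x, y, family = "binomial", lambda = lam)
b_single <- as.matrix(coef(fit_single))[, 1]

# Largest absolute difference between the two coefficient vectors
gap <- max(abs(b_path - b_single))
print(gap)
```

If the two fits agree on well-behaved data like this but the single-lambda call fails on the real data, that would point towards the way lambda is passed rather than towards a different estimator.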

Additionally, here is my sessionInfo:

sessionInfo()
#> R version 4.2.0 (2022-04-22)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 22.04 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=de_DE.UTF-8        LC_COLLATE=en_GB.UTF-8    
#>  [5] LC_MONETARY=de_DE.UTF-8    LC_MESSAGES=en_GB.UTF-8   
#>  [7] LC_PAPER=de_DE.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C       
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> loaded via a namespace (and not attached):
#>  [1] rstudioapi_0.13   knitr_1.39        magrittr_2.0.3    R.cache_0.15.0   
#>  [5] rlang_1.0.2       fastmap_1.1.0     fansi_1.0.3       stringr_1.4.0    
#>  [9] styler_1.7.0      highr_0.9         tools_4.2.0       xfun_0.31        
#> [13] R.oo_1.24.0       utf8_1.2.2        cli_3.3.0         withr_2.5.0      
#> [17] htmltools_0.5.2   ellipsis_0.3.2    yaml_2.3.5        digest_0.6.29    
#> [21] tibble_3.1.7      lifecycle_1.0.1   crayon_1.5.1      purrr_0.3.4      
#> [25] R.utils_2.11.0    vctrs_0.4.1       fs_1.5.2          glue_1.6.2       
#> [29] evaluate_0.15     rmarkdown_2.14    reprex_2.0.1      stringi_1.7.6    
#> [33] compiler_4.2.0    pillar_1.7.0      R.methodsS3_1.8.1 pkgconfig_2.0.3

Ultimately, I am interested in the theory behind both packages, and I hope to find a good reason to stick with glmnet, since it converges.

Thank you so much in advance!
