Lasso running slow?

Posted on 2025-01-22 23:18:04

I have a large dataset that I've been trying to run a lasso regression on. Categorical variables are re-coded as dummy variables. After receiving several messages about limited memory, I converted my data into a sparse matrix using the Matrix package.
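
For reference, the dummy coding and the sparse conversion can also be done in a single step. This is only a minimal sketch on a hypothetical toy data frame (the column names `outcome`, `region`, and `age` are made up for illustration and are not from the real dataset); it uses `sparse.model.matrix()` from the Matrix package and may or may not match how the real data was prepared.

```r
library(Matrix)

# Hypothetical toy data; column names are illustrative only.
toy <- data.frame(
  outcome = c(0, 1, 0, 1, 1, 0),
  region  = factor(c("a", "b", "c", "a", "b", "c")),
  age     = c(23, 35, 41, 29, 52, 38)
)

# Dummy-code the factor and build a sparse design matrix in one step.
# "- 1" drops the intercept column, since glmnet fits its own intercept.
x <- sparse.model.matrix(outcome ~ . - 1, data = toy)
y <- toy$outcome

class(x)  # a sparse matrix class from the Matrix package
dim(x)    # rows x (dummy columns + numeric columns)
```

Building the matrix this way avoids creating a full dense matrix first, which is where the memory pressure usually comes from.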

The issue is that my code has been running for a long time (several hours without completion), and I'm not sure why.

Here is a sample of 2000 rows of data (~0.3% of data) that produces the same issue:
https://drive.google.com/file/d/1ZhyFIoxJSRHrC_eIe58C5zXFKJW-13Lm/view?usp=sharing

This is the code I've been using:

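    library(tidyverse)
    library(Matrix)
    # Install glmnet only if it is missing, rather than on every run
    if (!requireNamespace("glmnet", quietly = TRUE)) install.packages("glmnet")
    library(glmnet)
    # Explicitly load glmnet's dependencies (see the note below)
    pacman::p_load(methods, utils, foreach, shape, survival, Rcpp, RcppEigen)

    # Convert the data frame to a sparse matrix to reduce memory use
    data_sample_matrix <- Matrix(as.matrix(data_sample), sparse = TRUE)

    set.seed(879)

    # 80/20 train/test split by row
    split <- sample(nrow(data_sample_matrix), floor(0.8 * nrow(data_sample_matrix)))

    train <- data_sample_matrix[split, ]
    test <- data_sample_matrix[-split, ]

    # Column 28 is the outcome; drop it from the predictors
    train_s <- train[, -28]
    test_s <- test[, -28]

    # Cross-validated lasso (alpha = 1) logistic regression
    cv_model <- cv.glmnet(train_s, train[, 28], alpha = 1, family = "binomial",
                          nlambda = 10, trace.it = TRUE)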

Note: I've explicitly loaded all the packages that CRAN lists as glmnet dependencies, because I noticed they weren't being attached when I ran library(glmnet).

Note: column 28 (indexed as [,28]) is my outcome variable.

Can anyone point to what I'm doing wrong?
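
One way to narrow this down might be to time a single `glmnet()` fit separately from the cross-validated fit, since `cv.glmnet()` refits the model once per fold (10 folds by default). The sketch below is not run on the real data: it uses synthetic sparse data from `rsparsematrix()`, and the sizes, density, and `nfolds = 5` are arbitrary assumptions chosen for illustration.

```r
library(Matrix)
library(glmnet)

set.seed(879)

# Synthetic stand-in for the real data (assumed sizes, not the real dataset)
x <- rsparsematrix(200, 30, density = 0.1)
y <- rbinom(200, 1, 0.5)

# Time one lasso logistic fit over the lambda path
single_fit_time <- system.time(
  fit <- glmnet(x, y, family = "binomial", alpha = 1, nlambda = 10)
)

# Time the cross-validated version, which repeats the fit for each fold
cv_time <- system.time(
  cv_fit <- cv.glmnet(x, y, family = "binomial", alpha = 1,
                      nlambda = 10, nfolds = 5)
)

print(single_fit_time)
print(cv_time)
```

If the single fit is already slow on the 2000-row sample, the bottleneck is likely in the fit itself rather than in the cross-validation loop.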
