I have a large dataset on which I've been trying to run a lasso regression. Categorical variables are recoded as dummy variables. After receiving several messages about limited memory, I converted my data into a sparse matrix using the Matrix package.
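A minimal sketch of that dense-to-sparse conversion on made-up data, in case it helps reproduce the setup. The `toy` data frame and its column names are hypothetical and not taken from my dataset; only the `Matrix(as.matrix(...), sparse = TRUE)` step mirrors what I actually run.

```r
library(Matrix)

# Hypothetical toy data: one numeric column plus already dummy-coded columns
toy <- data.frame(
  age          = c(23, 35, 41, 29, 52, 37),
  region_north = c(1, 0, 0, 0, 1, 0),
  region_south = c(0, 1, 0, 1, 0, 0),
  outcome      = c(0, 1, 0, 1, 1, 0)
)

# Two-step conversion: dense numeric matrix first, then sparse storage
toy_sparse <- Matrix(as.matrix(toy), sparse = TRUE)

class(toy_sparse)  # a sparse matrix class from the Matrix package
dim(toy_sparse)    # rows and columns are unchanged by the conversion
```

Note that `as.matrix()` still builds the full dense matrix before it is made sparse, so this step alone may not avoid memory pressure on the full data.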
The issue is that my code has been running for a long time (several hours without completion), and I'm not sure why.
Here is a sample of 2000 rows of data (~0.3% of data) that produces the same issue:
https://drive.google.com/file/d/1ZhyFIoxJSRHrC_eIe58C5zXFKJW-13Lm/view?usp=sharing
This is the code I've been using:
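library(tidyverse)
library(Matrix)
install.packages('glmnet')  # one-time install; not needed on later runs
library(glmnet)
pacman::p_load(methods, utils, foreach, shape, survival, Rcpp, RcppEigen)

# data_sample is the data frame used throughout (here, the 2000-row sample)
# Convert to a dense matrix, then to a sparse matrix
data_sample_matrix <- as.matrix(data_sample) %>% Matrix(., sparse = TRUE)

# 80/20 train/test split
set.seed(879)
split <- sample(nrow(data_sample_matrix), floor(0.8 * nrow(data_sample_matrix)))
train <- data_sample_matrix[split, ]
test <- data_sample_matrix[-split, ]

# Column 28 is the outcome; drop it from the predictor matrices
train_s <- train[, -28]
test_s <- test[, -28]

# Cross-validated lasso (alpha = 1) logistic regression
cv_model <- cv.glmnet(train_s, train[, 28], alpha = 1, family = "binomial",
                      nlambda = 10, trace.it = TRUE)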
Note: I've explicitly loaded all the packages that CRAN lists as glmnet dependencies, because I noticed that they weren't being attached when I ran library(glmnet).
Note: [,28] represents my outcome variable.
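To make the column layout concrete, here is a small sketch of separating the outcome from the predictors and confirming it is binary before fitting with family = "binomial". The `toy` matrix, its size, and `outcome_col` are hypothetical stand-ins; in my data the outcome sits in column 28.

```r
library(Matrix)

set.seed(879)

# Hypothetical toy matrix: 3 binary predictor columns, outcome in the last column
toy <- Matrix(
  cbind(matrix(rbinom(60, 1, 0.3), nrow = 20),
        outcome = rep(c(0, 1), 10)),
  sparse = TRUE
)

outcome_col <- ncol(toy)  # plays the role of column 28 in my data

x <- toy[, -outcome_col]             # predictors stay sparse
y <- as.vector(toy[, outcome_col])   # outcome as a plain numeric vector

# Quick sanity checks before handing x and y to cv.glmnet
table(y)
stopifnot(all(y %in% c(0, 1)))
```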
Can anyone point to what I'm doing wrong?