Memory leak when using the MNP package in R
I have a question concerning memory use in R when using the MNP package. My goal is to estimate a multinomial probit model and then use the model to predict choices on a large set of data. I have split the predictor data into a list of pieces.
The problem is that when I loop over the list to predict, the memory used by R grows constantly, and R starts using swap space after reaching my computer's maximum memory. The allocated memory is not released even when those boundaries are hit. This happens even though I do not create any additional objects, so I don't understand what is going on.
Below I have pasted example code that suffers from the described problem. When running the example, memory use grows constantly, and the memory remains in use even after removing all variables and calling gc().
The real data I have is much larger than what is generated in the example, so I need to find a workaround.
My questions are:
Why does this script use so much memory?
How can I force R to release the allocated memory after each step?
library(MNP)
nr <- 10000
draws <- 500
pieces <- 100
# Create artificial training data
trainingData <- data.frame(y = sample(c(1, 2, 3), nr, replace = TRUE),
                           x1 = sample(1:nr), x2 = sample(1:nr), x3 = sample(1:nr))
# Create artificial predictor data
predictorData <- list()
for (i in 1:pieces) {
  predictorData[[i]] <- data.frame(y = NA, x1 = sample(1:nr),
                                   x2 = sample(1:nr), x3 = sample(1:nr))
}
# Estimate multinomial probit
mnp.out <- mnp(y ~ x1 + x2, data = trainingData, n.draws = draws)
# Predict using predictor data, piece by piece
predicted <- list()
for (i in 1:length(predictorData)) {
  cat('|')
  mnp.pred <- predict(mnp.out, predictorData[[i]], type = 'prob')$p
  mnp.pred <- colnames(mnp.pred)[apply(mnp.pred, 1, which.max)]
  predicted[[i]] <- mnp.pred
  rm(mnp.pred)
  gc()
}
# Combine the output into one factor
predicted <- factor(unlist(predicted))
Here are the output statistics after running the script:
> rm(list = ls())
> gc()
          used (Mb) gc trigger  (Mb) max used  (Mb)
Ncells  158950  8.5     407500  21.8   407500  21.8
Vcells  142001  1.1   33026373 252.0 61418067 468.6
Here is my R setup:
> sessionInfo()
R version 2.13.1 (2011-07-08)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] MNP_2.6-2 MASS_7.3-14
Comments (1)
The results don't seem anomalous; I don't think this is evidence of a memory leak. I suspect you are misreading the output of gc(): the right-hand columns report the maximum memory used while R has been tracking memory. If you call gc(reset = TRUE), the maximum shown will instead be the memory currently in use, i.e. the 8.5 MB and 1.1 MB listed under "used".
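For illustration, a minimal way to check this (the exact figures will differ on your machine):
gc(reset = TRUE)   # reset the "max used" statistics to the current usage
# ... run one iteration of the prediction loop ...
gc()               # "max used" now shows only the peak reached since the reset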
I suspect that MNP simply consumes a lot of memory during the prediction phase, so there is not much to be done other than breaking the prediction data into even smaller chunks with fewer rows.
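For example, a minimal sketch of re-splitting the existing pieces into smaller ones (the 1000-row chunk size is an arbitrary assumption; tune it to your memory budget):
# split each existing piece into sub-pieces of at most 1000 rows
smallPieces <- unlist(lapply(predictorData, function(d) {
  split(d, ceiling(seq_len(nrow(d)) / 1000))
}), recursive = FALSE)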
If you have multiple cores, you might consider using the foreach package along with doSMP or doMC, as this will give you both the speedup of independent calculations and the benefit of clearing the RAM allocated after each iteration of the loop completes (I believe it involves forking R into a separate memory space).
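A minimal sketch of that approach using doMC (assumptions on my part: doMC is installed, you are on a platform where forking works such as your OS X setup, and cores = 4 is an arbitrary choice):
library(foreach)
library(doMC)
registerDoMC(cores = 4)                  # register 4 forked worker processes
predicted <- foreach(piece = predictorData) %dopar% {
  p <- predict(mnp.out, piece, type = 'prob')$p
  colnames(p)[apply(p, 1, which.max)]    # most likely alternative for each row
}
predicted <- factor(unlist(predicted))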