I have a vector x of 1,344 unique strings. I want to generate a matrix that gives me all possible groups of three values, regardless of order, and export that to a csv.
I'm running R on EC2 on an m1.large instance with 64-bit Ubuntu. When using combn(x, 3) I get an out-of-memory error:
Error: cannot allocate vector of size 9.0 Gb
The resulting matrix would have C(1344, 3) = 403,716,544 rows and three columns, which is the transpose of what the combn() function returns.
I thought of using the bigmemory package to create a file-backed big.matrix so that I can then assign the results of the combn() function to it. I can create a preallocated big matrix:
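As a quick sanity check on these figures, the snippet below recomputes the number of combinations and a rough size for the result. It assumes 64-bit R, where each element of a character matrix takes an 8-byte pointer (the strings themselves are stored separately), so treat it as an estimate rather than an exact measurement.

```r
# Number of unordered triples that can be drawn from 1,344 items
n_combos <- choose(1344, 3)

# Assumption: 8 bytes per element of a character matrix on 64-bit R
bytes <- n_combos * 3 * 8

# Size in units of 2^30 bytes, for comparison with the error message
gib <- bytes / 2^30

print(format(n_combos, big.mark = ","))
print(round(gib, 1))
```

Under that assumption the estimate lands at roughly 9 Gb, in line with the allocation that combn() fails on.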
x <- as.character(1:1344)
combos <- 403716544
test <- filebacked.big.matrix(nrow = combos, ncol = 3,
init = 0, backingfile = "test.matrix")
But when I try to assign the values with test <- combn(x, 3), I still get the same error: cannot allocate vector of size 9.0 Gb
I even tried coercing the result of combn(x, 3), but I think that because the combn() function itself returns an error, the as.big.matrix() function doesn't work either:
test <- as.big.matrix(matrix(combn(x, 3)), backingfile = "abc")
Error: cannot allocate vector of size 9.0 Gb
Error in as.big.matrix(matrix(combn(x, 3)), backingfile = "abc") :
error in evaluating the argument 'x' in selecting a method for function 'as.big.matrix'
Is there a way to combine these two functions together to get what I need? Are there any other ways of achieving this? Thanks.

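You could first find all 2-way combinations and then combine each of them with a third value, saving the result to file each time. This takes a lot less memory:
combn.mod <- function(x, fname) {
  tmp <- combn(x, 2, simplify = FALSE)
  n <- length(x)
  for (i in x[-c(n, n - 1)]) {
    # Drop all combinations that contain value i
    id <- which(!unlist(lapply(tmp, function(t) i %in% t)))
    tmp <- tmp[id]
    # Add i to all other combinations and write to file
    out <- do.call(rbind, lapply(tmp, c, i))
    # NOTE: the write call is missing from this copy of the answer;
    # write.table() with append = TRUE is an assumed stand-in
    write.table(out, file = fname, append = TRUE, sep = ",",
                row.names = FALSE, col.names = FALSE)
  }
}
This is not as general as Joshua's answer, though; it is written specifically for your case. I reckon it is faster, again for this particular case, but I didn't make the comparison. The function works on my computer using a little over 50 Mb (roughly estimated) when applied to your x.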
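Side note: if this is for simulation purposes, I find it hard to believe that any scientific application needs 400+ million simulation runs. You might be asking for the right answer to the wrong question here...
As a proof of concept, I changed the write line to tt[[i]] <- out, added tt <- list() before the loop and return(tt) after it. Then: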
> do.call(rbind,combn.mod(letters[1:5]))
[,1] [,2] [,3]
[1,] "b" "c" "a"
[2,] "b" "d" "a"
[3,] "b" "e" "a"
[4,] "c" "d" "a"
[5,] "c" "e" "a"
[6,] "d" "e" "a"
[7,] "c" "d" "b"
[8,] "c" "e" "b"
[9,] "d" "e" "b"
[10,] "d" "e" "c"
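The same idea, fixing one element and pairing it with every 2-way combination of the elements that come after it, can be sketched without the repeated filtering of the pair list. This is a variation on the answer above, not its original code; the name write_triples and the CSV layout (no header, no quoting) are assumptions made for illustration.

```r
# Hypothetical helper: stream all 3-way combinations of x to a CSV file,
# one "first element" at a time, so only choose(n - i, 2) rows are in
# memory at once.
write_triples <- function(x, fname) {
  n <- length(x)
  stopifnot(n >= 3)
  con <- file(fname, open = "w")
  on.exit(close(con))
  for (i in seq_len(n - 2)) {
    rest <- x[(i + 1):n]             # elements that come after x[i]
    pairs <- t(combn(rest, 2))       # one pair per row
    out <- cbind(x[i], pairs)        # prepend the fixed first element
    write.table(out, file = con, sep = ",", quote = FALSE,
                row.names = FALSE, col.names = FALSE)
  }
  invisible(choose(n, 3))            # number of rows written
}

# Small usage example on five letters
f <- tempfile(fileext = ".csv")
write_triples(letters[1:5], f)
```

For the full 1,344-element vector the largest block is choose(1343, 2) rows, which is far smaller than the complete result, though the output file itself will still be several gigabytes.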
Here's a function I've written in R, which currently finds its (unexported) home in the LSPM package. You give it the total number of items n, the number of items to select r, and the index of the combination you want i; it returns the values in 1:n corresponding to combination i. It allows you to generate each combination based on the value of the lexicographic index.
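The LSPM function itself is not reproduced in this copy of the answer. As a rough illustration of the technique it describes, here is a minimal sketch that maps a 1-based lexicographic index to a combination. The name nth_combination is hypothetical, and the ordering is assumed to follow the column order of combn(n, r); the LSPM implementation may differ in both respects.

```r
# Hypothetical sketch: return the i-th r-combination of 1:n in
# lexicographic order, without generating the earlier ones.
nth_combination <- function(n, r, i) {
  stopifnot(r >= 1, r <= n, i >= 1, i <= choose(n, r))
  res <- integer(r)
  v <- 1L                            # smallest candidate for this position
  for (j in seq_len(r)) {
    repeat {
      # Combinations that have v at position j: choose the remaining
      # r - j values from the n - v values larger than v
      cnt <- choose(n - v, r - j)
      if (i <= cnt) break
      i <- i - cnt                   # skip that whole block
      v <- v + 1L
    }
    res[j] <- v
    v <- v + 1L                      # later positions must be larger
  }
  res
}

# Usage: look up the strings for one combination of a small vector
x <- letters[1:5]
x[nth_combination(length(x), 3, 7)]
```

Because every index is resolved independently, the work can be split into ranges and run in separate loops or processes.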
So you just need to loop over 1:403716544 and append the results to a file. It may take a while, but it's at least feasible (see Dirk's answer). You may also need to do it in several loops, since the vector 1:403716544 will not fit in memory on my machine. Or you could port the R code to C/C++ and do the looping and writing there, since it would be a lot faster.
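One way to run the loop in several passes is to compute chunk boundaries up front and only ever build the indices for one chunk at a time. The chunk size below is an arbitrary assumption, and the loop body is left as a placeholder for whatever generates and writes the combinations.

```r
total <- choose(1344, 3)        # total number of combinations
chunk_size <- 1e6               # hypothetical value; tune to available memory

# First and last index of each chunk, computed without building 1:total
starts <- seq(1, total, by = chunk_size)
ends <- pmin(starts + chunk_size - 1, total)

for (k in seq_along(starts)) {
  idx <- seq(starts[k], ends[k])  # at most chunk_size indices in memory
  # ... generate the combinations for idx and append them to the file ...
}
```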