How can I convert a list of lists to a sparse matrix in R without using lapply?

Posted on 2024-10-16 20:19:12

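I have a list of lists resulting from a bigsplit() operation (from the package biganalytics, part of the bigmemory packages).

Each list represents a column in a matrix, and each list item is the index of a value of 1 in a binary matrix.

What is the best way to turn this list into a sparse binary (0/1) matrix? Is using an lapply() within an lapply() the only solution? How do I keep the factors naming the lists as the names of the columns?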

‘画卷フ 2024-10-23 20:19:12

If you need a matrix, you can do this without any lapply() at all.

Say you have a list constructed like this:

Test <- list(
    col1=list(2,4,7),
    col2=list(3,2,6,8),
    col3=list(1,4,5,3,7)
)

First you construct a matrix of zeros with the correct dimensions. If you know them beforehand, that's easy. Otherwise you can derive them easily:

n.cols <- length(Test)
n.ids <- sapply(Test,length)
n.rows <- max(unlist(Test))
out <- matrix(0,nrow=n.rows,ncol=n.cols)

Then you use the fact that matrices are filled column-wise to calculate the linear index of each cell that has to become 1:

id <- unlist(Test)+rep(0:(n.cols-1),n.ids)*n.rows
out[id] <- 1
colnames(out) <- names(Test)

This gives:

> out
     col1 col2 col3
[1,]    0    0    1
[2,]    1    1    0
[3,]    0    1    1
[4,]    1    0    1
[5,]    0    0    1
[6,]    0    1    0
[7,]    1    0    1
[8,]    0    1    0
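
The index arithmetic above relies on column-major storage: in a matrix with n.rows rows, cell (i, j) sits at linear position i + (j - 1) * n.rows. A minimal sketch that wraps the same steps in a function; the name list_to_binary() and its n.rows argument are illustrative assumptions, not part of the original answer:

```r
# Hypothetical helper (name and signature are assumptions for illustration).
list_to_binary <- function(lst, n.rows = max(unlist(lst))) {
  n.cols <- length(lst)
  n.ids  <- lengths(lst)  # number of row indices stored for each column
  out <- matrix(0, nrow = n.rows, ncol = n.cols,
                dimnames = list(NULL, names(lst)))
  # Column-major storage: cell (i, j) is element i + (j - 1) * n.rows
  id <- unlist(lst) + rep(seq_len(n.cols) - 1, n.ids) * n.rows
  out[id] <- 1
  out
}

Test <- list(
    col1=list(2,4,7),
    col2=list(3,2,6,8),
    col3=list(1,4,5,3,7)
)
out <- list_to_binary(Test)
```

Passing n.rows explicitly is useful when the true number of rows is larger than the largest index that happens to appear in the list.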

街角迷惘 2024-10-23 20:19:12

You might also consider using the Matrix package which deals with large sparse matrices in a more efficient way than base R. You can build a sparse matrix of 0s and 1s by describing which rows and columns should be 1s.

library(Matrix)
Test <- list(
    col1=list(2,4,7),
    col2=list(3,2,6,8),
    col3=list(1,4,5,3,7)
)
n.ids <- sapply(Test,length)
vals <- unlist(Test)
out <- sparseMatrix(vals, rep(seq_along(n.ids), n.ids))

The result is a pattern matrix, printed with | for the cells that are set and . for the empty ones:

> out
8 x 3 sparse Matrix of class "ngCMatrix"

[1,] . . |
[2,] | | .
[3,] . | |
[4,] | . |
[5,] . . |
[6,] . | .
[7,] | . |
[8,] . | .
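
The sparse result above has no column names, which the question asked to keep. A minimal sketch, assuming the list names should become the column names and that the number of rows can be taken from the largest index; it uses the dims and dimnames arguments of sparseMatrix():

```r
library(Matrix)

Test <- list(
    col1=list(2,4,7),
    col2=list(3,2,6,8),
    col3=list(1,4,5,3,7)
)
n.ids <- lengths(Test)
out <- sparseMatrix(
  i = unlist(Test, use.names = FALSE),        # row indices of the 1s
  j = rep(seq_along(Test), n.ids),            # matching column indices
  dims = c(max(unlist(Test)), length(Test)),  # assumed: rows = largest index
  dimnames = list(NULL, names(Test))          # keep the list names as column names
)
colnames(out)
```

If the real matrix has more rows than the largest index present, pass that row count in dims instead.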

不羁少年 2024-10-23 20:19:12

Using Joris' example (from the first answer above), here's a syntactically simple way using sapply/replace. I suspect Joris' approach is faster, because it fills in a pre-allocated matrix, whereas my approach implicitly involves cbinding a bunch of columns, and so would require repeated memory allocations for the columns (is that true?).

Test <- list( 
col1=list(2,4,7), 
col2=list(3,2,6,8), 
col3=list(1,4,5,3,7) 
) 

> z <- rep(0, max(unlist(Test)))
> sapply( Test, function(x) replace(z,unlist(x),1))
     col1 col2 col3
[1,]    0    0    1
[2,]    1    1    0
[3,]    0    1    1
[4,]    1    0    1
[5,]    0    0    1
[6,]    0    1    0
[7,]    1    0    1
[8,]    0    1    0
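
On the allocation question: as far as base R's implementation goes, sapply() first collects the per-column vectors in a list and then simplifies them into a matrix in one step, so the columns are not grown by repeated cbind() calls, although each column is still allocated separately by replace(). A minimal sketch of the same idea with vapply(), which additionally declares the expected shape of every column; this is a swapped-in variant, not the original author's code:

```r
Test <- list(
    col1=list(2,4,7),
    col2=list(3,2,6,8),
    col3=list(1,4,5,3,7)
)

z <- numeric(max(unlist(Test)))  # template column of zeros

# vapply() checks that each result is a numeric vector of length(z)
out <- vapply(Test, function(x) replace(z, unlist(x), 1), numeric(length(z)))
```

The column names are taken from the names of Test, as with sapply().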

Oo萌小芽oO 2024-10-23 20:19:12

Here is some sample data that seems to fit your description.

a <- as.list(sample(20, 5))
b <- as.list(sample(20, 5))
c <- as.list(sample(20, 5))
abc <- list(a = a, b = b, c = c)

I do not see a way to do this with nested lapply() but here is another way. It would be nice to eliminate the unlist(), but maybe someone else can improve on this.

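sp_to_bin <- function(splist) {
  # 100 is a hard-coded number of rows; it must be at least the largest index
  binlist <- numeric(100)
  # set the positions listed in splist to 1
  binlist[unlist(splist)] <- 1
  return(binlist)
}
# one 0/1 column per list element, named after the list
bindf <- data.frame(lapply(abc, sp_to_bin))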
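
The fixed length of 100 can be replaced by a value derived from the data. A minimal sketch, assuming the number of rows should equal the largest index present; the name sp_to_bin_n() and its n argument are hypothetical, and the seed is arbitrary and only there to make the sample reproducible:

```r
# Hypothetical variant of sp_to_bin(): n is the number of rows to produce.
sp_to_bin_n <- function(splist, n) {
  binlist <- numeric(n)
  binlist[unlist(splist)] <- 1
  binlist
}

set.seed(1)  # arbitrary seed, for reproducibility only
abc <- list(
  a = as.list(sample(20, 5)),
  b = as.list(sample(20, 5)),
  c = as.list(sample(20, 5))
)

n.rows <- max(unlist(abc))  # assumed: rows = largest index in the data
bindf  <- data.frame(lapply(abc, sp_to_bin_n, n = n.rows))
```

Note that the result is a data frame rather than a matrix; as.matrix(bindf) converts it if a matrix is needed.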

通知家属抬走 2024-10-23 20:19:12

To build on Joris's answer (the first answer above), which used a vector of scalar indices to fill in the output matrix, you can also use a two-column index matrix of (row, column) pairs to fill it in; this can sometimes be a little clearer to write or to understand later.

Test <- list(
    col1=list(2,4,7),
    col2=list(3,2,6,8),
    col3=list(1,4,5,3,7)
)

n.cols <- length(Test)
n.ids <- sapply(Test,length)
vals <- unlist(Test)
n.rows <- max(vals)
idx <- cbind(vals, rep(seq_along(n.ids), n.ids))
out <- matrix(0,nrow=n.rows,ncol=n.cols)
out[idx] <- 1
colnames(out) <- names(Test)

The result is the same.
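
The claim that both indexing styles give the same matrix can be checked directly. A minimal sketch; the variable names by_pair and by_pos are illustrative only:

```r
Test <- list(
    col1=list(2,4,7),
    col2=list(3,2,6,8),
    col3=list(1,4,5,3,7)
)

n.ids  <- lengths(Test)
vals   <- unlist(Test)
n.rows <- max(vals)
n.cols <- length(Test)
cols   <- rep(seq_along(Test), n.ids)  # column number for each row index

by_pair <- matrix(0, n.rows, n.cols)
by_pair[cbind(vals, cols)] <- 1          # two-column matrix of (row, col) pairs

by_pos <- matrix(0, n.rows, n.cols)
by_pos[vals + (cols - 1) * n.rows] <- 1  # single linear positions

same <- identical(by_pair, by_pos)
```

Here same records whether the two matrices are identical element by element.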
