How can I convert a list of lists into a sparse matrix in R without using lapply?
I have a list of lists resulting from a bigsplit() operation (from package biganalytics, part of the bigmemory packages).
Each list represents a column in a matrix, and each list item is an index to a value of 1 in a binary matrix.
What is the best way to turn this list into a sparse binary (0/1) matrix?
Is using lapply() within an lapply() the only solution? How do I keep the factors naming the lists as names for the columns?
If you need a matrix, you can do this without any lapply() at all.
Say you have a list constructed like this:
First you construct a matrix of zeros with the correct dimensions. If you know them beforehand, that's easy. Otherwise you can derive them easily:
Then you use the fact that matrices are filled column-wise to calculate the index of each cell that has to become 1:
This gives:
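As a minimal sketch of these steps, assuming a hypothetical named list idx in which each element holds the row indices that should be 1 (the data and all variable names are made up for illustration), the column-wise fill could look like this:

```r
# Hypothetical example data: one element per column, holding row indices of the 1s
idx <- list(a = c(1L, 3L), b = c(2L, 4L, 5L), c = c(1L, 5L))

# Derive the dimensions when they are not known beforehand
n_row <- max(unlist(idx))
n_col <- length(idx)

# Pre-allocate a zero matrix; the list names become the column names
m <- matrix(0L, nrow = n_row, ncol = n_col,
            dimnames = list(NULL, names(idx)))

# Matrices are stored column-wise, so cell (i, j) is element i + (j - 1) * n_row
col_of <- rep(seq_along(idx), lengths(idx))
lin <- unlist(idx, use.names = FALSE) + (col_of - 1L) * n_row

m[lin] <- 1L
m
```

Passing names(idx) as the column dimnames is one way to keep the factor levels that name the list as column names.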
You might also consider using the Matrix package which deals with large sparse matrices in a more efficient way than base R. You can build a sparse matrix of 0s and 1s by describing which rows and columns should be 1s.
The result is
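A minimal sketch of this idea, assuming the Matrix package is installed and using a hypothetical named list idx of row indices (the data and variable names are illustrative, not taken from the question), might be:

```r
library(Matrix)

# Hypothetical example data: one element per column, holding row indices of the 1s
idx <- list(a = c(1L, 3L), b = c(2L, 4L, 5L), c = c(1L, 5L))

sm <- sparseMatrix(
  i = unlist(idx, use.names = FALSE),        # row index of every 1
  j = rep(seq_along(idx), lengths(idx)),     # matching column index
  x = 1,                                     # value stored in each listed cell
  dims = c(max(unlist(idx)), length(idx)),
  dimnames = list(NULL, names(idx))          # keep list names as column names
)
sm
```

Only the positions of the 1s are stored, so the zero cells take no memory. Leaving out x should give a pattern (logical) sparse matrix instead of a numeric one.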
Using Joris' example, here's a syntactically simple way using sapply/replace. I suspect Joris' approach is faster, because it fills in a pre-allocated matrix, whereas my approach implicitly involves cbind-ing a bunch of columns, and so would require repeated memory allocations for the columns (is that true?).
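One way the sapply/replace idea could be written, as a sketch on a hypothetical named list idx of row indices (the data and names are assumptions for illustration):

```r
# Hypothetical example data: one element per column, holding row indices of the 1s
idx <- list(a = c(1L, 3L), b = c(2L, 4L, 5L), c = c(1L, 5L))
n_row <- max(unlist(idx))

# For each column, start from a zero vector and set the listed positions to 1.
# sapply() simplifies the equal-length vectors to a matrix and keeps the list names.
m <- sapply(idx, function(rows) replace(integer(n_row), rows, 1L))
m
```

On the allocation question: sapply() first collects the per-column vectors in a list and then simplifies them to a matrix in one step, so there are no repeated cbind() calls, although the intermediate vectors are still allocated.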
Here is some sample data that seems to fit your description.
I do not see a way to do this with nested lapply(), but here is another way. It would be nice to eliminate the unlist(), but maybe someone else can improve on this.
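As one possible illustration of an unlist()-based route (an assumption, not necessarily the approach this answer had in mind), the flattened indices can be cross-tabulated against their column labels. The sample list idx below is hypothetical:

```r
# Hypothetical sample data: one element per column, holding row indices of the 1s
idx <- list(a = c(1L, 3L), b = c(2L, 4L, 5L), c = c(1L, 5L))

rows <- unlist(idx, use.names = FALSE)
cols <- rep(names(idx), lengths(idx))

# Fix the factor levels so that empty rows and the original column order are kept
tab <- table(
  row = factor(rows, levels = seq_len(max(rows))),
  col = factor(cols, levels = names(idx))
)

# Convert the counts to a plain integer matrix; they are 0/1 as long as
# no index is repeated within a column
m <- matrix(as.integer(tab), nrow = nrow(tab),
            dimnames = list(NULL, colnames(tab)))
m
```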
To build on Joris's answer, which used a scalar index vector to fill in the output matrix, you can also use a matrix index vector to fill in the output matrix; this can sometimes be a little clearer to write or understand later.
The result is the same.
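A minimal sketch of the matrix-index variant, again on a hypothetical named list idx of row indices (data and names are illustrative assumptions):

```r
# Hypothetical example data: one element per column, holding row indices of the 1s
idx <- list(a = c(1L, 3L), b = c(2L, 4L, 5L), c = c(1L, 5L))

m <- matrix(0L, nrow = max(unlist(idx)), ncol = length(idx),
            dimnames = list(NULL, names(idx)))

# Two-column index matrix: each row is one (row, column) pair to set to 1
ij <- cbind(unlist(idx, use.names = FALSE),
            rep(seq_along(idx), lengths(idx)))

m[ij] <- 1L
m
```

Indexing with a two-column matrix avoids the manual offset arithmetic, which is what can make it easier to read later.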