Creating (and accessing) a sparse matrix with NA default entries
After learning about the options for working with sparse matrices in R, I want to use the Matrix package to create a sparse matrix from the following data frame and have all other elements be NA:
     s    r d
1 1089 3772 1
2 1109  190 1
3 1109 2460 1
4 1109 3071 2
5 1109 3618 1
6 1109   38 7
I know I can create a sparse matrix with the following, accessing elements as usual:
> library(Matrix)
> Y <- sparseMatrix(s,r,x=d)
> Y[1089,3772]
[1] 1
> Y[1,1]
[1] 0
But to make the default value NA, I tried the following:
M <- Matrix(NA,max(s),max(r),sparse=TRUE)
for (i in 1:nrow(X))
    M[s[i],r[i]] <- d[i]
and got this error:
Error in checkSlotAssignment(object, name, value) :
assignment of an object of class "numeric" is not valid for slot "x" in an object of class "lgCMatrix"; is(value, "logical") is not TRUE
Not only that, I find that M takes much longer than Y when accessing elements.
> system.time(Y[3,3])
user system elapsed
0.000 0.000 0.003
> system.time(M[3,3])
user system elapsed
0.660 0.032 0.995
How should I be creating this matrix? Why is one matrix so much slower to work with?
Here's the code snippet for the above data:
X <- structure(list(s = c(1089, 1109, 1109, 1109, 1109, 1109), r = c(3772,
190, 2460, 3071, 3618, 38), d = c(1, 1, 1, 2, 1, 7)), .Names = c("s",
"r", "d"), row.names = c(NA, 6L), class = "data.frame")
Why do you want default NA values? As far as I know, matrices are only sparse if they have zero cells. As NA is a non-zero value, you lose all the benefits of the sparse matrix. A classic matrix is even more efficient if the matrix has hardly any zeros. A classic matrix is like a vector that is cut according to the dimensions, so it only has to store the data vector and the dimensions. The sparse matrix stores only the non-zero values, but also stores their locations. This is an advantage if and only if you have enough zero values.
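To put rough numbers on the storage argument above, here is a minimal sketch using the data frame from the question. The names `n_stored`, `n_cells` and `dense_bytes` are illustrative only, and the 8-bytes-per-cell figure assumes a dense matrix of doubles.

```r
library(Matrix)

# Data frame from the question.
X <- data.frame(s = c(1089, 1109, 1109, 1109, 1109, 1109),
                r = c(3772, 190, 2460, 3071, 3618, 38),
                d = c(1, 1, 1, 2, 1, 7))

# Sparse matrix: only the non-zero entries and their positions are stored.
Y <- sparseMatrix(i = X$s, j = X$r, x = X$d)

# Explicitly stored values versus total number of cells.
n_stored <- length(Y@x)
n_cells  <- prod(dim(Y))

# If every remaining cell had to hold NA, each cell would need to be
# stored; for a dense double matrix that is roughly 8 bytes per cell.
dense_bytes <- 8 * n_cells

print(n_stored)
print(n_cells)
print(object.size(Y), units = "Kb")
print(dense_bytes / 2^20)  # approximate MiB for the all-cells alternative
```

With only a handful of stored values in a matrix of several million cells, the sparse representation pays off; filling the remaining cells with NA removes that advantage.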
Yes, Thierry's answer is definitely correct, I can say as co-author of the 'Matrix' package.
To your other question: Why is accessing "M" slower than "Y"?
The main answer is that "M" is much, much less sparse than "Y" and hence much larger, and, depending on the sizes involved and the RAM of your platform, access time is slower for much larger objects, notably when indexing into them.
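The size difference can be checked directly. The sketch below rebuilds both matrices from the question and compares them with `object.size()`. It then shows one possible workaround that neither answer proposes: keep the sparse `Y` and treat an unstored (zero) cell as missing at lookup time. The helper `get_or_na` is hypothetical, and it assumes that genuinely observed values are never exactly zero.

```r
library(Matrix)

# Data frame from the question.
X <- data.frame(s = c(1089, 1109, 1109, 1109, 1109, 1109),
                r = c(3772, 190, 2460, 3071, 3618, 38),
                d = c(1, 1, 1, 2, 1, 7))

# Y stores only the six observed values.
Y <- sparseMatrix(i = X$s, j = X$r, x = X$d)

# M is built as in the question: every cell is NA, so nothing is "empty".
M <- Matrix(NA, max(X$s), max(X$r), sparse = TRUE)

print(object.size(Y), units = "Kb")
print(object.size(M), units = "Mb")

# Hypothetical helper (not part of the Matrix package): read a single
# cell from the sparse matrix and report NA where nothing is stored.
# Assumption: a real observation is never exactly 0.
get_or_na <- function(mat, i, j) {
  v <- mat[i, j]
  if (v == 0) NA_real_ else v
}

print(get_or_na(Y, 1089, 3772))  # an observed cell
print(get_or_na(Y, 1, 1))        # an unobserved cell
```

Whether this is appropriate depends on the data: if zero is a legitimate observed value, the helper cannot distinguish it from a missing one.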