当前位置：文江博客话题详情

如何在大型稀疏矩阵中找到非零元素的索引？

发布于 2024-12-04 06:55:35 字数 187 浏览 3 评论 0原文

我有两个大小为 100000 X 100000 的方阵（a，b）。我必须对这两个矩阵进行差值（c = ab）。结果矩阵“c”是稀疏矩阵。我想找到所有非零元素的索引。我必须多次执行此操作（>100）。

最简单的方法是使用两个 for 循环。但这是计算密集型的。你能告诉我最好在 R/python/c 中的任何算法或包/库来尽快完成此操作吗？

原文

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

就像说晚安 2024-12-11 06:55:35

由于您有两个密集矩阵，因此双 for 循环是您唯一的选择。您根本不需要稀疏矩阵类，因为您只想知道索引列表 (i,j) 其中 a[i,j] != b[i, j]。

在 R 和 Python 等语言中，双 for 循环的性能会很差。我可能会在双 for 循环的本机代码中编写此代码，并将索引添加到列表对象中。但毫无疑问，解释代码的向导（即 R、Python 等）知道无需诉诸本机编码即可实现此目的的有效方法。

回复收藏 0 原文

无人接听 2024-12-11 06:55:35

在 R 中，如果您使用 Matrix 包，并使用 sparseMatrix 从坐标列表到稀疏矩阵，然后您可以通过以下方式转换回第3列：

TmpX <- as(M, "dgTMatrix")
X3col <- matrix(c(TmpX@i, TmpX@j, TmpX@val), ncol = 3)

这将为您提供稀疏矩阵中的坐标和值。

根据 A 和 B 中非零条目的位置，您可能会发现使用坐标列表比稀疏矩阵表示要好得多（顺便说一下，有数十种稀疏矩阵表示），您可以采用直接利用矢量化运算，而不是依赖稀疏矩阵包来实现最佳性能。我倾向于在不同语言中交替使用 COO 或稀疏矩阵支持，具体取决于我如何获得感兴趣的算法的最快性能。

更新 1：我不知道你的两个矩阵 A 和 B 是稠密的。因此，在 C 中查找非零条目的最简单解决方案非常简单，一开始甚至不进行减法 - 只需比较 A 和 B 的条目。逻辑比较应该比减法更快。首先，找到 A 和 B 的条目，其中 A != B，然后减去这些条目。接下来，您只需将 A 和 B 中索引的矢量化转换为它们的 (row, col) 表示形式。这类似于 Matlab 的 ind2sub 和 sub2ind - 看看这个 R 参考用于计算。

In R, if you use the Matrix package, and sparseMatrix for the conversion from the coordinate list to the sparse matrix, then you can convert back to the 3 column via:

TmpX <- as(M, "dgTMatrix")
X3col <- matrix(c(TmpX@i, TmpX@j, TmpX@val), ncol = 3)

This will give you the coordinates and values in the sparse matrix.

Depending on the locations of non-zero entries in A and B, you may find it much better to work with the coordinate list than the sparse matrix representation (there are, by the way, dozens of sparse matrix representations), as you can take direct advantage of vectorized operations, rather than rely upon your sparse matrix package to perform optimally. I tend to alternate between using the COO or sparse matrix support in different languages, depending on how I will get the fastest performance for the algorithm of interest.

Update 1: I was unaware that your two matrices, A and B, are dense. As such, the easiest solution for finding non-zero entries in C is quite simply to not even subtract at first - just compare the entries of A and B. A logical comparison should be faster than a subtraction. First, find the entries of A and B where A != B, then subtract just those entries. Next, you simply need to convert from the vectorization of indices in A and B to their (row, col) representation. This is similar to ind2sub and sub2ind of Matlab - take a look at this R reference for the calculations.

回复收藏 0 原文

东北女汉子 2024-12-11 06:55:35

您可以使用 c.nonzero()方法：

>>> from scipy.sparse import lil_eye
>>> c = lil_eye((4, 10)) # as an example
>>> c
<4x10 sparse matrix of type '<type 'numpy.float64'>'
        with 4 stored elements in LInked List format>
>>> c.nonzero()
(array([0, 1, 2, 3], dtype=int32), array([0, 1, 2, 3], dtype=int32))
>>> import numpy as np
>>> np.ascontiguousarray(c)
array([  (0, 0) 1.0
  (1, 1)        1.0
  (2, 2)        1.0
  (3, 3)        1.0], dtype=object)

不需要计算c矩阵来找出c = a - b中非零元素的索引；你可以这样做 (a != b).nonzero()：

>>> a = np.random.random_integers(2, size=(4,4))
>>> b = np.random.random_integers(2, size=(4,4))
>>> (a != b).nonzero()
(array([0, 0, 1, 1, 1, 2, 3]), array([1, 2, 1, 2, 3, 2, 0]))
>>> a - b
array([[ 0,  1,  1,  0],
       [ 0,  1, -1, -1],
       [ 0,  0,  1,  0],
       [-1,  0,  0,  0]])

You could use c.nonzero() method:

>>> from scipy.sparse import lil_eye
>>> c = lil_eye((4, 10)) # as an example
>>> c
<4x10 sparse matrix of type '<type 'numpy.float64'>'
        with 4 stored elements in LInked List format>
>>> c.nonzero()
(array([0, 1, 2, 3], dtype=int32), array([0, 1, 2, 3], dtype=int32))
>>> import numpy as np
>>> np.ascontiguousarray(c)
array([  (0, 0) 1.0
  (1, 1)        1.0
  (2, 2)        1.0
  (3, 3)        1.0], dtype=object)

You don't need to calculate c matrix to find out indexes of non-zero elements in c = a - b; you could do (a != b).nonzero():

>>> a = np.random.random_integers(2, size=(4,4))
>>> b = np.random.random_integers(2, size=(4,4))
>>> (a != b).nonzero()
(array([0, 0, 1, 1, 1, 2, 3]), array([1, 2, 1, 2, 3, 2, 0]))
>>> a - b
array([[ 0,  1,  1,  0],
       [ 0,  1, -1, -1],
       [ 0,  0,  1,  0],
       [-1,  0,  0,  0]])

回复收藏 0 原文

春花秋月 2024-12-11 06:55:35

我没有计时，但最简单的代码是

all.indices<- which (C>0, arr.ind=T)

I haven't timed it, but the simplest code is

all.indices<- which (C>0, arr.ind=T)

回复收藏 0 原文

南…巷孤猫 2024-12-11 06:55:35

这段代码花费的时间不到 0.1 秒。

m <- matrix(rpois(1000000,0.01),ncol=1000)
m0 <- lapply(seq(NCOL(m)),function(x) which(m[,x] != 0))

编辑：
对于任何大小的稀疏矩阵（适合内存）。

数据

library(data.table)

N <- 1e+5
n <- 1e+6

ta <- data.table(r=sample(seq(N), n,replace=TRUE),
                 c=sample(seq(N), n,replace=TRUE),
                 a=sample(1:20,n,replace=TRUE))
tb <- data.table(r=sample(seq(N), n,replace=TRUE),
                 c=sample(seq(N), n,replace=TRUE),
                 b=sample(1:20,n,replace=TRUE))
setkey(ta,r,c)
setkey(tb,r,c)

码

system.time(tw <- ta[tb][is.na(a)|is.na(b)|(a-b != 0),list(r=r,c=c)])

This code takes less then 0.1s.

m <- matrix(rpois(1000000,0.01),ncol=1000)
m0 <- lapply(seq(NCOL(m)),function(x) which(m[,x] != 0))

EDIT:
For sparse matrices of any size (which fits memory).

DATA

library(data.table)

N <- 1e+5
n <- 1e+6

ta <- data.table(r=sample(seq(N), n,replace=TRUE),
                 c=sample(seq(N), n,replace=TRUE),
                 a=sample(1:20,n,replace=TRUE))
tb <- data.table(r=sample(seq(N), n,replace=TRUE),
                 c=sample(seq(N), n,replace=TRUE),
                 b=sample(1:20,n,replace=TRUE))
setkey(ta,r,c)
setkey(tb,r,c)

CODE

system.time(tw <- ta[tb][is.na(a)|is.na(b)|(a-b != 0),list(r=r,c=c)])

回复收藏 0 原文

沐歌 2024-12-11 06:55:35

看看 numpy 它有你想要的一切，甚至更多！

请参阅此了解稀疏矩阵支持

回复收藏 0 原文

~没有更多了~

关于作者

那伤。

暂无简介

0 文章

0 评论

24 人气

关注发私信

友情链接

文江博客

如何在大型稀疏矩阵中找到非零元素的索引？

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

评论（6）

关于作者

相关话题

热门标签

推荐作者

離殇

小姐丶请自重

Aik

国产ˉ祖宗

猥琐帝

半仙

友情链接

如何在大型稀疏矩阵中找到非零元素的索引？

如果你对这篇内容有疑问，欢迎到本站社区发帖提问 参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

评论（6）

关于作者

相关话题

热门标签

推荐作者

離殇

小姐丶请自重

Aik

国产ˉ祖宗

猥琐帝

半仙

友情链接

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。