Scipy sparse triangular matrix?

Posted on 2024-09-06 19:12:13

I am using Scipy to construct a large, sparse (250k x 250k) co-occurrence matrix with scipy.sparse.lil_matrix. Co-occurrence matrices are symmetric; that is, M[i,j] == M[j,i]. Since it would be highly inefficient (and in my case, impossible) to store all the data twice, I currently store only the upper triangle: data lives at the coordinate (i,j) where i <= j. In other words, I have a value stored at (2,3) and no value stored at (3,2), even though (3,2) in my model should be equal to (2,3). (See the matrix below for an example.)

My problem is that I need to be able to randomly extract the data corresponding to a given index but, at least the way I'm currently doing it, half the data is in the row and half is in the column, like so:

M = 
    [1 2 3 4
     0 5 6 7
     0 0 8 9
     0 0 0 10]

So, given the above matrix, I want to be able to do a query like M[1], and get back [2,5,6,7]. I have two questions:

1) Is there a more efficient (preferably built-in) way to do this than first querying the row, and then the column, and then concatenating the two? This is bad because whether I use CSC (column-based) or CSR (row-based) internal representation, one of the two queries is highly inefficient.

2) Am I even using the right part of Scipy? I have seen a few functions in the Scipy library that mention triangular matrices, but they seem to revolve around getting triangular matrices from a full matrix. In my case, (I think) I already have a triangular matrix, and want to manipulate it.
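For concreteness, the row-then-column query from question 1 could be sketched roughly as follows. This is only an illustration on the 4x4 example above: the helper name `query_upper` is hypothetical, and the matrix is held in CSR, so the column slice is the slow half of the lookup.

```python
import numpy as np
from scipy import sparse

# Toy upper-triangular storage matching the 4x4 example in the question.
dense = np.array([[1, 2, 3, 4],
                  [0, 5, 6, 7],
                  [0, 0, 8, 9],
                  [0, 0, 0, 10]])
U = sparse.csr_matrix(dense)

def query_upper(upper, k):
    """Hypothetical helper: rebuild 'row k' of the symmetric matrix
    from upper-triangular storage."""
    # Entries (i, k) with i < k live in column k, above the diagonal.
    col_part = upper[:k, k].toarray().ravel()
    # Entries (k, j) with j >= k live in row k, from the diagonal onward.
    row_part = upper[k, k:].toarray().ravel()
    return np.concatenate([col_part, row_part])

result = query_upper(U, 1)
```

With CSR storage the row slice is cheap, while the column slice has to search every one of the first k rows, which is the inefficiency described in question 1.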

Many thanks.

Comments (1)

东走西顾 2024-09-13 19:12:13

I would say that you can't have your cake and eat it too: if you want efficient storage, you cannot store full rows (as you say); if you want efficient row access, I'd say that you have to store full rows.

While actual performance depends on your application, you could check whether the following approach works for you:

  1. You use Scipy's sparse matrices for efficient storage.

  2. You automatically symmetrize your matrix (there is a small recipe on StackOverflow that works at least on regular, dense matrices).

  3. You can then access its rows (or columns); whether this is efficient depends on the implementation of sparse matrices…
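The three steps above might look roughly like this in practice. This is a minimal sketch, not the StackOverflow recipe mentioned in step 2: the helper name `symmetrize` is hypothetical, and it assumes the input holds only the upper triangle (diagonal included), as in the question's example.

```python
import numpy as np
from scipy import sparse

def symmetrize(upper):
    """Hypothetical helper: mirror an upper-triangular sparse matrix
    across its diagonal and return the full symmetric matrix in CSR."""
    upper = sparse.csr_matrix(upper)
    # Strictly-upper part (k=1 skips the diagonal), transposed into the
    # lower triangle, so diagonal entries are not counted twice.
    lower = sparse.triu(upper, k=1).T
    return (upper + lower).tocsr()

# The 4x4 example matrix from the question.
dense = np.array([[1, 2, 3, 4],
                  [0, 5, 6, 7],
                  [0, 0, 8, 9],
                  [0, 0, 0, 10]])
full = symmetrize(dense)

# Row access on the symmetrized CSR matrix.
row1 = full[1].toarray().ravel()
```

Here `row1` holds the values the question asks for when querying `M[1]`. The price is the one named above: the symmetrized matrix stores every off-diagonal entry twice, so it only helps if that roughly doubled storage fits in memory.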
