C++ Ax=b 线性代数系统的内存高效解决方案

发布于 2024-07-30 07:32:58 字数 1270 浏览 6 评论 0原文

我正在使用 Boost UBlas 的数字库绑定来求解简单的线性系统。以下工作正常，除了它仅限于处理相对矩阵 A(mxm) 小“m”。

实际上，我有一个更大的矩阵，尺寸为 m= 10^6（最多 10^7）。
是否存在有效使用内存来求解 Ax=b 的现有 C++ 方法？

#include<boost/numeric/ublas/matrix.hpp>
#include<boost/numeric/ublas/io.hpp>
#include<boost/numeric/bindings/traits/ublas_matrix.hpp>
#include<boost/numeric/bindings/lapack/gesv.hpp>
#include <boost/numeric/bindings/traits/ublas_vector2.hpp>

// compileable with this command


//g++ -I/home/foolb/.boost/include/boost-1_38 -I/home/foolb/.boostnumbind/include/boost-numeric-bindings solve_Axb_byhand.cc -o solve_Axb_byhand -llapack


namespace ublas = boost::numeric::ublas;
namespace lapack= boost::numeric::bindings::lapack;


int main()
{
    ublas::matrix<float,ublas::column_major> A(3,3);
    ublas::vector<float> b(3);


    for(unsigned i=0;i < A.size1();i++)
        for(unsigned j =0;j < A.size2();j++)
        {
            std::cout << "enter element "<<i << j << std::endl;
            std::cin >> A(i,j);
        }

    std::cout << A << std::endl;

    b(0) = 21; b(1) = 1; b(2) = 17;

    lapack::gesv(A,b);

    std::cout << b << std::endl;


    return 0;
}

原文

I am using Numeric Library Bindings for Boost UBlas to solve a simple linear system.
The following works fine, except it is limited to handling matrices A(m x m) for relatively
small 'm'.

In practice I have a much larger matrix with dimension m= 10^6 (up to 10^7).
Is there existing C++ approach for solving Ax=b that uses memory efficiently.

#include<boost/numeric/ublas/matrix.hpp>
#include<boost/numeric/ublas/io.hpp>
#include<boost/numeric/bindings/traits/ublas_matrix.hpp>
#include<boost/numeric/bindings/lapack/gesv.hpp>
#include <boost/numeric/bindings/traits/ublas_vector2.hpp>

// compileable with this command


//g++ -I/home/foolb/.boost/include/boost-1_38 -I/home/foolb/.boostnumbind/include/boost-numeric-bindings solve_Axb_byhand.cc -o solve_Axb_byhand -llapack


namespace ublas = boost::numeric::ublas;
namespace lapack= boost::numeric::bindings::lapack;


int main()
{
    ublas::matrix<float,ublas::column_major> A(3,3);
    ublas::vector<float> b(3);


    for(unsigned i=0;i < A.size1();i++)
        for(unsigned j =0;j < A.size2();j++)
        {
            std::cout << "enter element "<<i << j << std::endl;
            std::cin >> A(i,j);
        }

    std::cout << A << std::endl;

    b(0) = 21; b(1) = 1; b(2) = 17;

    lapack::gesv(A,b);

    std::cout << b << std::endl;


    return 0;
}

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

兮子 2024-08-06 07:32:58

简短回答：不要使用 Boost 的 LAPACK 绑定，它们是为密集矩阵设计的，
不是稀疏矩阵，请使用 UMFPACK 代替。

长答案：UMFPACK 是当 A 较大且稀疏时求解 Ax=b 的最佳库之一。

下面是生成简单的 A 和 <代码>b
并求解Ax = b。

#include <stdlib.h>
#include <stdio.h>
#include "umfpack.h"

int    *Ap; 
int    *Ai;
double *Ax; 
double *b; 
double *x; 

/* Generates a sparse matrix problem: 
   A is n x n tridiagonal matrix
   A(i,i-1) = -1;
   A(i,i) = 3; 
   A(i,i+1) = -1; 
*/
void generate_sparse_matrix_problem(int n){
  int i;  /* row index */ 
  int nz; /* nonzero index */
  int nnz = 2 + 3*(n-2) + 2; /* number of nonzeros*/
  int *Ti; /* row indices */ 
  int *Tj; /* col indices */ 
  double *Tx; /* values */ 

  /* Allocate memory for triplet form */
  Ti = malloc(sizeof(int)*nnz);
  Tj = malloc(sizeof(int)*nnz);
  Tx = malloc(sizeof(double)*nnz);

  /* Allocate memory for compressed sparse column form */
  Ap = malloc(sizeof(int)*(n+1));
  Ai = malloc(sizeof(int)*nnz);
  Ax = malloc(sizeof(double)*nnz);

  /* Allocate memory for rhs and solution vector */
  x = malloc(sizeof(double)*n);
  b = malloc(sizeof(double)*n);

  /* Construct the matrix A*/
  nz = 0;
  for (i = 0; i < n; i++){
    if (i > 0){
      Ti[nz] = i;
      Tj[nz] = i-1;
      Tx[nz] = -1;
      nz++;
    }

    Ti[nz] = i;
    Tj[nz] = i;
    Tx[nz] = 3;
    nz++;

    if (i < n-1){
      Ti[nz] = i;
      Tj[nz] = i+1;
      Tx[nz] = -1;
      nz++;
    }
    b[i] = 0;
  }
  b[0] = 21; b[1] = 1; b[2] = 17;
  /* Convert Triplet to Compressed Sparse Column format */
  (void) umfpack_di_triplet_to_col(n,n,nnz,Ti,Tj,Tx,Ap,Ai,Ax,NULL);

  /* free triplet format */ 
  free(Ti); free(Tj); free(Tx);
}


int main (void)
{
    double *null = (double *) NULL ;
    int i, n;
    void *Symbolic, *Numeric ;
    n = 500000;
    generate_sparse_matrix_problem(n);
    (void) umfpack_di_symbolic (n, n, Ap, Ai, Ax, &Symbolic, null, null);
    (void) umfpack_di_numeric (Ap, Ai, Ax, Symbolic, &Numeric, null, null);
    umfpack_di_free_symbolic (&Symbolic);
    (void) umfpack_di_solve (UMFPACK_A, Ap, Ai, Ax, x, b, Numeric, null, null);
    umfpack_di_free_numeric (&Numeric);
    for (i = 0 ; i < 10 ; i++) printf ("x [%d] = %g\n", i, x [i]);
    free(b); free(x); free(Ax); free(Ai); free(Ap);
    return (0);
}

函数generate_sparse_matrix_problem创建矩阵A和
右侧b。该矩阵首先以三元组形式构建。这
向量 Ti、Tj 和 Tx 充分描述了 A。三元组形式很容易创建，但是
高效的稀疏矩阵方法需要压缩稀疏列格式。转换
使用umfpack_di_triplet_to_col执行。

使用umfpack_di_symbolic执行符号分解。稀疏的
A 的 LU 分解通过 umfpack_di_numeric 执行。
下三角和上三角求解通过 umfpack_di_solve 执行。

当 n 为 500,000 时，在我的机器上，整个程序运行大约需要一秒钟。
Valgrind 报告分配了 369,239,649 字节（略高于 352 MB）。

请注意此页面讨论了 Boost 对三元组稀疏矩阵（坐标）
和压缩格式。如果您愿意，您可以编写例程来转换这些 boost 对象
简单数组 UMFPACK 需要作为输入。

Short answer: Don't use Boost's LAPACK bindings, these were designed for dense matrices,
not sparse matrices, use UMFPACK instead.

Long answer: UMFPACK is one of the best libraries for solving Ax=b when A is large and sparse.

Below is sample code (based on umfpack_simple.c) that generates a simple A and b
and solves Ax = b.

#include <stdlib.h>
#include <stdio.h>
#include "umfpack.h"

int    *Ap; 
int    *Ai;
double *Ax; 
double *b; 
double *x; 

/* Generates a sparse matrix problem: 
   A is n x n tridiagonal matrix
   A(i,i-1) = -1;
   A(i,i) = 3; 
   A(i,i+1) = -1; 
*/
void generate_sparse_matrix_problem(int n){
  int i;  /* row index */ 
  int nz; /* nonzero index */
  int nnz = 2 + 3*(n-2) + 2; /* number of nonzeros*/
  int *Ti; /* row indices */ 
  int *Tj; /* col indices */ 
  double *Tx; /* values */ 

  /* Allocate memory for triplet form */
  Ti = malloc(sizeof(int)*nnz);
  Tj = malloc(sizeof(int)*nnz);
  Tx = malloc(sizeof(double)*nnz);

  /* Allocate memory for compressed sparse column form */
  Ap = malloc(sizeof(int)*(n+1));
  Ai = malloc(sizeof(int)*nnz);
  Ax = malloc(sizeof(double)*nnz);

  /* Allocate memory for rhs and solution vector */
  x = malloc(sizeof(double)*n);
  b = malloc(sizeof(double)*n);

  /* Construct the matrix A*/
  nz = 0;
  for (i = 0; i < n; i++){
    if (i > 0){
      Ti[nz] = i;
      Tj[nz] = i-1;
      Tx[nz] = -1;
      nz++;
    }

    Ti[nz] = i;
    Tj[nz] = i;
    Tx[nz] = 3;
    nz++;

    if (i < n-1){
      Ti[nz] = i;
      Tj[nz] = i+1;
      Tx[nz] = -1;
      nz++;
    }
    b[i] = 0;
  }
  b[0] = 21; b[1] = 1; b[2] = 17;
  /* Convert Triplet to Compressed Sparse Column format */
  (void) umfpack_di_triplet_to_col(n,n,nnz,Ti,Tj,Tx,Ap,Ai,Ax,NULL);

  /* free triplet format */ 
  free(Ti); free(Tj); free(Tx);
}


int main (void)
{
    double *null = (double *) NULL ;
    int i, n;
    void *Symbolic, *Numeric ;
    n = 500000;
    generate_sparse_matrix_problem(n);
    (void) umfpack_di_symbolic (n, n, Ap, Ai, Ax, &Symbolic, null, null);
    (void) umfpack_di_numeric (Ap, Ai, Ax, Symbolic, &Numeric, null, null);
    umfpack_di_free_symbolic (&Symbolic);
    (void) umfpack_di_solve (UMFPACK_A, Ap, Ai, Ax, x, b, Numeric, null, null);
    umfpack_di_free_numeric (&Numeric);
    for (i = 0 ; i < 10 ; i++) printf ("x [%d] = %g\n", i, x [i]);
    free(b); free(x); free(Ax); free(Ai); free(Ap);
    return (0);
}

The function generate_sparse_matrix_problem creates the matrix A and the
right-hand side b. The matrix is first constructed in triplet form. The
vectors Ti, Tj, and Tx fully describe A. Triplet form is easy to create but
efficient sparse matrix methods require Compressed Sparse Column format. Conversion
is performed with umfpack_di_triplet_to_col.

A symbolic factorization is performed with umfpack_di_symbolic. A sparse
LU decomposition of A is performed with umfpack_di_numeric.
The lower and upper triangular solves are performed with umfpack_di_solve.

With n as 500,000, on my machine, the entire program takes about a second to run.
Valgrind reports that 369,239,649 bytes (just a little over 352 MB) were allocated.

Note this page discusses Boost's support for sparse matrices in Triplet (Coordinate)
and Compressed format. If you like, you can write routines to convert these boost objects
to the simple arrays UMFPACK requires as input.

回复收藏 0 原文

终难遇 2024-08-06 07:32:58

我假设你的矩阵是稠密的。如果它很稀疏，您可以找到许多专门的算法，如 DeusAduro 和达菲莫。

如果您没有（足够大的）集群可供使用，您需要考虑核外算法。 ScaLAPACK 有一些核外求解器作为其原型包，参见文档此处和谷歌了解更多详情。在网络上搜索“out-of-core LU /（矩阵）求解器/包”将为您提供大量其他算法和工具的链接。我不是这些方面的专家。

然而，对于这个问题，大多数人会使用集群。您几乎可以在任何集群上找到的软件包是 ScaLAPACK。此外，典型集群上通常还有许多其他软件包，因此您可以挑选适合您问题的软件包（示例此处和此处）。

在开始编码之前，您可能想快速检查解决问题需要多长时间。典型的求解器大约需要 O(3*N^3) 次触发器（N 是矩阵的维度）。如果 N = 100000，那么您将看到 3000000 Gflops。假设您的内存求解器每个核心的处理速度为 10 Gflops/s，则单个核心的运行时间为 3 1/2 天。由于算法可扩展性良好，增加内核数量应该可以接近线性地减少时间。最重要的是 I/O。

回复收藏 0 原文

月亮邮递员 2024-08-06 07:32:58

假设你的巨大矩阵是稀疏的，我希望它们是那样的大小，看看 PARDISO项目是一个稀疏线性求解器，如果您想处理像您所说的那样大的矩阵，这就是您所需要的。仅允许有效存储非零值，并且比求解相同的稠密矩阵系统要快得多。

回复收藏 0 原文

坚持沉默 2024-08-06 07:32:58

不确定 C++ 实现，但如果内存是一个问题，您可以执行以下操作，具体取决于您正在处理的矩阵类型：

如果您的矩阵是稀疏矩阵或带状矩阵，则可以使用稀疏或带宽求解器。它们不会在带外存储零元素。
您可以使用波前求解器，它将矩阵存储在磁盘上，并且仅引入矩阵波前进行分解。
您可以避免完全求解矩阵并使用迭代方法。
您可以尝试蒙特卡罗解法。

回复收藏 0 原文

叹倦 2024-08-06 07:32:58

查看用于解决线性代数的免费软件列表问题，由 Jack Dongarra 和 Hatem Ltaief 编译。

我认为对于您正在考虑的问题大小，您可能需要迭代算法。如果您不想以稀疏格式存储矩阵 A，则可以使用无矩阵实现。迭代算法通常不需要访问矩阵 A 的各个条目，它们只需要计算矩阵向量乘积 Av （有时 A^T v，即转置矩阵与向量的乘积）。因此，如果该库设计良好，那么向它传递一个知道如何计算矩阵向量乘积的类就足够了。

回复收藏 0 原文