C++ Ax=b 线性代数系统的内存高效解决方案

发布于 2024-07-30 07:32:58 字数 1270 浏览 6 评论 0原文

我正在使用 Boost UBlas 的数字库绑定来求解简单的线性系统。 以下工作正常,除了它仅限于处理相对矩阵 A(mxm) 小“m”。

实际上,我有一个更大的矩阵,尺寸为 m= 10^6(最多 10^7)。
是否存在有效使用内存来求解 Ax=b 的现有 C++ 方法?

#include<boost/numeric/ublas/matrix.hpp>
#include<boost/numeric/ublas/io.hpp>
#include<boost/numeric/bindings/traits/ublas_matrix.hpp>
#include<boost/numeric/bindings/lapack/gesv.hpp>
#include <boost/numeric/bindings/traits/ublas_vector2.hpp>

// compileable with this command


//g++ -I/home/foolb/.boost/include/boost-1_38 -I/home/foolb/.boostnumbind/include/boost-numeric-bindings solve_Axb_byhand.cc -o solve_Axb_byhand -llapack


namespace ublas = boost::numeric::ublas;
namespace lapack= boost::numeric::bindings::lapack;


int main()
{
    ublas::matrix<float,ublas::column_major> A(3,3);
    ublas::vector<float> b(3);


    for(unsigned i=0;i < A.size1();i++)
        for(unsigned j =0;j < A.size2();j++)
        {
            std::cout << "enter element "<<i << j << std::endl;
            std::cin >> A(i,j);
        }

    std::cout << A << std::endl;

    b(0) = 21; b(1) = 1; b(2) = 17;

    lapack::gesv(A,b);

    std::cout << b << std::endl;


    return 0;
}

I am using Numeric Library Bindings for Boost UBlas to solve a simple linear system.
The following works fine, except it is limited to handling matrices A(m x m) for relatively
small 'm'.

In practice I have a much larger matrix with dimension m= 10^6 (up to 10^7).
Is there existing C++ approach for solving Ax=b that uses memory efficiently.

#include<boost/numeric/ublas/matrix.hpp>
#include<boost/numeric/ublas/io.hpp>
#include<boost/numeric/bindings/traits/ublas_matrix.hpp>
#include<boost/numeric/bindings/lapack/gesv.hpp>
#include <boost/numeric/bindings/traits/ublas_vector2.hpp>

// compileable with this command


//g++ -I/home/foolb/.boost/include/boost-1_38 -I/home/foolb/.boostnumbind/include/boost-numeric-bindings solve_Axb_byhand.cc -o solve_Axb_byhand -llapack


namespace ublas = boost::numeric::ublas;
namespace lapack= boost::numeric::bindings::lapack;


int main()
{
    ublas::matrix<float,ublas::column_major> A(3,3);
    ublas::vector<float> b(3);


    for(unsigned i=0;i < A.size1();i++)
        for(unsigned j =0;j < A.size2();j++)
        {
            std::cout << "enter element "<<i << j << std::endl;
            std::cin >> A(i,j);
        }

    std::cout << A << std::endl;

    b(0) = 21; b(1) = 1; b(2) = 17;

    lapack::gesv(A,b);

    std::cout << b << std::endl;


    return 0;
}

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(6

兮子 2024-08-06 07:32:58

简短回答:不要使用 Boost 的 LAPACK 绑定,它们是为密集矩阵设计的,
不是稀疏矩阵,请使用 UMFPACK 代替。

长答案:UMFPACK 是当 A 较大且稀疏时求解 Ax=b 的最佳库之一。

下面是生成简单的 A 和 <代码>b
并求解Ax = b

#include <stdlib.h>
#include <stdio.h>
#include "umfpack.h"

int    *Ap; 
int    *Ai;
double *Ax; 
double *b; 
double *x; 

/* Generates a sparse matrix problem: 
   A is n x n tridiagonal matrix
   A(i,i-1) = -1;
   A(i,i) = 3; 
   A(i,i+1) = -1; 
*/
void generate_sparse_matrix_problem(int n){
  int i;  /* row index */ 
  int nz; /* nonzero index */
  int nnz = 2 + 3*(n-2) + 2; /* number of nonzeros*/
  int *Ti; /* row indices */ 
  int *Tj; /* col indices */ 
  double *Tx; /* values */ 

  /* Allocate memory for triplet form */
  Ti = malloc(sizeof(int)*nnz);
  Tj = malloc(sizeof(int)*nnz);
  Tx = malloc(sizeof(double)*nnz);

  /* Allocate memory for compressed sparse column form */
  Ap = malloc(sizeof(int)*(n+1));
  Ai = malloc(sizeof(int)*nnz);
  Ax = malloc(sizeof(double)*nnz);

  /* Allocate memory for rhs and solution vector */
  x = malloc(sizeof(double)*n);
  b = malloc(sizeof(double)*n);

  /* Construct the matrix A*/
  nz = 0;
  for (i = 0; i < n; i++){
    if (i > 0){
      Ti[nz] = i;
      Tj[nz] = i-1;
      Tx[nz] = -1;
      nz++;
    }

    Ti[nz] = i;
    Tj[nz] = i;
    Tx[nz] = 3;
    nz++;

    if (i < n-1){
      Ti[nz] = i;
      Tj[nz] = i+1;
      Tx[nz] = -1;
      nz++;
    }
    b[i] = 0;
  }
  b[0] = 21; b[1] = 1; b[2] = 17;
  /* Convert Triplet to Compressed Sparse Column format */
  (void) umfpack_di_triplet_to_col(n,n,nnz,Ti,Tj,Tx,Ap,Ai,Ax,NULL);

  /* free triplet format */ 
  free(Ti); free(Tj); free(Tx);
}


int main (void)
{
    double *null = (double *) NULL ;
    int i, n;
    void *Symbolic, *Numeric ;
    n = 500000;
    generate_sparse_matrix_problem(n);
    (void) umfpack_di_symbolic (n, n, Ap, Ai, Ax, &Symbolic, null, null);
    (void) umfpack_di_numeric (Ap, Ai, Ax, Symbolic, &Numeric, null, null);
    umfpack_di_free_symbolic (&Symbolic);
    (void) umfpack_di_solve (UMFPACK_A, Ap, Ai, Ax, x, b, Numeric, null, null);
    umfpack_di_free_numeric (&Numeric);
    for (i = 0 ; i < 10 ; i++) printf ("x [%d] = %g\n", i, x [i]);
    free(b); free(x); free(Ax); free(Ai); free(Ap);
    return (0);
}

函数generate_sparse_matrix_problem创建矩阵A
右侧b。 该矩阵首先以三元组形式构建。 这
向量 Ti、Tj 和 Tx 充分描述了 A。三元组形式很容易创建,但是
高效的稀疏矩阵方法需要压缩稀疏列格式。 转换
使用umfpack_di_triplet_to_col执行。

使用umfpack_di_symbolic执行符号分解。 稀疏的
A 的 LU 分解通过 umfpack_di_numeric 执行。
下三角和上三角求解通过 umfpack_di_solve 执行。

当 n 为 500,000 时,在我的机器上,整个程序运行大约需要一秒钟。
Valgrind 报告分配了 369,239,649 字节(略高于 352 MB)。

请注意此页面讨论了 Boost 对三元组稀疏矩阵(坐标)
和压缩格式。 如果您愿意,您可以编写例程来转换这些 boost 对象
简单数组 UMFPACK 需要作为输入。

Short answer: Don't use Boost's LAPACK bindings, these were designed for dense matrices,
not sparse matrices, use UMFPACK instead.

Long answer: UMFPACK is one of the best libraries for solving Ax=b when A is large and sparse.

Below is sample code (based on umfpack_simple.c) that generates a simple A and b
and solves Ax = b.

#include <stdlib.h>
#include <stdio.h>
#include "umfpack.h"

int    *Ap; 
int    *Ai;
double *Ax; 
double *b; 
double *x; 

/* Generates a sparse matrix problem: 
   A is n x n tridiagonal matrix
   A(i,i-1) = -1;
   A(i,i) = 3; 
   A(i,i+1) = -1; 
*/
void generate_sparse_matrix_problem(int n){
  int i;  /* row index */ 
  int nz; /* nonzero index */
  int nnz = 2 + 3*(n-2) + 2; /* number of nonzeros*/
  int *Ti; /* row indices */ 
  int *Tj; /* col indices */ 
  double *Tx; /* values */ 

  /* Allocate memory for triplet form */
  Ti = malloc(sizeof(int)*nnz);
  Tj = malloc(sizeof(int)*nnz);
  Tx = malloc(sizeof(double)*nnz);

  /* Allocate memory for compressed sparse column form */
  Ap = malloc(sizeof(int)*(n+1));
  Ai = malloc(sizeof(int)*nnz);
  Ax = malloc(sizeof(double)*nnz);

  /* Allocate memory for rhs and solution vector */
  x = malloc(sizeof(double)*n);
  b = malloc(sizeof(double)*n);

  /* Construct the matrix A*/
  nz = 0;
  for (i = 0; i < n; i++){
    if (i > 0){
      Ti[nz] = i;
      Tj[nz] = i-1;
      Tx[nz] = -1;
      nz++;
    }

    Ti[nz] = i;
    Tj[nz] = i;
    Tx[nz] = 3;
    nz++;

    if (i < n-1){
      Ti[nz] = i;
      Tj[nz] = i+1;
      Tx[nz] = -1;
      nz++;
    }
    b[i] = 0;
  }
  b[0] = 21; b[1] = 1; b[2] = 17;
  /* Convert Triplet to Compressed Sparse Column format */
  (void) umfpack_di_triplet_to_col(n,n,nnz,Ti,Tj,Tx,Ap,Ai,Ax,NULL);

  /* free triplet format */ 
  free(Ti); free(Tj); free(Tx);
}


int main (void)
{
    double *null = (double *) NULL ;
    int i, n;
    void *Symbolic, *Numeric ;
    n = 500000;
    generate_sparse_matrix_problem(n);
    (void) umfpack_di_symbolic (n, n, Ap, Ai, Ax, &Symbolic, null, null);
    (void) umfpack_di_numeric (Ap, Ai, Ax, Symbolic, &Numeric, null, null);
    umfpack_di_free_symbolic (&Symbolic);
    (void) umfpack_di_solve (UMFPACK_A, Ap, Ai, Ax, x, b, Numeric, null, null);
    umfpack_di_free_numeric (&Numeric);
    for (i = 0 ; i < 10 ; i++) printf ("x [%d] = %g\n", i, x [i]);
    free(b); free(x); free(Ax); free(Ai); free(Ap);
    return (0);
}

The function generate_sparse_matrix_problem creates the matrix A and the
right-hand side b. The matrix is first constructed in triplet form. The
vectors Ti, Tj, and Tx fully describe A. Triplet form is easy to create but
efficient sparse matrix methods require Compressed Sparse Column format. Conversion
is performed with umfpack_di_triplet_to_col.

A symbolic factorization is performed with umfpack_di_symbolic. A sparse
LU decomposition of A is performed with umfpack_di_numeric.
The lower and upper triangular solves are performed with umfpack_di_solve.

With n as 500,000, on my machine, the entire program takes about a second to run.
Valgrind reports that 369,239,649 bytes (just a little over 352 MB) were allocated.

Note this page discusses Boost's support for sparse matrices in Triplet (Coordinate)
and Compressed format. If you like, you can write routines to convert these boost objects
to the simple arrays UMFPACK requires as input.

终难遇 2024-08-06 07:32:58

我假设你的矩阵是稠密的。 如果它很稀疏,您可以找到许多专门的算法,如 DeusAduro达菲莫

如果您没有(足够大的)集群可供使用,您需要考虑核外算法。 ScaLAPACK 有一些核外求解器作为其原型包,参见文档此处谷歌了解更多详情。 在网络上搜索“out-of-core LU /(矩阵)求解器/包”将为您提供大量其他算法和工具的链接。 我不是这些方面的专家。

然而,对于这个问题,大多数人会使用集群。 您几乎可以在任何集群上找到的软件包是 ScaLAPACK。 此外,典型集群上通常还有许多其他软件包,因此您可以挑选适合您问题的软件包(示例 此处此处 )。

在开始编码之前,您可能想快速检查解决问题需要多长时间。 典型的求解器大约需要 O(3*N^3) 次触发器(N 是矩阵的维度)。 如果 N = 100000,那么您将看到 3000000 Gflops。 假设您的内存求解器每个核心的处理速度为 10 Gflops/s,则单个核心的运行时间为 3 1/2 天。 由于算法可扩展性良好,增加内核数量应该可以接近线性地减少时间。 最重要的是 I/O。

I assume that your matrix is dense. If it is sparse, you can find numerous specialised algorithms as already mentioned by DeusAduro and duffymo.

If you don't have a (large enough) cluster at your disposal, you want to look at out-of-core algorithms. ScaLAPACK has a few out-of-core solvers as part of its prototype package, see the documentation here and Google for more details. Searching the web for "out-of-core LU / (matrix) solvers / packages" will give you links to a wealth of further algorithms and tools. I am not an expert on those.

For this problem, most people would use a cluster, however. The package you will find on almost any cluster is ScaLAPACK, again. In addition, there are usually numerous other packages on the typical cluster, so you can pick and choose what suits your problem (examples here and here).

Before you start coding, you probably want to quickly check how long it will take to solve your problem. A typical solver takes about O(3*N^3) flops (N is dimension of matrix). If N = 100000, you are hence looking at 3000000 Gflops. Assuming that your in-memory solver does 10 Gflops/s per core, you are looking at 3 1/2 days on a single core. As the algorithms scale well, increasing the number of cores should reduce the time close to linearly. On top of that comes the I/O.

月亮邮递员 2024-08-06 07:32:58

假设你的巨大矩阵是稀疏的,我希望它们是那样的大小,看看 PARDISO项目是一个稀疏线性求解器,如果您想处理像您所说的那样大的矩阵,这就是您所需要的。 仅允许有效存储非零值,并且比求解相同的稠密矩阵系统要快得多。

Assuming your huge matrices are sparse, which I hope they are at that size, have a look at the PARDISO project which is a sparse linear solver, this is what you'll need if you want to handle matrices as big as you said. Allows efficient storage of only non-zero values, and is much faster than solving the same system of dense matrices.

坚持沉默 2024-08-06 07:32:58

不确定 C++ 实现,但如果内存是一个问题,您可以执行以下操作,具体取决于您正在处理的矩阵类型:

  1. 如果您的矩阵是稀疏矩阵或带状矩阵,则可以使用稀疏或带宽求解器。 它们不会在带外存储零元素。
  2. 您可以使用波前求解器,它将矩阵存储在磁盘上,并且仅引入矩阵波前进行分解。
  3. 您可以避免完全求解矩阵并使用迭代方法。
  4. 您可以尝试蒙特卡罗解法。

Not sure about C++ implementations, but there are several things you can do if memory is an issue depending on the type of matrix you're dealing with:

  1. If your matrix is sparse or banded, you can use a sparse or bandwidth solver. These don't store zero elements outside the band.
  2. You can use a wavefront solver, which stores the matrix on disk and only brings in the matrix wavefront for decomposition.
  3. You can avoid solving the matrix altogether and use iterative methods.
  4. You can try Monte Carlo methods of solution.
叹倦 2024-08-06 07:32:58

查看用于解决线性代数的免费软件列表问题,由 Jack Dongarra 和 Hatem Ltaief 编译。

我认为对于您正在考虑的问题大小,您可能需要迭代算法。 如果您不想以稀疏格式存储矩阵 A,则可以使用无矩阵实现。 迭代算法通常不需要访问矩阵 A 的各个条目,它们只需要计算矩阵向量乘积 Av (有时 A^T v,即转置矩阵与向量的乘积)。 因此,如果该库设计良好,那么向它传递一个知道如何计算矩阵向量乘积的类就足够了。

Have a look at the list of freely available software for the solution of linear algebra problems, compiled by Jack Dongarra and Hatem Ltaief.

I think that for the problem size you're looking at, you probably need an iterative algorithm. If you don't want to store the matrix A in a sparse format, you can use a matrix-free implementation. Iterative algorithms typically do not need to access individual entries of the matrix A, they only need to compute matrix-vector products Av (and sometimes A^T v, the product of the transposed matrix with the vector). So if the library is well-designed, it should be enough if you pass it a class that knows how to do matrix-vector products.

甜宝宝 2024-08-06 07:32:58

正如已接受的答案所示,有 UMFPACK。 但如果您使用 BOOST,您仍然可以使用 BOOST 中的紧凑矩阵并使用 UMFPACK 来求解系统。 有一个绑定可以轻松实现:

http://mathema.tician.de/software /boost-numeric-bindings

它的日期大约有两年,但它只是一个绑定(以及其他一些绑定)。

请参阅相关问题:
UMFPACK 和 BOOST 的 uBLAS 稀疏矩阵

As the accepted answer suggests there is UMFPACK. But if you are using BOOST you can still use the compact matrices in BOOST and use UMFPACK to solve the system. There is a binding which makes it easy:

http://mathema.tician.de/software/boost-numeric-bindings

Its about two years dated, but its just a binding (along with a few others).

see related question:
UMFPACK and BOOST's uBLAS Sparse Matrix

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文