Efficient matrix decomposition into square submatrices in C++

Posted on 2024-10-17 17:56:59

I have implemented a Matrix data type in C++ by wrapping a 1D array and indexing it by rows and columns. Now I want to be able to create square/blocked sub-matrices from this type, and I want to do it in memory.

The problem is that I want some of these sub-matrices to be transferable to GPU memory so that they can be processed there in parallel. This is useful, for example, for matrix multiplication. Since these sub-matrices are not laid out contiguously in main memory, copying them to device memory as a single unit seems impossible without creating a separate copy. I want the GPU copy of a sub-matrix to map directly back to the original matrix on the CPU, both for propagating updates and for efficiency. I don't know the exact partitioning in advance.

Does anyone have an idea how I could achieve this?

Just a reminder: the matrix needs to be partitioned into blocks, not row-wise, which would be relatively easy in C/C++.
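
To make the layout problem concrete, here is a minimal sketch of a block view over a row-major matrix held in one flat buffer. The names `Matrix`, `BlockView` and `block` are hypothetical and are not taken from the question; the sketch only shows why a block's rows are separated by a stride in the parent buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical row-major matrix stored in one flat buffer.
struct Matrix {
    std::size_t rows, cols;
    std::vector<double> data;
    Matrix(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0.0) {}
    double& at(std::size_t r, std::size_t c) { return data[r * cols + c]; }
};

// Non-owning view of a block; writes go straight to the parent matrix.
struct BlockView {
    double* origin;        // address of the block's top-left element
    std::size_t rows, cols;
    std::size_t stride;    // elements between consecutive rows in the parent
    double& at(std::size_t r, std::size_t c) { return origin[r * stride + c]; }
    // The block is one contiguous run only if it has a single row
    // or spans the full width of the parent.
    bool contiguous() const { return rows == 1 || cols == stride; }
};

// Create a view of the nr x nc block whose top-left corner is (r0, c0).
BlockView block(Matrix& m, std::size_t r0, std::size_t c0,
                std::size_t nr, std::size_t nc) {
    assert(nr > 0 && nc > 0);
    assert(r0 + nr <= m.rows && c0 + nc <= m.cols);
    BlockView v = { &m.at(r0, c0), nr, nc, m.cols };
    return v;
}
```

A 2x2 block taken from the corner of a 4x4 matrix reports `contiguous() == false`, which is exactly the situation described above. An origin pointer plus a row stride is also the shape that pitch-aware copy routines such as CUDA's `cudaMemcpy2D` work with, though whether that fits the setup in the question is an assumption.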

Comments (2)

迷鸟归林 2024-10-24 17:56:59

If the required sub-matrices are known at the time the 'master' matrix is created, and if they form a partition of the master, it's possible to create a composite matrix class somewhat like this:

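// Assumes an IMatrix<T> interface (pure virtual members only) and a
// PlainMatrix<T> class providing contains() and element().
#include <cstddef>
#include <stdexcept>
#include <vector>

template< typename T >
struct CompositeMatrix : public IMatrix<T> {
   typedef std::vector<PlainMatrix<T>*> tMatrices;

   tMatrices submatrices;
   T& element( std::size_t row, std::size_t column ) {
       PlainMatrix<T>* m = findsubmatrix( row, column );
       // Guard against coordinates that no submatrix covers.
       if( m == NULL ) throw std::out_of_range( "no submatrix contains this element" );
       return m->element( row, column );
   }

   // find algorithm implementing 'chain of responsibility-like' pattern.
   PlainMatrix<T>* findsubmatrix( std::size_t row, std::size_t col ) {
     for( typename tMatrices::iterator it = submatrices.begin()
        ; it != submatrices.end()
        ; ++it)
     {
        // *it is a PlainMatrix<T>*, so dereference the iterator first.
        if( (*it)->contains( row,col ) ) return *it;
     }
     return NULL;
   }
};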

The 'PlainMatrix' can be organized in a memory-efficient way.
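
The answer does not show `IMatrix` or `PlainMatrix`, so the following is only one possible sketch of how they could look, assuming each block keeps its own contiguous buffer and remembers its offset inside the master matrix. The member names `row0`, `col0`, `rows`, `cols` and `data` are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Assumed interface: element access only.
template <typename T>
struct IMatrix {
    virtual ~IMatrix() {}
    virtual T& element(std::size_t row, std::size_t column) = 0;
};

// Hypothetical PlainMatrix: a dense block that knows where it sits
// in the master matrix and is addressed with master coordinates.
template <typename T>
struct PlainMatrix : public IMatrix<T> {
    std::size_t row0, col0, rows, cols;
    std::vector<T> data;  // contiguous, so the whole block can be copied in one call

    PlainMatrix(std::size_t r0, std::size_t c0, std::size_t nr, std::size_t nc)
        : row0(r0), col0(c0), rows(nr), cols(nc), data(nr * nc) {}

    bool contains(std::size_t row, std::size_t col) const {
        return row >= row0 && row < row0 + rows &&
               col >= col0 && col < col0 + cols;
    }
    T& element(std::size_t row, std::size_t column) {
        assert(contains(row, column));
        return data[(row - row0) * cols + (column - col0)];
    }
};

// Composite that forwards each access to the block covering it.
template <typename T>
struct CompositeMatrix : public IMatrix<T> {
    std::vector<PlainMatrix<T>*> submatrices;  // not owned by the composite

    T& element(std::size_t row, std::size_t column) {
        for (std::size_t i = 0; i < submatrices.size(); ++i)
            if (submatrices[i]->contains(row, column))
                return submatrices[i]->element(row, column);
        throw std::out_of_range("no submatrix covers this element");
    }
};
```

With this layout each block's `data` is a single contiguous range, at the cost of a linear search over the blocks on every element access through the composite.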

过气美图社 2024-10-24 17:56:59

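If your matrices' dimensions are powers of 2, you can store them in host memory in z-order. This way, a square submatrix whose side is a power of 2 and whose origin is aligned to that side occupies one contiguous index range, so you just need its start and end index to copy it with one call to cudaMemcpy.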
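
As a rough sketch of the z-order idea, the index of an element can be computed by interleaving the bits of its row and column. The helper names `spreadBits`, `zIndex` and `blockRange` are hypothetical, and the code assumes coordinates below 65536 and blocks that are aligned power-of-two squares.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Spread the low 16 bits of x so they occupy the even bit positions.
inline std::uint32_t spreadBits(std::uint32_t x) {
    assert(x <= 0xFFFFu);
    x = (x | (x << 8)) & 0x00FF00FFu;
    x = (x | (x << 4)) & 0x0F0F0F0Fu;
    x = (x | (x << 2)) & 0x33333333u;
    x = (x | (x << 1)) & 0x55555555u;
    return x;
}

// Z-order index: column bits in even positions, row bits in odd positions.
inline std::uint32_t zIndex(std::uint32_t row, std::uint32_t col) {
    return (spreadBits(row) << 1) | spreadBits(col);
}

// Contiguous index range covered by a block in z-order storage.
struct Range {
    std::size_t first;  // index of the block's first element
    std::size_t count;  // number of elements in the block
};

// Range of the side x side block whose top-left corner is (row0, col0).
inline Range blockRange(std::uint32_t row0, std::uint32_t col0, std::uint32_t side) {
    assert(side != 0 && (side & (side - 1)) == 0);   // side is a power of two
    assert(row0 % side == 0 && col0 % side == 0);    // origin aligned to side
    Range r = { zIndex(row0, col0), static_cast<std::size_t>(side) * side };
    return r;
}
```

For a buffer `host` stored in this order, `host + first` and `count * sizeof(T)` would be the source pointer and byte count passed to a single `cudaMemcpy`. Blocks that are not aligned power-of-two squares do not map to a single range under this scheme.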
