分配MPI过程来求解矩阵的组件

发布于 2025-02-11 23:12:52 字数 3291 浏览 0 评论 0原文

问题:

我具有一个C代码功能,该功能可以使用Cramer的规则求解任何NXN矩阵。我起草了代码,并正确计算了几个解决方案而没有错误。我想扩大此代码以利用MPI并专用多个过程来求解NXN矩阵的段。

例如,给定的5 x 5 size = 5个进程的矩阵,我希望专用过程0到4来计算第一个5 4 x 4矩阵(使用Cramer的分解方法)。实施这个想法非常棘手,这就是为什么我想在这里发布一个问题(试图自学MPI)。

以下是计算确定确定的代码块:


int determinate_solver(int r, int* ptr, int rank, int size) {

    int ans = 0, a = 0, b = 0, c = 0, d = 0;
    int inner_sol = 0, inner_det = 0;
    
    // define a method to modify the range of computation with respect to available 
    // processes
    int upperLimit = r;
    int start_val = rank * ceil(upperLimit / size) + 1, end_val;

    /* how should start_val and end_val be implemented correctly? */

    if (rank == (size - 1)) {
        end_val = upperLimit;
    }
    else {
        end_val = start_val + ceil(upperLimit / size) - 1;
    }

        // include conditions for when a 1x1 or
        // 2x2 matrix is included.
        if (r == 1 || r == 2) {
            if (r == 1) {
                ans = ptr[0];
            }
            else {
                a = ptr[0];
                b = ptr[1];
                c = ptr[2];
                d = ptr[3];
                ans = (a * d) - (b * c);
            }
        }
        else {
            int i, j, k, l, n = 0, sign = +1, basic, element;

            // define a new pointer array to take into account
            // that a sub-matrix will be created from the larger
            // matrix. This is apart of the recursive sol'n.
            int* q = (int*)calloc(((r - 1) * (r - 1)), sizeof(int));

                      // 0   // r
            for (int i = 0; i < r; i++) {
                l = 0;
                n = 0;
                // note that the value of i is here.
                // basic is the scaler that will be used in the intermediate solution
                basic = *(ptr + i);

                for (int j = 0; j < r; j++) {

                    for (k = 0; k < r; k++) {

                        element = *(ptr + l);
                        if ((j == 0) || (i == k));
                        else
                        {
                            *(q + n) = element;
                            n = n + 1;
                        }
                        l = l + 1;
                    }
                }
                inner_det = determinate_solver (r - 1, q, rank, size); /* define rank and size  */
                inner_sol = sign * basic * inner_det;
                ans = ans + inner_sol;
                sign = sign * -1;
                printf("The final solution is: %d from rank %d \n \n", ans, rank);
            }
        }
        return ans;
}

总体程序为&lt; 150行代码。为了简洁起见,我在这个问题中排除了不必要的代码。

我尝试了

我在end_valstart_val参数中包括此代码块(请参阅注释部分)。我打算使用这些变量来为每个过程构建有效的操作范围。我最初更改了第一个循环,如下所示,

for(int i = start_val; i < end_val; i++){

/* run some code */

}

但事实证明这是不成功的(即,从ANS返回错误的值)。启动和结尾阀可以很好地定义数组值,但是在2维应用程序中挣扎,我还尝试用i start_val替换每个实例,希望这会这样解决问题 - 没错。我相信start_valend_val可以有效地用于解决此问题,但是我不确定如何正确实施此问题。

如何帮助最佳

文章,论文或导致解决方案的链接将被接受为答案。也可以接受不使用Cramer规则的替代解决方案。

Question:

I have a C code function that solves the determinate of any n x n matrix using Cramer's rule. I drafted the code, and computed several solutions correctly without error. I would like to augment this code to utilize MPI and dedicate multiple processes to solve segments of the n x n matrix.

For example, given a 5 x 5 matrix with size = 5 processes, I would like dedicate process 0 through 4 to compute the first 5 4 x 4 matrices (using Cramer's decomposition method). Implementation of this idea has been fairly tricky, which is why I wanted to post a question here (trying to teach myself mpi).

Below is the code block that computes the determinate:


int determinate_solver(int r, int* ptr, int rank, int size) {

    int ans = 0, a = 0, b = 0, c = 0, d = 0;
    int inner_sol = 0, inner_det = 0;
    
    // define a method to modify the range of computation with respect to available 
    // processes
    int upperLimit = r;
    int start_val = rank * ceil(upperLimit / size) + 1, end_val;

    /* how should start_val and end_val be implemented correctly? */

    if (rank == (size - 1)) {
        end_val = upperLimit;
    }
    else {
        end_val = start_val + ceil(upperLimit / size) - 1;
    }

        // include conditions for when a 1x1 or
        // 2x2 matrix is included.
        if (r == 1 || r == 2) {
            if (r == 1) {
                ans = ptr[0];
            }
            else {
                a = ptr[0];
                b = ptr[1];
                c = ptr[2];
                d = ptr[3];
                ans = (a * d) - (b * c);
            }
        }
        else {
            int i, j, k, l, n = 0, sign = +1, basic, element;

            // define a new pointer array to take into account
            // that a sub-matrix will be created from the larger
            // matrix. This is apart of the recursive sol'n.
            int* q = (int*)calloc(((r - 1) * (r - 1)), sizeof(int));

                      // 0   // r
            for (int i = 0; i < r; i++) {
                l = 0;
                n = 0;
                // note that the value of i is here.
                // basic is the scaler that will be used in the intermediate solution
                basic = *(ptr + i);

                for (int j = 0; j < r; j++) {

                    for (k = 0; k < r; k++) {

                        element = *(ptr + l);
                        if ((j == 0) || (i == k));
                        else
                        {
                            *(q + n) = element;
                            n = n + 1;
                        }
                        l = l + 1;
                    }
                }
                inner_det = determinate_solver (r - 1, q, rank, size); /* define rank and size  */
                inner_sol = sign * basic * inner_det;
                ans = ans + inner_sol;
                sign = sign * -1;
                printf("The final solution is: %d from rank %d \n \n", ans, rank);
            }
        }
        return ans;
}

The overall program is < 150 lines of code. For the sake of brevity, I excluded unnecessary code in this question.

What I Have Tried

I included end_val and start_val parameters in this code block (see commented section). I intended to use these variables to build the effective ranges of operation for each process. I initially altered the first for loop as shown below,

for(int i = start_val; i < end_val; i++){

/* run some code */

}

but this proved to be unsuccessful (ie. incorrect values were returned from ans). start and end vals work well for defining array values, but struggle in 2 dimensional applications Moreover, I also tried to replace every instance of i with start_val with the hope that this would fix the problem--no luck. I believe that the start_val and end_val can be utilized effectively to solve this problem, but I am unsure how to properly implement this.

How To Help Best

Articles, papers, or SO links that lead to a solution will be accepted as an answer. Alternate solutions not using Cramer's rule may also be accepted.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

巴黎盛开的樱花 2025-02-18 23:12:52
  1. Cramer的规则具有阶乘复杂性,因此通过计算LU分解并乘以枢轴的计算决定因素可以更好地完成。
  2. 您的代码递归将非连续的一键式复制为连续存储罚款。如果您想使用MPI执行此操作,则可以在一个过程中具有大矩阵,创建子媒体并将其发送所有其他过程。 (发送给自己有点棘手,但并不多。)这是一个很大的瓶颈。
  3. 或者,您可以将大矩阵放在每个过程中,并让每个过程提取其子序列。这在记忆中很浪费。

总而言之,我不会认为这是MPI的好测试问题,但这当然不是不可能的,甚至不是很难。只是不高效。

  1. Cramer's rule has factorial complexity, so computing the determinant is better done through computing an LU factorization and multiplying the pivots.
  2. Your code recursively copies non-contiguous submatrices into contiguous storage fine. If you want to do that with MPI, you can either have the big matrix on one process, create the submatrices and send them all other processes. (Sending to yourself is a little tricky but not much.) That has a big time bottleneck.
  3. Or you can put the big matrix on every process and let each process extract its submatrix. That is wasteful in memory.

In all, I would not consider this to be a good test problem for MPI, but it's certainly not impossible or even very hard. Just not efficient.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文