OpenMP segmentation fault in C++

Posted on 2025-01-11 03:09:26

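I have a very straightforward function that counts how many inner entries of an N by N 2D matrix (represented by a pointer arr) are below a certain threshold, and updates a counter below_threshold that is passed by reference: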

void count(float *arr, const int N, const float threshold, int &below_threshold) {
    below_threshold = 0;  // make sure it is reset
    bool comparison;
    float temp;
    
    #pragma omp parallel for shared(arr, N, threshold) private(temp, comparison) reduction(+:below_threshold)
    for (int i = 1; i < N-1; i++)  // count only the inner N-2 rows
    {
        for (int j = 1; j < N-1; j++)  // count only the inner N-2 columns
        {
            temp = *(arr + i*N + j);
            comparison = (temp < threshold);
            below_threshold += comparison;
        }
    }
}

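When I do not use OpenMP, it runs fine (so the allocation and initialization were done correctly).

When I use OpenMP with an N below roughly 40000, it also runs fine.

However, once I use a larger N with OpenMP, it keeps giving me a segmentation fault (I am currently testing with N = 50000 and would like to eventually get it up to ~100000).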

Is there something wrong with this at a software level?
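For reference, the flat index i*N + j in count() is evaluated in int arithmetic. The sketch below checks, in 64 bits, whether the largest index the loops touch still fits in a 32-bit int. The helper names largest_flat_index and fits_in_int32 are hypothetical, and a 32-bit int is an assumption that holds on most current platforms. It only shows where the arithmetic leaves int range; the post does not establish whether that is the cause of the crash.

```cpp
#include <cstdint>
#include <limits>

// Largest flat index read by count(): i = j = N-2, computed in 64 bits.
// (Hypothetical helper, not part of the original code.)
constexpr std::int64_t largest_flat_index(std::int64_t n) {
    return (n - 2) * n + (n - 2);
}

// True when that index still fits in a 32-bit int, the type in which
// i*N + j is evaluated in the question (assuming int is 32 bits wide).
constexpr bool fits_in_int32(std::int64_t n) {
    return largest_flat_index(n) <= std::numeric_limits<std::int32_t>::max();
}

// N = 40000 stays inside int range, N = 50000 does not.
static_assert(fits_in_int32(40000), "largest index for N = 40000 fits in int");
static_assert(!fits_in_int32(50000), "largest index for N = 50000 exceeds int");
```

Signed overflow is undefined behavior in C++, so a build may appear to work for one set of compiler options and fail for another.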

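P.S. The allocation was done dynamically ( float *arr = new float [N*N] ), and here is the code used to randomly initialize the entire matrix, which didn't have any issues with OpenMP at large N: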

void initialize(float *arr, const int N)
{
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
    {
        for (int j = 0; j < N; j++)
        {
            *(arr + i*N + j) = static_cast <float> (rand()) / static_cast <float> (RAND_MAX);
        }
    }

}
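As a side note, rand() is not required to be free of data races when called from several threads, and the offset i*N + j above is again computed in int. A minimal sketch of an alternative follows, assuming one std::mt19937 engine per row and offsets computed in std::size_t. The name initialize_rows and the seed parameter are hypothetical, and seeding rows with consecutive integers is a simplification for illustration, not a recommendation for statistical work.

```cpp
#include <cassert>
#include <cstddef>
#include <random>

// Hypothetical variant of initialize(): one generator per row instead of
// the shared state behind rand().
void initialize_rows(float *arr, const std::size_t N, const unsigned seed = 12345u)
{
    assert(arr != nullptr);

    #pragma omp parallel for
    for (long long i = 0; i < static_cast<long long>(N); i++)
    {
        // Each row owns its engine, so no generator state is shared between threads.
        std::mt19937 engine(seed + static_cast<unsigned>(i));
        std::uniform_real_distribution<float> dist(0.0f, 1.0f);

        // Row offset computed in std::size_t rather than int.
        float *row = arr + static_cast<std::size_t>(i) * N;
        for (std::size_t j = 0; j < N; j++)
        {
            row[j] = dist(engine);
        }
    }
}
```

Because every row derives its engine from the seed and its own index, the result does not depend on how many threads run the loop.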

UPDATE:

I have tried changing i, j, and N to long long int, and it still has not fixed my segmentation fault. If that were the issue, why would it work without OpenMP? It only fails once I add #pragma omp ....
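For comparison, here is one way the 64-bit change described in the update might look. This is a sketch under assumptions, not the code that was actually tested: the name count_below is hypothetical, the tally is returned as long long instead of being written through an int reference, and the loop-local temporaries replace the private(...) clause.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical 64-bit variant of count(); returns the tally directly.
long long count_below(const float *arr, const long long N, const float threshold)
{
    assert(arr != nullptr);
    long long below = 0;  // up to (N-2)^2 hits, which can exceed int range for large N

    #pragma omp parallel for reduction(+:below)
    for (long long i = 1; i < N - 1; i++)      // inner N-2 rows
    {
        const float *row = arr + i * N;        // i * N evaluated in long long
        for (long long j = 1; j < N - 1; j++)  // inner N-2 columns
        {
            below += (row[j] < threshold);     // bool converts to 0 or 1
        }
    }
    return below;
}
```

If N is still an int where the matrix is allocated, the expression N*N in new float [N*N] is also evaluated in int; writing it as new float[static_cast<std::size_t>(N) * N] keeps the size computation in a wider type. Whether this resolves the reported crash is not something the post settles.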

