OpenMP: do I have false sharing or a race condition?
I'm learning OpenMP and working on compressed sparse row (CSR) matrix multiplication (element type std::complex<int>). I get a different execution time each time I run the following function:
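#include <complex>
#include <stdexcept>
#include <vector>

typedef std::vector<std::vector<std::complex<int>>> matrix;

struct CSR {
    std::vector<std::complex<int>> values;  // non-zero values
    std::vector<int> row_ptr;               // offset of each row's first entry (rows + 1 entries)
    std::vector<int> cols_index;            // column index of each non-zero value
    int rows;                               // number of rows
    int cols;                               // number of columns
    int NNZ;                                // number of non-zero elements
};

// sparse_transpose() is defined elsewhere and is not shown in this post.
const matrix multiply_omp(const CSR& A, const CSR& B) {
    if (A.cols != B.rows)
        throw std::invalid_argument("multiply_omp: A.cols must equal B.rows");

    CSR B_t = sparse_transpose(B);  // rows of B_t are the columns of B

    // The result is A.rows x B.cols. The inner loop indexes result[i][k]
    // with k < B_t.rows (== B.cols), so sizing each row with B.rows would
    // overrun it whenever B is not square.
    matrix result(A.rows, std::vector<std::complex<int>>(B.cols, 0));

    #pragma omp parallel
    {
        // Each thread handles whole rows i, so it only writes to result[i].
        #pragma omp for
        for (int i = 0; i < A.rows; i++) {
            for (int j = A.row_ptr[i]; j < A.row_ptr[i + 1]; j++) {
                int Ai = A.cols_index[j];
                std::complex<int> Avalue = A.values[j];
                for (int k = 0; k < B_t.rows; k++) {
                    std::complex<int> sum(0, 0);
                    // Look for the entry of column k of B that lies in row Ai.
                    for (int l = B_t.row_ptr[k]; l < B_t.row_ptr[k + 1]; l++)
                        if (Ai == B_t.cols_index[l]) {
                            sum += Avalue * B_t.values[l];
                            break;
                        }
                    if (sum != std::complex<int>(0, 0)) {
                        result[i][k] += sum;
                    }
                }
            }
        }
    }
    return result;
}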
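To make the layout of the CSR struct concrete, here is a minimal sketch that builds one from a dense matrix. dense_to_csr is a hypothetical helper that does not appear in the post, and it assumes every row of the dense input has the same length.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

typedef std::vector<std::vector<std::complex<int>>> matrix;

struct CSR {
    std::vector<std::complex<int>> values;  // non-zero values
    std::vector<int> row_ptr;               // rows + 1 offsets into values
    std::vector<int> cols_index;            // column index of each value
    int rows;
    int cols;
    int NNZ;
};

// Hypothetical helper (not from the post): convert a dense matrix to CSR.
// Assumes the input is rectangular.
CSR dense_to_csr(const matrix& dense) {
    CSR out;
    out.rows = static_cast<int>(dense.size());
    out.cols = dense.empty() ? 0 : static_cast<int>(dense[0].size());
    out.row_ptr.push_back(0);  // row 0 starts at offset 0
    for (const auto& row : dense) {
        assert(row.size() == static_cast<std::size_t>(out.cols));
        for (int c = 0; c < out.cols; ++c) {
            const std::complex<int>& v = row[c];
            if (v.real() != 0 || v.imag() != 0) {
                out.values.push_back(v);
                out.cols_index.push_back(c);
            }
        }
        // One past the last entry of this row, which is the start of the next.
        out.row_ptr.push_back(static_cast<int>(out.values.size()));
    }
    out.NNZ = static_cast<int>(out.values.size());
    return out;
}
```

For a 3x3 matrix with non-zeros only at (0,0), (0,2) and (2,1), this gives row_ptr = {0, 2, 2, 3} and cols_index = {0, 2, 1}. The empty middle row shows up as two equal consecutive offsets, which is why the loops above run from row_ptr[i] to row_ptr[i + 1].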
I call the function in a for loop for 10 iterations on 1000×1000 matrices and time each call with omp_get_wtime(). Here are the results:
iteration 1 : 0.751642 s
iteration 2 : 0.911264 s
iteration 3 : 1.553695 s
iteration 4 : 0.761839 s
iteration 5 : 0.603688 s
iteration 6 : 0.423919 s
iteration 7 : 0.423114 s
iteration 8 : 0.445878 s
iteration 9 : 0.892305 s
iteration 10 : 0.918682 s
Is that normal, or do I have false sharing or a race condition?
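For reference, the measurement described above could be sketched roughly as follows. The names wall_seconds, time_runs and summarize are hypothetical. The untimed warm-up call and the min/median/max summary are additions that the original post does not mention, and the std::chrono fallback is only there so the sketch also builds without OpenMP. It does not diagnose false sharing or a race. It only makes the run-to-run spread easier to read.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Wall-clock time in seconds: omp_get_wtime() when compiled with OpenMP,
// std::chrono::steady_clock otherwise (fallback is an assumption of this sketch).
inline double wall_seconds() {
#ifdef _OPENMP
    return omp_get_wtime();
#else
    using clock = std::chrono::steady_clock;
    return std::chrono::duration<double>(clock::now().time_since_epoch()).count();
#endif
}

struct TimingSummary {
    double min;
    double median;  // upper middle element when the count is even
    double max;
};

// Hypothetical helper: one untimed warm-up call, then `runs` timed calls.
template <typename Work>
std::vector<double> time_runs(Work work, int runs) {
    assert(runs > 0);
    work();  // warm-up, so thread start-up and cold caches are not timed
    std::vector<double> seconds;
    seconds.reserve(runs);
    for (int i = 0; i < runs; ++i) {
        const double start = wall_seconds();
        work();
        seconds.push_back(wall_seconds() - start);
    }
    return seconds;
}

// Sorts a copy of the timings and reports their spread.
TimingSummary summarize(std::vector<double> seconds) {
    assert(!seconds.empty());
    std::sort(seconds.begin(), seconds.end());
    return {seconds.front(), seconds[seconds.size() / 2], seconds.back()};
}
```

With the function from the question, a call might look like time_runs([&] { multiply_omp(A, B); }, 10), where A and B are the two CSR inputs.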