Efficiently parallelize a heavy task inside a while loop?
I have developed a serial version of a particle simulation code, and now I want to speed up only the heaviest task in the time-stepping loop. Basically, three different tasks (A, B, C) are performed during one time step:
A: 1) update the particles contained in each sub-domain (cell)
   2) then update each particle's neighbors (particles)
B: 1) update the potential contact pairs between particles (close enough)
   2) loop over the surface points (10-20k per particle) of each contact pair: find the contact point
C: Integration: update each particle's position, velocity, etc.
The heaviest task is B.2: normally up to 50~70% of the CPU time.
So my first idea is to parallelize B.2 and let the rest run serially.
...
int N_every_neighbors = 1000;
int N_every_nodes = 100;
while (time())
{
    // update neighbors
    if (curr_steps % N_every_neighbors == 0)
    {
        A.update_cell_sub_rigids();            // light task
        A.update_neighbor_list();              // light task
        B.update_contact_pairs();              // moderate task
        B.update_node_neighbors(check_all);    // heaviest task!
    }
    if (curr_steps % N_every_nodes == 0)
    {
        B.update_node_neighbors(not_check_all); // second heaviest
    }
    // update particle position, contact forces
    C.integration.initial_integrate();         // light task
    C.integration.update_contact_forces();     // moderate task
    C.integration.final_integrate();           // light task
}
...
The problem is that tasks A, B, and C have to be executed sequentially to get the correct result, i.e. they are NOT independent tasks.
A.1 ---> A.2 ===> B.1 ---> B.2 ===> C.1 ---> C.2 ---> C.3
So what I want to do first is make the heavy task B.2, B.update_node_neighbors(), run in parallel, as this function contains nested loops.
As I am quite new to OpenMP, I have only tried a simple approach:
int N_threads = 8;
omp_set_num_threads(N_threads);
#pragma omp parallel
#pragma omp single
while (time())
{
    // do tasks A ---> B ---> C;
}

void B::update_node_neighbors (bool check_all)
{
    int All_contact_pairs = this->contact_pairs.size();
    #pragma omp for
    for (int i=0; i<All_contact_pairs; i++)
    {
        auto& particle_i_contacts = this->contact_pairs[i];
        int N_contacts_i = particle_i_contacts.size();
        // loop over all contacts for particle i
        for (int j=0; j<N_contacts_i; j++)
        {
            auto& pair_ij = particle_i_contacts[j];
            // really heavy computation here
            ...
        }
    }
}
By doing this, I found no significant performance increase. I would like to ask those with experience in parallel computing: is there a better way to make the function B.2 run in parallel at each time step while the remaining tasks run serially?
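One thing worth checking in the attempt above: a worksharing `#pragma omp for` has to be reached by every thread of the team, but inside a `single` region only one thread executes the call to B.2, so the loop cannot be shared out that way. A minimal sketch of an alternative, assuming the per-particle results are independent of each other: keep the time loop serial and let B.2 open its own thread team with a combined `parallel for`. The names `heavy_pair_work`, `n_points`, and the `contact_pairs` layout are hypothetical stand-ins, not the original code.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the "really heavy computation" on one contact pair.
double heavy_pair_work(int pair_id, int n_points) {
    double acc = 0.0;
    for (int p = 0; p < n_points; ++p)
        acc += static_cast<double>((pair_id + p) % 7);
    return acc;
}

// contact_pairs[i] holds the contact partners of particle i (assumed layout).
// Only this loop runs in parallel; the caller (the time loop) stays serial.
std::vector<double> update_node_neighbors(
        const std::vector<std::vector<int>>& contact_pairs, int n_points) {
    const int n = static_cast<int>(contact_pairs.size());
    std::vector<double> result(n, 0.0);

    // dynamic scheduling hands out iterations one at a time, which may help
    // when the number of contacts differs a lot between particles
    #pragma omp parallel for schedule(dynamic)
    for (int i = 0; i < n; ++i) {
        double sum = 0.0;
        for (int pair_id : contact_pairs[i])
            sum += heavy_pair_work(pair_id, n_points);
        result[i] = sum;  // each iteration writes only its own slot: no race
    }
    return result;
}
```

Called from an ordinary serial `while` loop, this forks threads only for the duration of B.2. With GCC or Clang the pragma takes effect only when compiling with `-fopenmp`; without that flag the loop silently runs on one thread.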
Update 1:
I ran a simple test on the heavy task B.2 only:
while (time())
{
    if (condition_0)
    {
        A.1;
        A.2;
        B.1;
        B.2(true);  // heavy task!
    }
    if (condition_1)
    {
        B.2(false); // second heaviest
    }
    C.1;
    C.2;
    C.3;
}
The actual content of B.2 looks like this:
void B::update_node_neighbors(bool check_all)
{
    ...
    int N_threads = 6;
    omp_set_num_threads(N_threads);
    #pragma omp parallel for schedule(static)
    for (int i=0; i<N_contacts; i++)
    {
        ...
        // particle-particle contacts
        for (int j=0; j<N_contacts_pp; j++)
        {
            for(int pt_id ...)
            {
                // check all particle_i's surface points to particle_j
                // do_the_actual_work
            }
        }
        // particle-wall contacts
        for (int k=0; k<N_contacts_pw; k++)
        {
            for(int pt_id ...)
            {
                // check all particle_i's surface points to wall_k
                // do_the_actual_work
            }
        }
    }
}
I tried N_threads = 1, 2, 4, 6, 8, 10, 12; for the same number of time steps, the CPU time is more or less the same. Why is the OpenMP parallelization of the outermost loop in B.2 not working? I could not figure it out :(
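One possible explanation, assuming the reported figure comes from a processor-time counter such as `std::clock()`: on typical Linux toolchains that counter adds up the time spent by all threads of the process, so it can stay flat (or even grow) while the loop itself finishes sooner. A hedged sketch that records both wall-clock time and processor time around a parallel loop; `Timing`, `time_parallel_loop`, and the dummy workload are illustrative assumptions, not part of the original code.

```cpp
#include <cassert>
#include <chrono>
#include <ctime>
#include <vector>

struct Timing {
    double wall_s;    // elapsed wall-clock seconds
    double cpu_s;     // processor seconds reported by std::clock()
    double checksum;  // sum of the results, so the work is not optimized away
};

Timing time_parallel_loop(int n_items, int n_inner) {
    std::vector<double> out(n_items, 0.0);

    const auto wall0 = std::chrono::steady_clock::now();
    const std::clock_t cpu0 = std::clock();

    #pragma omp parallel for schedule(dynamic)
    for (int i = 0; i < n_items; ++i) {
        double acc = 0.0;
        for (int p = 0; p < n_inner; ++p)
            acc += static_cast<double>((i + p) % 3);  // dummy workload
        out[i] = acc;
    }

    const std::clock_t cpu1 = std::clock();
    const auto wall1 = std::chrono::steady_clock::now();

    double checksum = 0.0;
    for (double v : out) checksum += v;

    return { std::chrono::duration<double>(wall1 - wall0).count(),
             static_cast<double>(cpu1 - cpu0) / CLOCKS_PER_SEC,
             checksum };
}
```

If `wall_s` drops as threads are added while `cpu_s` stays roughly constant, the loop is scaling and only the measurement was misleading. If both stay flat, it is worth confirming that the build actually enables OpenMP (for example `-fopenmp` with GCC or Clang) and that the work per iteration is not dominated by a few particles.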