How can I parallelize this code on HPC?
s=1
r=m=n=o=p=q=u=t=19
myfile = fopen ("sequence2.txt", "w", "ieee-le");
for a=0:1
  if(a==1)
    r=5
  endif
  for b=0:r
    if(a==1 && b==5)
      m=11
    endif
    for c=0:m
      n=o=19
      for d=0:1
        if(d==1)
          n=5
        endif
        for e=0:n
          if(d==1 && e==5)
            o=11
          endif
          for f=0:o
            p=q=19
            for g=0:1
              if(g==1)
                p=5
              endif
              for h=0:p
                if(g==1 && h==5)
                  q=11
                endif
                for i=0:q
                  t=u=19
                  for j=0:1
                    if(j==1)
                      t=5
                    endif
                    for k=0:t
                      if(j==1 && k==5)
                        u=11
                      endif
                      for l=0:u
                        s=s+1
                        fputs(myfile,num2str(a));
                        fputs(myfile,".");
                        fputs(myfile,num2str(b));
                        fputs(myfile,".");
                        fputs(myfile,num2str(c));
                        fputs(myfile,":");
                        fflush(stdout);
                        fputs(myfile,num2str(d));
                        fputs(myfile,".");
                        fputs(myfile,num2str(e));
                        fputs(myfile,".");
                        fputs(myfile,num2str(f));
                        fputs(myfile,":");
                        fflush(stdout);
                        fputs(myfile,num2str(g));
                        fputs(myfile,".");
                        fputs(myfile,num2str(h));
                        fputs(myfile,".");
                        fputs(myfile,num2str(i));
                        fputs(myfile,":");
                        fflush(stdout);
                        fputs(myfile,num2str(j));
                        fputs(myfile,".");
                        fputs(myfile,num2str(k));
                        fputs(myfile,".");
                        fputs(myfile,num2str(l));
                        fputs(myfile,"\n");
                        fflush(stdout);
                      end
                    end
                  end
                end
              end
            end
          end
        end
      end
    end
  end
end
The Octave code above generates a number sequence and writes it to a text file. Because it produces around 2^36 entries, it will take days to finish. Can anyone tell us how to parallelise this code on HPC?
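As a quick check on that size estimate, the loop bounds can be counted directly. This is a reading of the code rather than a measurement: each dot-separated triple x.y.z takes 20·20 values when x = 0, and 5·20 + 12 values when x = 1 (the last field stops at 11 only after the prefix 1.5). Four such triples are nested, so the count appears to be exactly 2^36 lines:

```latex
\underbrace{20 \cdot 20}_{x = 0} + \underbrace{5 \cdot 20 + 12}_{x = 1} = 512 = 2^{9},
\qquad
512^{4} = 2^{36} = 68{,}719{,}476{,}736 \ \text{lines}
```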
Comments (1)
You may not need to parallelize this at all; you can speed it up by about 10,000x by moving to a compiled language. (Seriously; see below.) Octave, or even MATLAB, is going to be slow as molasses running this. They're great for big matrix operations, but tons of nested loops with if statements in them are going to run slow, slow, slow. Normally I'd suggest moving Octave/MATLAB code to Fortran, but since you've already written the file I/O essentially as C statements anyway, the C equivalent of this code almost writes itself.
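The C listing that accompanied this answer is not preserved here, so what follows is only a minimal sketch of what such a port might look like, not the answerer's actual code. The function names (`write_sequence`, `second_max`, `third_max`), the `limit` parameter, and the `SEQUENCE_MAIN` build switch are all assumptions made for illustration. The loop bounds are computed from the enclosing indices instead of being carried in the mutable variables r, m, n, ... of the Octave version, which should visit the same values in the same order.

```c
#include <stdio.h>

/* Upper bound of the second field in a triple: 5 when the first field is 1, else 19. */
static int second_max(int x) { return x == 1 ? 5 : 19; }

/* Upper bound of the third field: 11 after the prefix "1.5", else 19. */
static int third_max(int x, int y) { return (x == 1 && y == 5) ? 11 : 19; }

/* Writes lines "a.b.c:d.e.f:g.h.i:j.k.l" to `out` in the same order as the
 * Octave loops. Stops after `limit` lines; limit == 0 means no limit
 * (all 2^36 lines). Returns the number of lines written. */
unsigned long long write_sequence(FILE *out, unsigned long long limit)
{
  unsigned long long s = 0;
  for (int a = 0; a <= 1; a++)
  for (int b = 0; b <= second_max(a); b++)
  for (int c = 0; c <= third_max(a, b); c++)
  for (int d = 0; d <= 1; d++)
  for (int e = 0; e <= second_max(d); e++)
  for (int f = 0; f <= third_max(d, e); f++)
  for (int g = 0; g <= 1; g++)
  for (int h = 0; h <= second_max(g); h++)
  for (int i = 0; i <= third_max(g, h); i++)
  for (int j = 0; j <= 1; j++)
  for (int k = 0; k <= second_max(j); k++)
  for (int l = 0; l <= third_max(j, k); l++) {
    fprintf(out, "%d.%d.%d:%d.%d.%d:%d.%d.%d:%d.%d.%d\n",
            a, b, c, d, e, f, g, h, i, j, k, l);
    if (++s == limit) return s;   /* never true when limit == 0 */
  }
  return s;
}

#ifdef SEQUENCE_MAIN   /* hypothetical switch: build with -DSEQUENCE_MAIN */
int main(void)
{
  FILE *out = fopen("sequence2.txt", "w");
  if (!out) { perror("sequence2.txt"); return 1; }
  write_sequence(out, 0);
  return fclose(out) == 0 ? 0 : 1;
}
#endif
```

Building with something like `gcc -O3 -DSEQUENCE_MAIN` would give a standalone program that writes the whole file. Passing a small `limit` instead makes it easy to compare the first few lines against the Octave output before committing to a full run.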
Running your Octave code above and this C code (compiled with -O3) for one minute each, the Octave code got through about 2,163 items in the sequence, and the compiled C code got through 23,299,068. So that's good.
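To put those one-minute figures in perspective, they can be extrapolated to the full 2^36 lines. This assumes the rate stays constant for the whole run, which disk throughput may well not allow, so treat the results as rough orders of magnitude only:

```latex
\frac{23{,}299{,}068}{2{,}163} \approx 10{,}772 \quad \text{(speed-up, C over Octave)}
\\[4pt]
\frac{2^{36}}{23{,}299{,}068\ \text{lines/min}} \approx 2{,}949\ \text{min} \approx 49\ \text{hours}
\qquad
\frac{2^{36}}{2{,}163\ \text{lines/min}} \approx 3.18 \times 10^{7}\ \text{min} \approx 60\ \text{years}
```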
In terms of parallelization, breaking this up into independent pieces is easy, though they won't be perfectly load-balanced. If you start (say) 26 processes and give them (a=0,b=0), (a=0,b=1), ..., (a=0,b=19), (a=1,b=0), (a=1,b=1), ..., (a=1,b=5), they can all run independently, and you can concatenate the results when they're all done. The only downside is that the pieces are not all the same size: going by the loop bounds, the (a=1,b=5) job covers only 12 values of c where every other job covers 20, so it will finish early. That's probably good enough to start.
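The 26-way split can be sketched as a mapping from a job index to its (a, b) prefix, plus the amount of work each job carries. Everything here is an illustrative assumption: the names `job_prefix`, `job_lines` and `job_filename`, the file-name pattern, and the `JOBS_MAIN` switch. The line counts come from reading the loop bounds, not from timing anything.

```c
#include <stdio.h>

#define NUM_JOBS 26  /* 20 prefixes with a=0 plus 6 prefixes with a=1 */

/* Maps job index 0..25 to its (a, b) prefix, in sequence order.
 * Returns 0 on success, -1 if the index is out of range. */
int job_prefix(int job, int *a, int *b)
{
  if (job < 0 || job >= NUM_JOBS) return -1;
  *a = job < 20 ? 0 : 1;
  *b = job < 20 ? job : job - 20;
  return 0;
}

/* Lines the job with prefix (a, b) must write: one per value of c,
 * times 512 values for each of the three remaining triples. */
unsigned long long job_lines(int a, int b)
{
  unsigned long long c_values = (a == 1 && b == 5) ? 12 : 20;
  return c_values * 512ULL * 512ULL * 512ULL;
}

/* Hypothetical naming scheme: the zero-padded index keeps the files in
 * sequence order when they are listed with a shell glob. */
int job_filename(int job, char *buf, size_t size)
{
  return snprintf(buf, size, "sequence2_part_%02d.txt", job);
}

#ifdef JOBS_MAIN   /* hypothetical switch: build with -DJOBS_MAIN to print the plan */
int main(void)
{
  for (int job = 0; job < NUM_JOBS; job++) {
    int a, b;
    char name[64];
    job_prefix(job, &a, &b);
    job_filename(job, name, sizeof name);
    printf("job %2d  a=%d b=%2d  %llu lines -> %s\n",
           job, a, b, job_lines(a, b), name);
  }
  return 0;
}
#endif
```

Each process would then run the loops with a and b fixed to its prefix and write to its own file. Because a and b are the outermost loops, every piece is a contiguous stretch of the full sequence, so something like `cat sequence2_part_*.txt > sequence2.txt` should stitch the pieces back together in the original order.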