使用皮卡德迭代中的矩阵列表优化计算
目前我正在使用一些 Mathematica 代码来进行皮卡德迭代。代码本身工作得很好,但我正在努力提高它的效率。我已经取得了一些成功,但正在寻求建议。也许无法再加快速度,但我已经没有想法了,希望比我更有编程/Mathematica 经验的人能够提出一些建议。我仅发布迭代本身,但可以根据需要提供其他信息。
下面的代码已根据要求编辑为完全可执行
此外,我将其从 While 循环更改为 Do 循环,以使测试更容易,因为不需要收敛。
Clear["Global`*"]
ngrid = 2048;
delr = 4/100;
delk = \[Pi]/delr/ngrid;
rvalues = Table[(i - 1/2) delr, {i, 1, ngrid}];
kvalues = Table[(i - 1/2) delk, {i, 1, ngrid}];
wa[x_] := (19 + .5 x) Exp[-.7 x] + 1
wb[x_] := (19 + .1 x) Exp[-.2 x] + 1
wd = SetPrecision[
Table[{{wa[(i - 1/2) delk], 0}, {0, wb[(i - 1/2) delk]}}, {i, 1,
ngrid}], 26];
sigmaAA = 1;
hcloseAA = {};
i = 1;
While[(i - 1/2)*delr < sigmaAA, hcloseAA = Append[hcloseAA, -1]; i++]
hcloselenAA = Length[hcloseAA];
hcloseAB = hcloseAA;
hcloselenAB = hcloselenAA;
hcloseBB = hcloseAA;
hcloselenBB = hcloselenAA;
ccloseAA = {};
i = ngrid;
While[(i - 1/2)*delr >= sigmaAA, ccloseAA = Append[ccloseAA, 0]; i--]
ccloselenAA = Length[ccloseAA];
ccloselenAA = Length[ccloseAA];
ccloseAB = ccloseAA;
ccloselenAB = ccloselenAA;
ccloseBB = ccloseAA;
ccloselenBB = ccloselenAA;
na = 20;
nb = 20;
pa = 27/(1000 \[Pi]);
pb = 27/(1000 \[Pi]);
p = {{na pa, 0}, {0, nb pb}};
id = {{1, 0}, {0, 1}};
AFD = 1;
AFDList = {};
timelist = {};
gammainitial = Table[{{0, 0}, {0, 0}}, {ngrid}];
gammafirst = gammainitial;
step = 1;
tol = 10^-7;
old = 95/100;
new = 1 - old;
Do[
t = AbsoluteTime[];
extractgAA = Table[Extract[gammafirst, {i, 1, 1}], {i, hcloselenAA}];
extractgBB = Table[Extract[gammafirst, {i, 2, 2}], {i, hcloselenBB}];
extractgAB = Table[Extract[gammafirst, {i, 1, 2}], {i, hcloselenAB}];
csolutionAA = (Join[hcloseAA - extractgAA, ccloseAA]) rvalues;
csolutionBB = (Join[hcloseBB - extractgBB, ccloseBB]) rvalues;
csolutionAB = (Join[hcloseAB - extractgAB, ccloseAB]) rvalues;
chatAA = FourierDST[SetPrecision[csolutionAA, 32], 4];
chatBB = FourierDST[SetPrecision[csolutionBB, 32], 4];
chatAB = FourierDST[SetPrecision[csolutionAB, 32], 4];
chatmatrix =
2 \[Pi] delr Sqrt[2*ngrid]*
Transpose[{Transpose[{chatAA, chatAB}],
Transpose[{chatAB, chatBB}]}]/kvalues;
gammahat =
Table[(wd[[i]].chatmatrix[[i]].(Inverse[
id - p.wd[[i]].chatmatrix[[i]]]).wd[[i]] -
chatmatrix[[i]]) kvalues[[i]], {i, ngrid}];
gammaAA =
FourierDST[SetPrecision[Table[gammahat[[i, 1, 1]], {i, ngrid}], 32],
4];
gammaBB =
FourierDST[SetPrecision[Table[gammahat[[i, 2, 2]], {i, ngrid}], 32],
4];
gammaAB =
FourierDST[SetPrecision[Table[gammahat[[i, 1, 2]], {i, ngrid}], 32],
4];
gammasecond =
Transpose[{Transpose[{gammaAA, gammaAB}],
Transpose[{gammaAB, gammaBB}]}]/(rvalues 2 \[Pi] delr Sqrt[
2*ngrid]);
AFD = Sqrt[
1/ngrid Sum[((gammafirst[[i, 1, 1]] -
gammasecond[[i, 1, 1]])/(gammafirst[[i, 1, 1]] +
gammasecond[[i, 1, 1]]))^2 + ((gammafirst[[i, 2, 2]] -
gammasecond[[i, 2, 2]])/(gammafirst[[i, 2, 2]] +
gammasecond[[i, 2, 2]]))^2 + ((gammafirst[[i, 1, 2]] -
gammasecond[[i, 1, 2]])/(gammafirst[[i, 1, 2]] +
gammasecond[[i, 1, 2]]))^2 + ((gammafirst[[i, 2, 1]] -
gammasecond[[i, 2, 1]])/(gammafirst[[i, 2, 1]] +
gammasecond[[i, 2, 1]]))^2, {i, 1, ngrid}]];
gammafirst = old gammafirst + new gammasecond;
time2 = AbsoluteTime[] - t;
timelist = Append[timelist, time2], {1}]
Print["Mean time per calculation = ", Mean[timelist]]
Print["STD time per calculation = ", StandardDeviation[timelist]]
只是一些关于事物的注释
ngrid、delr、delk、rvalues、kvalues 只是用于使问题离散的值。通常它们是
ngrid = 2048;
delr = 4/100;
delk = \[Pi]/delr/ngrid;
rvalues = Table[(i - 1/2) delr, {i, 1, ngrid}];
kvalues = Table[(i - 1/2) delk, {i, 1, ngrid}];
所有使用的矩阵都是 2 x 2 具有相同的非对角线
单位矩阵和 P 矩阵(实际上是密度)是
p = {{na pa, 0}, {0, nb pb}};
id = {{1, 0}, {0, 1}};
我发现的计算中的主要慢点是 FourierDST code> 计算(前向变换和反向变换占计算时间接近40%) gammahat 计算占40%,其余时间以AFD 计算为主。) 在我的 i7 处理器上,每个周期的平均计算时间为 1.52 秒。我希望能在一秒钟内完成,但这可能是不可能的。 我的希望是引入一些并行计算,这是使用 ParallelTable
命令以及使用 ParallelSubmit
WaitAll
进行尝试的。然而,我发现并行计算的任何加速都被从主内核到其他内核的通信时间所抵消。(至少这是我的假设,因为对新数据的计算所需的时间是重新计算现有数据的两倍。我认为这意味着传播新列表的速度变慢)我尝试了 DistributDefinitions 以及 SetSharedVariable,但是无法让它执行任何操作。
我想知道的一件事是使用 Table
进行离散计算是否是最好的方法?
我还以为我可以以能够编译它的方式重写它,但我的理解是,只有在处理机器精度时才有效,而我需要以更高的精度工作以获得收敛。
预先感谢您的任何建议。
Currently I am working with some Mathematica code to do a Picard Iteration. The code itself works fine but I am trying to make it more efficient. I have had some success but am looking for suggestions. It may not be possible to speed it up anymore but I have run out of ideas and am hoping people with more experience with programming/Mathematica than me might be able to make some suggestions. I am only posting the Iteration itself but can supply additional information as needed.
The code below was edited to be a fully executable as requested
Also I changed it from a While to a Do loop to make testing easier as convergence is not required.
Clear["Global`*"]
ngrid = 2048;
delr = 4/100;
delk = \[Pi]/delr/ngrid;
rvalues = Table[(i - 1/2) delr, {i, 1, ngrid}];
kvalues = Table[(i - 1/2) delk, {i, 1, ngrid}];
wa[x_] := (19 + .5 x) Exp[-.7 x] + 1
wb[x_] := (19 + .1 x) Exp[-.2 x] + 1
wd = SetPrecision[
Table[{{wa[(i - 1/2) delk], 0}, {0, wb[(i - 1/2) delk]}}, {i, 1,
ngrid}], 26];
sigmaAA = 1;
hcloseAA = {};
i = 1;
While[(i - 1/2)*delr < sigmaAA, hcloseAA = Append[hcloseAA, -1]; i++]
hcloselenAA = Length[hcloseAA];
hcloseAB = hcloseAA;
hcloselenAB = hcloselenAA;
hcloseBB = hcloseAA;
hcloselenBB = hcloselenAA;
ccloseAA = {};
i = ngrid;
While[(i - 1/2)*delr >= sigmaAA, ccloseAA = Append[ccloseAA, 0]; i--]
ccloselenAA = Length[ccloseAA];
ccloselenAA = Length[ccloseAA];
ccloseAB = ccloseAA;
ccloselenAB = ccloselenAA;
ccloseBB = ccloseAA;
ccloselenBB = ccloselenAA;
na = 20;
nb = 20;
pa = 27/(1000 \[Pi]);
pb = 27/(1000 \[Pi]);
p = {{na pa, 0}, {0, nb pb}};
id = {{1, 0}, {0, 1}};
AFD = 1;
AFDList = {};
timelist = {};
gammainitial = Table[{{0, 0}, {0, 0}}, {ngrid}];
gammafirst = gammainitial;
step = 1;
tol = 10^-7;
old = 95/100;
new = 1 - old;
Do[
t = AbsoluteTime[];
extractgAA = Table[Extract[gammafirst, {i, 1, 1}], {i, hcloselenAA}];
extractgBB = Table[Extract[gammafirst, {i, 2, 2}], {i, hcloselenBB}];
extractgAB = Table[Extract[gammafirst, {i, 1, 2}], {i, hcloselenAB}];
csolutionAA = (Join[hcloseAA - extractgAA, ccloseAA]) rvalues;
csolutionBB = (Join[hcloseBB - extractgBB, ccloseBB]) rvalues;
csolutionAB = (Join[hcloseAB - extractgAB, ccloseAB]) rvalues;
chatAA = FourierDST[SetPrecision[csolutionAA, 32], 4];
chatBB = FourierDST[SetPrecision[csolutionBB, 32], 4];
chatAB = FourierDST[SetPrecision[csolutionAB, 32], 4];
chatmatrix =
2 \[Pi] delr Sqrt[2*ngrid]*
Transpose[{Transpose[{chatAA, chatAB}],
Transpose[{chatAB, chatBB}]}]/kvalues;
gammahat =
Table[(wd[[i]].chatmatrix[[i]].(Inverse[
id - p.wd[[i]].chatmatrix[[i]]]).wd[[i]] -
chatmatrix[[i]]) kvalues[[i]], {i, ngrid}];
gammaAA =
FourierDST[SetPrecision[Table[gammahat[[i, 1, 1]], {i, ngrid}], 32],
4];
gammaBB =
FourierDST[SetPrecision[Table[gammahat[[i, 2, 2]], {i, ngrid}], 32],
4];
gammaAB =
FourierDST[SetPrecision[Table[gammahat[[i, 1, 2]], {i, ngrid}], 32],
4];
gammasecond =
Transpose[{Transpose[{gammaAA, gammaAB}],
Transpose[{gammaAB, gammaBB}]}]/(rvalues 2 \[Pi] delr Sqrt[
2*ngrid]);
AFD = Sqrt[
1/ngrid Sum[((gammafirst[[i, 1, 1]] -
gammasecond[[i, 1, 1]])/(gammafirst[[i, 1, 1]] +
gammasecond[[i, 1, 1]]))^2 + ((gammafirst[[i, 2, 2]] -
gammasecond[[i, 2, 2]])/(gammafirst[[i, 2, 2]] +
gammasecond[[i, 2, 2]]))^2 + ((gammafirst[[i, 1, 2]] -
gammasecond[[i, 1, 2]])/(gammafirst[[i, 1, 2]] +
gammasecond[[i, 1, 2]]))^2 + ((gammafirst[[i, 2, 1]] -
gammasecond[[i, 2, 1]])/(gammafirst[[i, 2, 1]] +
gammasecond[[i, 2, 1]]))^2, {i, 1, ngrid}]];
gammafirst = old gammafirst + new gammasecond;
time2 = AbsoluteTime[] - t;
timelist = Append[timelist, time2], {1}]
Print["Mean time per calculation = ", Mean[timelist]]
Print["STD time per calculation = ", StandardDeviation[timelist]]
Just some notes on things
ngrid,delr, delk, rvalues, kvalues are just the values used in making the problem discrete. Typically they are
ngrid = 2048;
delr = 4/100;
delk = \[Pi]/delr/ngrid;
rvalues = Table[(i - 1/2) delr, {i, 1, ngrid}];
kvalues = Table[(i - 1/2) delk, {i, 1, ngrid}];
All matrices being used are 2 x 2 with identical off-diagonals
The identity matrix and the P matrix(it is actually for the density) are
p = {{na pa, 0}, {0, nb pb}};
id = {{1, 0}, {0, 1}};
The major slow spots in the calculation I have identified are the FourierDST
calculations (the forward and back transforms account for close to 40% of the calculation time) The gammahat calculation accounts for 40% of the time with the remaining time dominated by the AFD calculation.)
On my i7 Processor the average calculation time per cycle is 1.52 seconds. My hope is to get it under a second but that may not be possible.
My hope had been to introduce some parallel computation this was tried with both ParallelTable
commands as well as using the ParallelSubmit
WaitAll
. However, I found that any speedup from the parallel calculation was offset by the communication time from the Master Kernel to the the other Kernels.(at least that is my assumption as calculations on new data takes twice as long as just recalculating the existing data. I assumed this meant that the slowdown was in disseminating the new lists) I played around with DistributDefinitions
as well as SetSharedVariable
, however, was unable to get that to do anything.
One thing I am wondering is if using Table
for doing my discrete calculations is the best way to do this?
I had also thought I could possibly rewrite this in such a manner as to be able to compile it but my understanding is that only will work if you are dealing with machine precision where I am needing to working with higher precision to get convergence.
Thank you in advance for any suggestions.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
我将等待 acl 建议的代码,但首先,我怀疑此构造:
可以编写,并且执行速度更快,如:
但我被迫猜测您的数据的形状。
I will wait for the code acl suggests, but off the top, I suspect that this construct:
may be written, and will execute faster, as:
But I am forced to guess the shape of your data.
在几行中使用:
您可以删除
Table
:并且,如果您真的非常需要这个
SetPrecision
,您不能在gammahat的计算中立即执行它?AFAI 可以看到,gammahat 计算中使用的所有数字都是准确的。这可能是故意的,但速度很慢。您可以考虑使用近似数字。
编辑
使用最新编辑中的完整代码,只需在第二行和第三行中添加
//N
即可将时间至少缩短一半,而不会过多降低数值精度。如果我比较 res={gammafirst, gammasecond, AFD} 中的所有数字,则原始数据与添加了 //N 的数据之间的差异为res1 - res2 // Flatten // Total
==> 1.88267*10^-13删除所有
SetPrecision
内容可使代码速度提高 7 倍,并且结果似乎具有相似的准确性。In the several lines using:
you could remove the
Table
:And, if you really, really need this
SetPrecision
, couldn't you do it at once in the calculation of gammahat?AFAI can see, all numbers used in the calculations of gammahat are exact. This may be on purpose but it is slow. You might consider using approximate numbers instead.
EDIT
With the complete code in your latest edit just adding an
//N
to your 2nd and 3rd line cuts timing at least in half without reducing numerical accuracy much. If I compare all the numbers in res={gammafirst, gammasecond, AFD} the difference between the original and with //N added isres1 - res2 // Flatten // Total
==> 1.88267*10^-13Removing all the
SetPrecision
stuff speeds up the code by a factor of 7 and the results seem to be of similar accuracy.