使用皮卡德迭代中的矩阵列表优化计算

发布于 2024-11-29 05:10:35 字数 4455 浏览 4 评论 0原文

目前我正在使用一些 Mathematica 代码来进行皮卡德迭代。代码本身工作得很好，但我正在努力提高它的效率。我已经取得了一些成功，但正在寻求建议。也许无法再加快速度，但我已经没有想法了，希望比我更有编程/Mathematica 经验的人能够提出一些建议。我仅发布迭代本身，但可以根据需要提供其他信息。

下面的代码已根据要求编辑为完全可执行

此外，我将其从 While 循环更改为 Do 循环，以使测试更容易，因为不需要收敛。

Clear["Global`*"]

ngrid = 2048;
delr = 4/100;
delk = \[Pi]/delr/ngrid;
rvalues = Table[(i - 1/2) delr, {i, 1, ngrid}];
kvalues = Table[(i - 1/2) delk, {i, 1, ngrid}];
wa[x_] := (19 + .5 x) Exp[-.7 x] + 1
wb[x_] := (19 + .1 x) Exp[-.2 x] + 1
wd = SetPrecision[
   Table[{{wa[(i - 1/2) delk], 0}, {0, wb[(i - 1/2) delk]}}, {i, 1, 
     ngrid}], 26];
sigmaAA = 1;
hcloseAA = {};
i = 1;
While[(i - 1/2)*delr < sigmaAA, hcloseAA = Append[hcloseAA, -1]; i++]
hcloselenAA = Length[hcloseAA];
hcloseAB = hcloseAA;
hcloselenAB = hcloselenAA;
hcloseBB = hcloseAA;
hcloselenBB = hcloselenAA;
ccloseAA = {};
i = ngrid;
While[(i - 1/2)*delr >= sigmaAA, ccloseAA = Append[ccloseAA, 0]; i--]
ccloselenAA = Length[ccloseAA];
ccloselenAA = Length[ccloseAA];
ccloseAB = ccloseAA;
ccloselenAB = ccloselenAA;
ccloseBB = ccloseAA;
ccloselenBB = ccloselenAA;
na = 20;
nb = 20;
pa = 27/(1000 \[Pi]);
pb = 27/(1000 \[Pi]);
p = {{na pa, 0}, {0, nb pb}};
id = {{1, 0}, {0, 1}};
AFD = 1;
AFDList = {};
timelist = {};
gammainitial = Table[{{0, 0}, {0, 0}}, {ngrid}];
gammafirst = gammainitial;
step = 1;
tol = 10^-7;
old = 95/100;
new = 1 - old;

Do[
 t = AbsoluteTime[];
 extractgAA = Table[Extract[gammafirst, {i, 1, 1}], {i, hcloselenAA}];
 extractgBB = Table[Extract[gammafirst, {i, 2, 2}], {i, hcloselenBB}];
 extractgAB = Table[Extract[gammafirst, {i, 1, 2}], {i, hcloselenAB}];
 csolutionAA = (Join[hcloseAA - extractgAA, ccloseAA]) rvalues;
 csolutionBB = (Join[hcloseBB - extractgBB, ccloseBB]) rvalues;
 csolutionAB = (Join[hcloseAB - extractgAB, ccloseAB]) rvalues;
 chatAA = FourierDST[SetPrecision[csolutionAA, 32], 4];
 chatBB = FourierDST[SetPrecision[csolutionBB, 32], 4];
 chatAB = FourierDST[SetPrecision[csolutionAB, 32], 4];
 chatmatrix = 
  2 \[Pi] delr Sqrt[2*ngrid]*
   Transpose[{Transpose[{chatAA, chatAB}], 
      Transpose[{chatAB, chatBB}]}]/kvalues;
 gammahat = 
  Table[(wd[[i]].chatmatrix[[i]].(Inverse[
         id - p.wd[[i]].chatmatrix[[i]]]).wd[[i]] - 
      chatmatrix[[i]]) kvalues[[i]], {i, ngrid}];
 gammaAA = 
  FourierDST[SetPrecision[Table[gammahat[[i, 1, 1]], {i, ngrid}], 32],
    4];
 gammaBB = 
  FourierDST[SetPrecision[Table[gammahat[[i, 2, 2]], {i, ngrid}], 32],
    4];
 gammaAB = 
  FourierDST[SetPrecision[Table[gammahat[[i, 1, 2]], {i, ngrid}], 32],
    4];
 gammasecond = 
  Transpose[{Transpose[{gammaAA, gammaAB}], 
     Transpose[{gammaAB, gammaBB}]}]/(rvalues 2 \[Pi] delr Sqrt[
      2*ngrid]);
 AFD = Sqrt[
    1/ngrid Sum[((gammafirst[[i, 1, 1]] - 
           gammasecond[[i, 1, 1]])/(gammafirst[[i, 1, 1]] + 
           gammasecond[[i, 1, 1]]))^2 + ((gammafirst[[i, 2, 2]] - 
           gammasecond[[i, 2, 2]])/(gammafirst[[i, 2, 2]] + 
           gammasecond[[i, 2, 2]]))^2 + ((gammafirst[[i, 1, 2]] - 
           gammasecond[[i, 1, 2]])/(gammafirst[[i, 1, 2]] + 
           gammasecond[[i, 1, 2]]))^2 + ((gammafirst[[i, 2, 1]] - 
           gammasecond[[i, 2, 1]])/(gammafirst[[i, 2, 1]] + 
           gammasecond[[i, 2, 1]]))^2, {i, 1, ngrid}]];
 gammafirst = old gammafirst + new gammasecond;
 time2 = AbsoluteTime[] - t;
 timelist = Append[timelist, time2], {1}]
Print["Mean time per calculation = ", Mean[timelist]]
Print["STD time per calculation = ", StandardDeviation[timelist]]

只是一些关于事物的注释
ngrid、delr、delk、rvalues、kvalues 只是用于使问题离散的值。通常它们是

ngrid = 2048;
delr = 4/100;
delk = \[Pi]/delr/ngrid;
rvalues = Table[(i - 1/2) delr, {i, 1, ngrid}];
kvalues = Table[(i - 1/2) delk, {i, 1, ngrid}];

所有使用的矩阵都是 2 x 2 具有相同的非对角线

单位矩阵和 P 矩阵（实际上是密度）是

p = {{na pa, 0}, {0, nb pb}};
id = {{1, 0}, {0, 1}};

我发现的计算中的主要慢点是 FourierDST code> 计算（前向变换和反向变换占计算时间接近40%） gammahat 计算占40%，其余时间以AFD 计算为主。）在我的 i7 处理器上，每个周期的平均计算时间为 1.52 秒。我希望能在一秒钟内完成，但这可能是不可能的。我的希望是引入一些并行计算，这是使用 ParallelTable 命令以及使用 ParallelSubmit WaitAll 进行尝试的。然而，我发现并行计算的任何加速都被从主内核到其他内核的通信时间所抵消。（至少这是我的假设，因为对新数据的计算所需的时间是重新计算现有数据的两倍。我认为这意味着传播新列表的速度变慢）我尝试了 DistributDefinitions 以及 SetSharedVariable，但是无法让它执行任何操作。

我想知道的一件事是使用 Table 进行离散计算是否是最好的方法？

我还以为我可以以能够编译它的方式重写它，但我的理解是，只有在处理机器精度时才有效，而我需要以更高的精度工作以获得收敛。

预先感谢您的任何建议。

原文

Currently I am working with some Mathematica code to do a Picard Iteration. The code itself works fine but I am trying to make it more efficient. I have had some success but am looking for suggestions. It may not be possible to speed it up anymore but I have run out of ideas and am hoping people with more experience with programming/Mathematica than me might be able to make some suggestions. I am only posting the Iteration itself but can supply additional information as needed.

The code below was edited to be a fully executable as requested

Also I changed it from a While to a Do loop to make testing easier as convergence is not required.

Clear["Global`*"]

ngrid = 2048;
delr = 4/100;
delk = \[Pi]/delr/ngrid;
rvalues = Table[(i - 1/2) delr, {i, 1, ngrid}];
kvalues = Table[(i - 1/2) delk, {i, 1, ngrid}];
wa[x_] := (19 + .5 x) Exp[-.7 x] + 1
wb[x_] := (19 + .1 x) Exp[-.2 x] + 1
wd = SetPrecision[
   Table[{{wa[(i - 1/2) delk], 0}, {0, wb[(i - 1/2) delk]}}, {i, 1, 
     ngrid}], 26];
sigmaAA = 1;
hcloseAA = {};
i = 1;
While[(i - 1/2)*delr < sigmaAA, hcloseAA = Append[hcloseAA, -1]; i++]
hcloselenAA = Length[hcloseAA];
hcloseAB = hcloseAA;
hcloselenAB = hcloselenAA;
hcloseBB = hcloseAA;
hcloselenBB = hcloselenAA;
ccloseAA = {};
i = ngrid;
While[(i - 1/2)*delr >= sigmaAA, ccloseAA = Append[ccloseAA, 0]; i--]
ccloselenAA = Length[ccloseAA];
ccloselenAA = Length[ccloseAA];
ccloseAB = ccloseAA;
ccloselenAB = ccloselenAA;
ccloseBB = ccloseAA;
ccloselenBB = ccloselenAA;
na = 20;
nb = 20;
pa = 27/(1000 \[Pi]);
pb = 27/(1000 \[Pi]);
p = {{na pa, 0}, {0, nb pb}};
id = {{1, 0}, {0, 1}};
AFD = 1;
AFDList = {};
timelist = {};
gammainitial = Table[{{0, 0}, {0, 0}}, {ngrid}];
gammafirst = gammainitial;
step = 1;
tol = 10^-7;
old = 95/100;
new = 1 - old;

Do[
 t = AbsoluteTime[];
 extractgAA = Table[Extract[gammafirst, {i, 1, 1}], {i, hcloselenAA}];
 extractgBB = Table[Extract[gammafirst, {i, 2, 2}], {i, hcloselenBB}];
 extractgAB = Table[Extract[gammafirst, {i, 1, 2}], {i, hcloselenAB}];
 csolutionAA = (Join[hcloseAA - extractgAA, ccloseAA]) rvalues;
 csolutionBB = (Join[hcloseBB - extractgBB, ccloseBB]) rvalues;
 csolutionAB = (Join[hcloseAB - extractgAB, ccloseAB]) rvalues;
 chatAA = FourierDST[SetPrecision[csolutionAA, 32], 4];
 chatBB = FourierDST[SetPrecision[csolutionBB, 32], 4];
 chatAB = FourierDST[SetPrecision[csolutionAB, 32], 4];
 chatmatrix = 
  2 \[Pi] delr Sqrt[2*ngrid]*
   Transpose[{Transpose[{chatAA, chatAB}], 
      Transpose[{chatAB, chatBB}]}]/kvalues;
 gammahat = 
  Table[(wd[[i]].chatmatrix[[i]].(Inverse[
         id - p.wd[[i]].chatmatrix[[i]]]).wd[[i]] - 
      chatmatrix[[i]]) kvalues[[i]], {i, ngrid}];
 gammaAA = 
  FourierDST[SetPrecision[Table[gammahat[[i, 1, 1]], {i, ngrid}], 32],
    4];
 gammaBB = 
  FourierDST[SetPrecision[Table[gammahat[[i, 2, 2]], {i, ngrid}], 32],
    4];
 gammaAB = 
  FourierDST[SetPrecision[Table[gammahat[[i, 1, 2]], {i, ngrid}], 32],
    4];
 gammasecond = 
  Transpose[{Transpose[{gammaAA, gammaAB}], 
     Transpose[{gammaAB, gammaBB}]}]/(rvalues 2 \[Pi] delr Sqrt[
      2*ngrid]);
 AFD = Sqrt[
    1/ngrid Sum[((gammafirst[[i, 1, 1]] - 
           gammasecond[[i, 1, 1]])/(gammafirst[[i, 1, 1]] + 
           gammasecond[[i, 1, 1]]))^2 + ((gammafirst[[i, 2, 2]] - 
           gammasecond[[i, 2, 2]])/(gammafirst[[i, 2, 2]] + 
           gammasecond[[i, 2, 2]]))^2 + ((gammafirst[[i, 1, 2]] - 
           gammasecond[[i, 1, 2]])/(gammafirst[[i, 1, 2]] + 
           gammasecond[[i, 1, 2]]))^2 + ((gammafirst[[i, 2, 1]] - 
           gammasecond[[i, 2, 1]])/(gammafirst[[i, 2, 1]] + 
           gammasecond[[i, 2, 1]]))^2, {i, 1, ngrid}]];
 gammafirst = old gammafirst + new gammasecond;
 time2 = AbsoluteTime[] - t;
 timelist = Append[timelist, time2], {1}]
Print["Mean time per calculation = ", Mean[timelist]]
Print["STD time per calculation = ", StandardDeviation[timelist]]

Just some notes on things
ngrid,delr, delk, rvalues, kvalues are just the values used in making the problem discrete. Typically they are

ngrid = 2048;
delr = 4/100;
delk = \[Pi]/delr/ngrid;
rvalues = Table[(i - 1/2) delr, {i, 1, ngrid}];
kvalues = Table[(i - 1/2) delk, {i, 1, ngrid}];

All matrices being used are 2 x 2 with identical off-diagonals

The identity matrix and the P matrix(it is actually for the density) are

p = {{na pa, 0}, {0, nb pb}};
id = {{1, 0}, {0, 1}};

The major slow spots in the calculation I have identified are the FourierDST calculations (the forward and back transforms account for close to 40% of the calculation time) The gammahat calculation accounts for 40% of the time with the remaining time dominated by the AFD calculation.)
On my i7 Processor the average calculation time per cycle is 1.52 seconds. My hope is to get it under a second but that may not be possible.
My hope had been to introduce some parallel computation this was tried with both ParallelTable commands as well as using the ParallelSubmit WaitAll. However, I found that any speedup from the parallel calculation was offset by the communication time from the Master Kernel to the the other Kernels.(at least that is my assumption as calculations on new data takes twice as long as just recalculating the existing data. I assumed this meant that the slowdown was in disseminating the new lists) I played around with DistributDefinitions as well as SetSharedVariable, however, was unable to get that to do anything.

One thing I am wondering is if using Table for doing my discrete calculations is the best way to do this?

I had also thought I could possibly rewrite this in such a manner as to be able to compile it but my understanding is that only will work if you are dealing with machine precision where I am needing to working with higher precision to get convergence.

Thank you in advance for any suggestions.

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

转身泪倾城 2024-12-06 05:10:35

我将等待 acl 建议的代码，但首先，我怀疑此构造：

Table[Extract[gammafirst, {i, 1, 1}], {i, hcloselenAA}]

可以编写，并且执行速度更快，如：

gammafirst[[hcloselenAA, 1, 1]]

但我被迫猜测您的数据的形状。

I will wait for the code acl suggests, but off the top, I suspect that this construct:

Table[Extract[gammafirst, {i, 1, 1}], {i, hcloselenAA}]

may be written, and will execute faster, as:

gammafirst[[hcloselenAA, 1, 1]]

But I am forced to guess the shape of your data.

回复收藏 0 原文

痴梦一场 2024-12-06 05:10:35

在几行中使用：

FourierDST[SetPrecision[Table[gammahat[[i, 1, 1]], {i, ngrid}], 32], 4];

您可以删除Table：

FourierDST[SetPrecision[gammahat[[All, 1, 1]], 32], 4];

并且，如果您真的非常需要这个SetPrecision，您不能在gammahat的计算中立即执行它？

AFAI 可以看到，gammahat 计算中使用的所有数字都是准确的。这可能是故意的，但速度很慢。您可以考虑使用近似数字。

编辑
使用最新编辑中的完整代码，只需在第二行和第三行中添加 //N 即可将时间至少缩短一半，而不会过多降低数值精度。如果我比较 res={gammafirst, gammasecond, AFD} 中的所有数字，则原始数据与添加了 //N 的数据之间的差异为 res1 - res2 // Flatten // Total ==> 1.88267*10^-13

删除所有 SetPrecision 内容可使代码速度提高 7 倍，并且结果似乎具有相似的准确性。

In the several lines using:

FourierDST[SetPrecision[Table[gammahat[[i, 1, 1]], {i, ngrid}], 32], 4];

you could remove the Table:

FourierDST[SetPrecision[gammahat[[All, 1, 1]], 32], 4];

And, if you really, really need this SetPrecision, couldn't you do it at once in the calculation of gammahat?

AFAI can see, all numbers used in the calculations of gammahat are exact. This may be on purpose but it is slow. You might consider using approximate numbers instead.

EDIT
With the complete code in your latest edit just adding an //N to your 2nd and 3rd line cuts timing at least in half without reducing numerical accuracy much. If I compare all the numbers in res={gammafirst, gammasecond, AFD} the difference between the original and with //N added is res1 - res2 // Flatten // Total ==> 1.88267*10^-13

Removing all the SetPrecision stuff speeds up the code by a factor of 7 and the results seem to be of similar accuracy.

回复收藏 0 原文

~没有更多了~