Monitoring the progress of a parallel computation in Mathematica

Published 2024-12-03 18:43:51


I'm building a large ParallelTable, and would like to maintain some sense of how the computation is going. For a non parallel table the following code does a great job:

counter = 1;
Timing[
 Monitor[
  Table[
   counter++
  , {n, 10^6}];
 , ProgressIndicator[counter, {0, 10^6}]
 ]
]

with the result {0.943512, Null}. For the parallel case, however, it's necessary to make the counter shared between the kernels:

counter = 1;
SetSharedVariable[counter];
Timing[
 Monitor[
  ParallelTable[
   counter++
  , {n, 10^4}];
 , ProgressIndicator[counter, {0, 10^4}]
 ]
]

with the result {6.33388, Null}. Since the value of counter needs to be passed back and forth between the kernels at every update, the performance hit is beyond severe. Any ideas for how to get some sense of how the computation is going? Perhaps letting each kernel have its own value for counter and summing them at intervals? Perhaps some way of determining what elements of the table have already been farmed out to the kernels?

Comments (4)

花期渐远 2024-12-10 18:43:51


You nearly gave the answer yourself, when you said "Perhaps letting each kernel have its own value for counter and summing them at intervals?".

Try something like this:

counter = 1;
SetSharedVariable[counter];
ParallelEvaluate[last = AbsoluteTime[]; localcounter = 1;]
Timing[Monitor[
  ParallelTable[localcounter++; 
    If[AbsoluteTime[] - last > 1, last = AbsoluteTime[]; 
     counter += localcounter; localcounter = 0;], {n, 10^6}];, 
  ProgressIndicator[counter, {0, 10^6}]]]

Note that it takes longer than your first single-CPU case only because it actually does something in the loop.

You can change the test AbsoluteTime[] - last > 1 to something more frequent like AbsoluteTime[] - last > 0.1.
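One caveat worth noting (not part of the original answer): any increments accumulated in localcounter after a kernel's last timed flush are never added to the shared counter, so the indicator can stop slightly short of the end. A minimal sketch of a final flush, assuming the same variable names as above:

```
(* after ParallelTable returns, fold each kernel's remaining
   local count into the shared counter *)
ParallelEvaluate[counter += localcounter; localcounter = 0;];
```

Since this runs once per kernel rather than once per iteration, it adds only a handful of cross-kernel updates.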

微凉 2024-12-10 18:43:51


This seems hard to solve. From the manual:

Unless you use shared variables, the parallel evaluations performed
are completely independent and cannot influence each other.
Furthermore, any side effects, such as assignments to variables, that
happen as part of evaluations will be lost. The only effect of a
parallel evaluation is that its result is returned at the end.

However, a rough progress indicator can still be obtained using the old Print statement:

(screenshot of the Print-based progress output omitted)
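The screenshot is not preserved here; a minimal sketch of the Print-based approach, assuming a notebook front end session where Print output from subkernels is forwarded to the main notebook:

```
(* print a progress line every 1000 iterations; Print output
   from subkernels is forwarded to the front end *)
ParallelTable[
  If[Mod[n, 1000] == 0, Print["reached n = ", n]];
  n,
  {n, 10^4}];
```

The lines arrive roughly in order of completion, not index order, which is itself a rough picture of how the work is being distributed.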

七色彩虹 2024-12-10 18:43:51


Another approach is to put a trace on LinkWrite and LinkRead and modify their tracing messages to do some useful accounting.

First, launch some parallel kernels:

LaunchKernels[]

This will have set up the link objects for the parallel kernels.

Then define an init function for link read and write counters:

init[] := Map[(LinkWriteCounter[#] = 0; LinkReadCounter[#] = 0) &, Links[]]

Next, you want to increment these counters when their links are being read from or written to:

Unprotect[Message];
Message[LinkWrite::trace, x_, y_] := LinkWriteCounter[x[[1, 1]]] += 1;
Message[LinkRead::trace, x_, y_] := LinkReadCounter[x[[1, 1]]] += 1;
Protect[Message];

Here, x[[1,1]] is the LinkObject in question.

Now, turn on tracing on LinkWrite and LinkRead:

On[LinkWrite];
On[LinkRead];

To format the progress display, first shorten the LinkObject display a bit, since they are rather verbose:

Format[LinkObject[k_, a_, b_]] := Kernel[a, b]

And this is a way to display the reads and writes dynamically for the subkernel links:

init[];
Dynamic[Grid[Join[
  {{"Kernel", "Writes", "Reads"}}, 
  Map[{#, LinkWriteCounter[#]/2, LinkReadCounter[#]/2} &, 
  Select[Links[], StringMatchQ[First[#], "*subkernel*"] &
]]], Frame -> All]]

(I'm dividing the counts by two, because every link read and write is traced twice).

And finally test it out with a 10,000 element table:

init[];
ParallelTable[i, {i, 10^4}, Method -> "FinestGrained"];

If everything worked, you should see a final progress display with about 5,000 reads and writes for each kernel:

(screenshot of the kernel session omitted)

There is a moderate performance penalty for this: 10.73 s without the monitor, and 13.69 s with the monitor. And of course the "FinestGrained" option is not the optimal method for this particular parallel computation.
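Since this approach unprotects Message and turns on global tracing of LinkWrite and LinkRead, it is worth undoing afterwards. A minimal cleanup sketch, assuming the definitions given above:

```
(* turn link tracing back off and remove the custom
   Message definitions added earlier *)
Off[LinkWrite];
Off[LinkRead];
Unprotect[Message];
Message[LinkWrite::trace, x_, y_] =.;
Message[LinkRead::trace, x_, y_] =.;
Protect[Message];
```

Without this, the counters keep incrementing on every subsequent kernel communication, which skews later measurements.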

我最亲爱的 2024-12-10 18:43:51


You can get some ideas from the package Spin`System`LoopControl` developed by Yuri Kandrashkin:

(screenshot from the Spin Algebra home page omitted)

Announcement of the Spin` package:

Hi group,

I have prepared the package Spin` that consists of several applications
which are designed for research in the area of magnetic resonance and 
spin chemistry and physics.

The applications Unit` and LoopControl` can be useful to a broader
audience.

The package and short outline is available at:
http://sites.google.com/site/spinalgebra/.

Sincerely,
Yuri Kandrashkin. 