Organizing task-based scientific computing

Posted on 2024-11-25


I have a computational algebra task I need to code up. The problem is broken into well-defined individual tasks that naturally form a tree - the task is combinatorial in nature, so there's a main task which requires a small number of sub-calculations to get its result. Those sub-calculations have sub-sub-calculations and so on. Each calculation only depends on the calculations below it in the tree (assuming the root node is the top). No data sharing needs to happen between branches. At lower levels the number of subtasks may be extremely large.

I had previously coded this up in a functional fashion, calling the functions as needed and storing everything in RAM. This was a terrible approach, but I was more concerned about the theory then.

I'm planning to rewrite the code in C++ for a variety of reasons. I have a few requirements:

  • Checkpointing: The calculation takes a long time, so I need to be able to stop at any point and resume later.
  • Separate individual tasks as objects: This helps me keep a good handle on where I am in the computation, and offers a clean way to do checkpointing via serialization.
  • Multi-threading: The task is clearly embarrassingly parallel, so it'd be neat to exploit that. I'd probably want to use Boost threads for this.

I would like suggestions on how to actually implement such a system. Ways I've thought of doing it:

  1. Implement tasks as a simple stack. When you hit a task that needs subcalculations done, it checks if it has all the subcalculations it requires. If not, it creates the subtasks and throws them onto the stack. If it does, then it calculates its result and pops itself from the stack.
  2. Store the tasks as a tree and do something like a depth-first visitor pattern. This would create all the tasks at the start and then computation would just traverse the tree.

These don't seem quite right because the lower levels require a vast number of subtasks. I could approach it in an iterator fashion at this level, I guess.
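To make option 1 concrete, here's a minimal sketch of an explicit work stack (the `Node` type and the summing combine step are placeholders for the real algebra; at the wide bottom level the children would be generated lazily rather than stored):

```cpp
#include <cstdint>
#include <stack>
#include <vector>

// Sketch of option 1: an explicit work stack instead of recursion.
// A task whose children are not yet computed pushes them and stays on
// the stack; when it is seen again, all children are done and it
// combines their results.
struct Node {
    std::vector<Node> children;
    std::int64_t leaf_value = 0;  // used only by leaves
    std::int64_t result = 0;
    bool expanded = false;        // children already pushed?
};

std::int64_t evaluate(Node& root) {
    std::stack<Node*> work;
    work.push(&root);
    while (!work.empty()) {
        Node* n = work.top();
        if (n->children.empty()) {      // leaf: compute directly
            n->result = n->leaf_value;
            work.pop();
        } else if (!n->expanded) {      // first visit: push subtasks
            n->expanded = true;
            for (Node& c : n->children) work.push(&c);
        } else {                        // second visit: combine children
            n->result = 0;
            for (Node& c : n->children) n->result += c.result;
            work.pop();
        }
    }
    return root.result;
}
```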

I feel like I'm over-thinking it and there's already a simple, well-established way to do something like this. Is there one?


Technical details in case they matter:

  • The task tree has 5 levels.
  • Branching factor of the tree is really small (say, between 2 and 5) for all levels except the lowest which is on the order of a few million.
  • Each individual task would only need to store a result tens of bytes large. I don't mind using the disk as much as possible, so long as it doesn't kill performance.
  • For debugging, I'd have to be able to recall/recalculate any individual task.
  • All the calculations are discrete mathematics: calculations with integers, polynomials, and groups. No floating point at all.


Answers (2)

噩梦成真你也成魔 2024-12-02 09:17:03


there's a main task which requires a small number of sub-calculations to get its results. Those sub-calculations have sub-sub-calculations and so on. Each calculation only depends on the calculations below it in the tree (assuming the root node is the top). No data sharing needs to happen between branches. At lower levels the number of subtasks may be extremely large... blah blah resuming, multi-threading, etc.

Correct me if I'm wrong, but it seems to me that you are exactly describing a map-reduce algorithm.

Just read what Wikipedia says about map-reduce:

"Map" step: The master node takes the input, partitions it up into smaller sub-problems, and distributes those to worker nodes. A worker node may do this again in turn, leading to a multi-level tree structure. The worker node processes that smaller problem, and passes the answer back to its master node.

"Reduce" step: The master node then takes the answers to all the sub-problems and combines them in some way to get the output – the answer to the problem it was originally trying to solve.

Using an existing mapreduce framework could save you a huge amount of time.

I just googled "map reduce C++" and started to get results, notably one using Boost: http://www.craighenderson.co.uk/mapreduce/
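As a toy illustration of that map/reduce shape in plain C++17 (this is just the pattern, not the API of the Boost-based library linked above): the "map" step is the per-task work, the "reduce" step combines the partial answers. A real framework would distribute the map step across workers.

```cpp
#include <cstdint>
#include <numeric>
#include <vector>

// Toy map/reduce: "map" squares each input (standing in for one leaf
// subtask), "reduce" sums the partial results back up toward the root.
std::int64_t map_reduce_square_sum(const std::vector<std::int64_t>& xs) {
    return std::transform_reduce(
        xs.begin(), xs.end(),
        std::int64_t{0},
        std::plus<>{},                           // reduce: combine partials
        [](std::int64_t x) { return x * x; });   // map: the per-task work
}
```

With C++17 parallel algorithms you can also pass `std::execution::par` as the first argument to run the map step on multiple threads.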

濫情▎り 2024-12-02 09:17:03


These don't seem quite right because the lower levels require a vast number of subtasks. I could approach it in an iterator fashion at this level, I guess.

You definitely do not want millions of CPU-bound threads. You want at most N CPU-bound threads, where N is the product of the number of CPUs and the number of cores per CPU on your machine. Exceed N by a little bit and you are slowing things down a bit. Exceed N by a lot and you are slowing things down a whole lot. The machine will spend almost all its time swapping threads in and out of context, spending very little time executing the threads themselves. Exceed N by a whole lot and you will most likely crash your machine (or hit some limit on threads). If you want to farm lots and lots (and lots and lots) of parallel tasks out at once, you either need to use multiple machines or use your graphics card.
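A sketch of that idea: cap the worker count at the hardware thread count and have each worker claim leaf tasks from a shared atomic counter, instead of spawning one thread per subtask (the summing loop stands in for the real per-leaf computation):

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Run at most N CPU-bound worker threads, where N is the number of
// hardware threads. Workers claim leaf tasks via an atomic counter, so
// millions of tasks never means millions of threads.
std::int64_t sum_leaves(const std::vector<std::int64_t>& leaves) {
    unsigned n = std::max(1u, std::thread::hardware_concurrency());
    std::atomic<std::size_t> next{0};
    std::vector<std::int64_t> partial(n, 0);  // one slot per worker
    std::vector<std::thread> workers;
    for (unsigned w = 0; w < n; ++w) {
        workers.emplace_back([&, w] {
            std::size_t i;
            while ((i = next.fetch_add(1)) < leaves.size())
                partial[w] += leaves[i];  // stand-in for the real subtask
        });
    }
    for (auto& t : workers) t.join();
    std::int64_t total = 0;
    for (auto p : partial) total += p;
    return total;
}
```

Claiming tasks one at a time through the counter is fine for a sketch; with millions of tiny leaves you would claim them in chunks to cut contention.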
