F# 树上并行化函数的选项

发布于 2024-10-22 08:18:04 字数 540 浏览 3 评论 0原文

在我的项目中,我有一个如下表示的数据结构:

type 'a Tree =
| Leaf of 'a
| Node of 'a Tree array

由于遍历大树的成本,我必须在此数据结构上并行化以下一些函数:

  • 存在映射
  • 折叠
  • (存在满足 谓词)
  • 减少/转换(可选)

由于我正在处理的问题的性质,每个节点的分支数量各不相同,并且叶级别的工作负载非常小。 第一个问题是我应该考虑在树上并行执行什么选项。但是,我尝试在每个节点上使用 Array.Parallel 模块中的函数,因为并行度太大,并行版本甚至比顺序版本慢。如果有必要,我可以将数组表示更改为 ListPSeq

第二个问题是如何控制这些函数的并行度。我正在考虑通过树的深度、每个节点的分支数量、叶级的工作负载复杂性以及叶级的叶数进行控制。然而,将它们组合在一起似乎是复杂且不可预测的。

In my project, I have a data structure represented as follows:

type 'a Tree =
| Leaf of 'a
| Node of 'a Tree array

Due to the cost of traversing through large trees, I have to parallelize some following functions on this data structure:

  • map
  • fold
  • exists (exist a node satisfying a
    predicate)
  • reduce / transform (optional)

Because of nature of the problem I'm working on, number of branches at each node is varied and workloads at the leaf level are quite small. The first question is what options should I consider for parallel execution on tree. I'm trying to use functions from Array.Parallel module on every node, however, because overheads of parallelism is too big, the parallel version is even slower than the sequential version. I may change array representation to List or PSeq if it is necessary.

The second question is how to control degree of parallelism on those functions. I'm thinking about controlling by depth of tree, number of branches at each node, workload complexity at the leaf level and number of leaves on tree, however, combining them together seems to be complex and unpredictable.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

暗地喜欢 2024-10-29 08:18:04

将遍历与其他处理分开怎么样?也许创建一个工作队列(MailboxProcessor 是一个很好的起点),并在遍历树时将其他工作排入队列以进行后台处理。它不能解决并行遍历问题(这似乎很难适应所有情况),但由于附加处理被降级到后台,它应该会很快。您可以尝试后台工作人员的数量,直到找到良好的并行度。这一切都假设每个节点要完成的工作量并不小。

编辑

这是一些代码。我确信它可以改进。我必须很快地把它敲出来。但这显示了基本概念。它只有一个“后台工作者”,即MailboxProcessor。我将保留更新它以使用多个工作人员的想象力。

type Msg<'a, 'b> =
    | Work of 'a
    | Done of 'b

type MapTransformer(f) =
    let results = ResizeArray()
    let m = MailboxProcessor.Start(fun payload ->
        let rec loop() =
            async {
                let! msg = payload.Receive()
                match msg with
                | Work work -> 
                    results.Add(f work)
                    return! loop()                
                | Done (channel : AsyncReplyChannel<_>) -> 
                    channel.Reply(results :> seq<_>)
            }
        loop())
    member this.Enqueue(item) = m.Post(Work item)
    member this.Results = m.PostAndReply(fun c -> Done c)

let uberMap tree =
    let m = MapTransformer(fun x -> x + 1)
    tree |> List.iter (fun x -> m.Enqueue(x))
    m.Results

uberMap [1; 2; 3]
//outputs [2; 3; 4]

How about separating traversal from any other processing? Perhaps create a work queue (MailboxProcessor is a good starting point) and, as the tree is traversed, enqueue additional work for background processing. It doesn't solve the parallel traversal problem (which seems tricky to get right for all cases) but with additional processing relegated to the background, it should go pretty quickly. You can experiment with the number of background workers until you find a good degree of parallelism. This all assumes the amount of work to be done for each node is non-trivial.

EDIT

Here's some code. I'm sure it can be improved. I had to hammer it out pretty quickly. But this shows the basic concept. It only has one "background worker," i.e., the MailboxProcessor. I'll leave updating it to use multiple workers to the imagination.

type Msg<'a, 'b> =
    | Work of 'a
    | Done of 'b

type MapTransformer(f) =
    let results = ResizeArray()
    let m = MailboxProcessor.Start(fun payload ->
        let rec loop() =
            async {
                let! msg = payload.Receive()
                match msg with
                | Work work -> 
                    results.Add(f work)
                    return! loop()                
                | Done (channel : AsyncReplyChannel<_>) -> 
                    channel.Reply(results :> seq<_>)
            }
        loop())
    member this.Enqueue(item) = m.Post(Work item)
    member this.Results = m.PostAndReply(fun c -> Done c)

let uberMap tree =
    let m = MapTransformer(fun x -> x + 1)
    tree |> List.iter (fun x -> m.Enqueue(x))
    m.Results

uberMap [1; 2; 3]
//outputs [2; 3; 4]
有深☉意 2024-10-29 08:18:04

Array.Parallel 使用System.Threading.Parallel.For。当您调用该函数时,它会尝试为给定任务找到最佳计划。然而,对于典型的递归树算法,这意味着对 Parallel.For 的大量调用,并且您最终可能会得到太多的线程。 (除非 Parallel.For 针对我不知道的用例进行了优化。)
所以我认为如果每个节点的工作负载不是太小的话,丹尼尔的建议是一个好主意。另一种想法是引入相对于树的剩余深度的阈值,如 Stephen Toub 在 此博客条目

Array.Parallel uses System.Threading.Parallel.For. When you call that function, it tries to find an optimal schedule for the given task. However, with a typical recursive tree algorithm, this means a lot of calls to Parallel.For and you probably end up with far too many threads. (Unless Parallel.For is optimized for that use case which I don't know.)
So I think Daniel's suggestion is a good idea if the workload per node is not too small. An alternative idea is to introduce a threshold with respect to the remaining depth of a tree as described by Stephen Toub at the end of this blog entry.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文