写“fib”并行运行:-N2 更慢?

发布于 2024-12-29 07:58:16 字数 3368 浏览 2 评论 0原文

我正在学习 Haskell 并尝试编写并行执行的代码,但 Haskell 总是按顺序运行它。当我使用 -N2 运行时标志执行时,比忽略此标志需要更多的时间来执行。

这是代码:

import Control.Parallel
import Control.Parallel.Strategies

fib :: Int -> Int
fib 1 = 1
fib 0 = 1
fib n = fib (n - 1) + fib (n - 2)

fib2 :: Int -> Int
fib2 n = a `par` (b `pseq` (a+b))
    where a = fib n
          b = fib n + 1

fib3 :: Int -> Int
fib3 n = runEval $ do
                a <- rpar (fib n)
                b <- rpar (fib n + 1)
                rseq a
                rseq b
                return (a+b)

main = do putStrLn (show (fib3 40))

我做错了什么?我在基于 Intel core i5 的 Windows 7 和基于 Atom 的 Linux 中尝试了此示例。

这是我的控制台会话的日志:

ghc -rtsopts -threaded -O2 test.hs
[1 of 1] Compiling Main             ( test.hs, test.o )

test +RTS -s
331160283
          64,496 bytes allocated in the heap
           2,024 bytes copied during GC
          42,888 bytes maximum residency (1 sample(s))
          22,648 bytes maximum slop
               1 MB total memory in use (0 MB lost due to fragmentation)

  Generation 0:     0 collections,     0 parallel,  0.00s,  0.00s elapsed
  Generation 1:     1 collections,     0 parallel,  0.00s,  0.00s elapsed

  Parallel GC work balance: nan (0 / 0, ideal 1)

                        MUT time (elapsed)       GC time  (elapsed)
  Task  0 (worker) :    0.00s    (  6.59s)       0.00s    (  0.00s)
  Task  1 (worker) :    0.00s    (  0.00s)       0.00s    (  0.00s)
  Task  2 (bound)  :    6.33s    (  6.59s)       0.00s    (  0.00s)

  SPARKS: 2 (0 converted, 0 pruned)

  INIT  time    0.00s  (  0.00s elapsed)
  MUT   time    6.33s  (  6.59s elapsed)
  GC    time    0.00s  (  0.00s elapsed)
  EXIT  time    0.00s  (  0.00s elapsed)
  Total time    6.33s  (  6.59s elapsed)

  %GC time       0.0%  (0.0% elapsed)

  Alloc rate    10,191 bytes per MUT second

  Productivity 100.0% of total user, 96.0% of total elapsed

gc_alloc_block_sync: 0
whitehole_spin: 0
gen[0].sync_large_objects: 0
gen[1].sync_large_objects: 0


test +RTS -N2 -s 
331160283
          72,688 bytes allocated in the heap
           5,644 bytes copied during GC
          28,300 bytes maximum residency (1 sample(s))
          24,948 bytes maximum slop
               2 MB total memory in use (0 MB lost due to fragmentation)

  Generation 0:     1 collections,     0 parallel,  0.00s,  0.00s elapsed
  Generation 1:     1 collections,     1 parallel,  0.00s,  0.01s elapsed

  Parallel GC work balance: 1.51 (937 / 621, ideal 2)

                        MUT time (elapsed)       GC time  (elapsed)
  Task  0 (worker) :    0.00s    (  9.29s)       0.00s    (  0.00s)
  Task  1 (worker) :    4.53s    (  9.29s)       0.00s    (  0.00s)
  Task  2 (bound)  :    5.84s    (  9.29s)       0.00s    (  0.01s)
  Task  3 (worker) :    0.00s    (  9.29s)       0.00s    (  0.00s)

  SPARKS: 2 (1 converted, 0 pruned)

  INIT  time    0.00s  (  0.00s elapsed)
  MUT   time   10.38s  (  9.29s elapsed)
  GC    time    0.00s  (  0.01s elapsed)
  EXIT  time    0.00s  (  0.00s elapsed)
  Total time   10.38s  (  9.30s elapsed)

  %GC time       0.0%  (0.1% elapsed)

  Alloc rate    7,006 bytes per MUT second

  Productivity 100.0% of total user, 111.6% of total elapsed

gc_alloc_block_sync: 0
whitehole_spin: 0
gen[0].sync_large_objects: 0
gen[1].sync_large_objects: 0

I'm learning Haskell and trying write code to execute in parallel, but Haskell always runs it sequentially. And when I execute with the -N2 runtime flag it take more time to execute than if I omit this flag.

Here is code:

import Control.Parallel
import Control.Parallel.Strategies

fib :: Int -> Int
fib 1 = 1
fib 0 = 1
fib n = fib (n - 1) + fib (n - 2)

fib2 :: Int -> Int
fib2 n = a `par` (b `pseq` (a+b))
    where a = fib n
          b = fib n + 1

fib3 :: Int -> Int
fib3 n = runEval $ do
                a <- rpar (fib n)
                b <- rpar (fib n + 1)
                rseq a
                rseq b
                return (a+b)

main = do putStrLn (show (fib3 40))

What did I do wrong? I tried this sample in Windows 7 on Intel core i5 and in Linux on Atom.

Here is log from my console session:

ghc -rtsopts -threaded -O2 test.hs
[1 of 1] Compiling Main             ( test.hs, test.o )

test +RTS -s
331160283
          64,496 bytes allocated in the heap
           2,024 bytes copied during GC
          42,888 bytes maximum residency (1 sample(s))
          22,648 bytes maximum slop
               1 MB total memory in use (0 MB lost due to fragmentation)

  Generation 0:     0 collections,     0 parallel,  0.00s,  0.00s elapsed
  Generation 1:     1 collections,     0 parallel,  0.00s,  0.00s elapsed

  Parallel GC work balance: nan (0 / 0, ideal 1)

                        MUT time (elapsed)       GC time  (elapsed)
  Task  0 (worker) :    0.00s    (  6.59s)       0.00s    (  0.00s)
  Task  1 (worker) :    0.00s    (  0.00s)       0.00s    (  0.00s)
  Task  2 (bound)  :    6.33s    (  6.59s)       0.00s    (  0.00s)

  SPARKS: 2 (0 converted, 0 pruned)

  INIT  time    0.00s  (  0.00s elapsed)
  MUT   time    6.33s  (  6.59s elapsed)
  GC    time    0.00s  (  0.00s elapsed)
  EXIT  time    0.00s  (  0.00s elapsed)
  Total time    6.33s  (  6.59s elapsed)

  %GC time       0.0%  (0.0% elapsed)

  Alloc rate    10,191 bytes per MUT second

  Productivity 100.0% of total user, 96.0% of total elapsed

gc_alloc_block_sync: 0
whitehole_spin: 0
gen[0].sync_large_objects: 0
gen[1].sync_large_objects: 0


test +RTS -N2 -s 
331160283
          72,688 bytes allocated in the heap
           5,644 bytes copied during GC
          28,300 bytes maximum residency (1 sample(s))
          24,948 bytes maximum slop
               2 MB total memory in use (0 MB lost due to fragmentation)

  Generation 0:     1 collections,     0 parallel,  0.00s,  0.00s elapsed
  Generation 1:     1 collections,     1 parallel,  0.00s,  0.01s elapsed

  Parallel GC work balance: 1.51 (937 / 621, ideal 2)

                        MUT time (elapsed)       GC time  (elapsed)
  Task  0 (worker) :    0.00s    (  9.29s)       0.00s    (  0.00s)
  Task  1 (worker) :    4.53s    (  9.29s)       0.00s    (  0.00s)
  Task  2 (bound)  :    5.84s    (  9.29s)       0.00s    (  0.01s)
  Task  3 (worker) :    0.00s    (  9.29s)       0.00s    (  0.00s)

  SPARKS: 2 (1 converted, 0 pruned)

  INIT  time    0.00s  (  0.00s elapsed)
  MUT   time   10.38s  (  9.29s elapsed)
  GC    time    0.00s  (  0.01s elapsed)
  EXIT  time    0.00s  (  0.00s elapsed)
  Total time   10.38s  (  9.30s elapsed)

  %GC time       0.0%  (0.1% elapsed)

  Alloc rate    7,006 bytes per MUT second

  Productivity 100.0% of total user, 111.6% of total elapsed

gc_alloc_block_sync: 0
whitehole_spin: 0
gen[0].sync_large_objects: 0
gen[1].sync_large_objects: 0

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

时光磨忆 2025-01-05 07:58:16

我认为答案是“GHC 将优化 fib 函数,使其不进行分配,并且
不进行分配的计算会给 RTS 带来问题,因为
调度程序永远不会运行并进行负载平衡(即
并行性所必需的)
”,正如 Simon 在讨论中所写的那样我还发现了很好的教程

I think answer is that "GHC will optimise the fib function so that it does no allocation, and
computations that do no allocation cause problems for the RTS because
the scheduler never gets to run and do load-balancing (which is
necessary for parallelism)
" as wrote Simon in this discussion group. Also I found good tutorial.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文