写“fib”并行运行:-N2 更慢?
我正在学习 Haskell 并尝试编写并行执行的代码,但 Haskell 总是按顺序运行它。当我使用 -N2
运行时标志执行时,比忽略此标志需要更多的时间来执行。
这是代码:
import Control.Parallel
import Control.Parallel.Strategies
fib :: Int -> Int
fib 1 = 1
fib 0 = 1
fib n = fib (n - 1) + fib (n - 2)
fib2 :: Int -> Int
fib2 n = a `par` (b `pseq` (a+b))
where a = fib n
b = fib n + 1
fib3 :: Int -> Int
fib3 n = runEval $ do
a <- rpar (fib n)
b <- rpar (fib n + 1)
rseq a
rseq b
return (a+b)
main = do putStrLn (show (fib3 40))
我做错了什么?我在基于 Intel core i5 的 Windows 7 和基于 Atom 的 Linux 中尝试了此示例。
这是我的控制台会话的日志:
ghc -rtsopts -threaded -O2 test.hs
[1 of 1] Compiling Main ( test.hs, test.o )
test +RTS -s
331160283
64,496 bytes allocated in the heap
2,024 bytes copied during GC
42,888 bytes maximum residency (1 sample(s))
22,648 bytes maximum slop
1 MB total memory in use (0 MB lost due to fragmentation)
Generation 0: 0 collections, 0 parallel, 0.00s, 0.00s elapsed
Generation 1: 1 collections, 0 parallel, 0.00s, 0.00s elapsed
Parallel GC work balance: nan (0 / 0, ideal 1)
MUT time (elapsed) GC time (elapsed)
Task 0 (worker) : 0.00s ( 6.59s) 0.00s ( 0.00s)
Task 1 (worker) : 0.00s ( 0.00s) 0.00s ( 0.00s)
Task 2 (bound) : 6.33s ( 6.59s) 0.00s ( 0.00s)
SPARKS: 2 (0 converted, 0 pruned)
INIT time 0.00s ( 0.00s elapsed)
MUT time 6.33s ( 6.59s elapsed)
GC time 0.00s ( 0.00s elapsed)
EXIT time 0.00s ( 0.00s elapsed)
Total time 6.33s ( 6.59s elapsed)
%GC time 0.0% (0.0% elapsed)
Alloc rate 10,191 bytes per MUT second
Productivity 100.0% of total user, 96.0% of total elapsed
gc_alloc_block_sync: 0
whitehole_spin: 0
gen[0].sync_large_objects: 0
gen[1].sync_large_objects: 0
test +RTS -N2 -s
331160283
72,688 bytes allocated in the heap
5,644 bytes copied during GC
28,300 bytes maximum residency (1 sample(s))
24,948 bytes maximum slop
2 MB total memory in use (0 MB lost due to fragmentation)
Generation 0: 1 collections, 0 parallel, 0.00s, 0.00s elapsed
Generation 1: 1 collections, 1 parallel, 0.00s, 0.01s elapsed
Parallel GC work balance: 1.51 (937 / 621, ideal 2)
MUT time (elapsed) GC time (elapsed)
Task 0 (worker) : 0.00s ( 9.29s) 0.00s ( 0.00s)
Task 1 (worker) : 4.53s ( 9.29s) 0.00s ( 0.00s)
Task 2 (bound) : 5.84s ( 9.29s) 0.00s ( 0.01s)
Task 3 (worker) : 0.00s ( 9.29s) 0.00s ( 0.00s)
SPARKS: 2 (1 converted, 0 pruned)
INIT time 0.00s ( 0.00s elapsed)
MUT time 10.38s ( 9.29s elapsed)
GC time 0.00s ( 0.01s elapsed)
EXIT time 0.00s ( 0.00s elapsed)
Total time 10.38s ( 9.30s elapsed)
%GC time 0.0% (0.1% elapsed)
Alloc rate 7,006 bytes per MUT second
Productivity 100.0% of total user, 111.6% of total elapsed
gc_alloc_block_sync: 0
whitehole_spin: 0
gen[0].sync_large_objects: 0
gen[1].sync_large_objects: 0
I'm learning Haskell and trying write code to execute in parallel, but Haskell always runs it sequentially. And when I execute with the -N2
runtime flag it take more time to execute than if I omit this flag.
Here is code:
import Control.Parallel
import Control.Parallel.Strategies
fib :: Int -> Int
fib 1 = 1
fib 0 = 1
fib n = fib (n - 1) + fib (n - 2)
fib2 :: Int -> Int
fib2 n = a `par` (b `pseq` (a+b))
where a = fib n
b = fib n + 1
fib3 :: Int -> Int
fib3 n = runEval $ do
a <- rpar (fib n)
b <- rpar (fib n + 1)
rseq a
rseq b
return (a+b)
main = do putStrLn (show (fib3 40))
What did I do wrong? I tried this sample in Windows 7 on Intel core i5 and in Linux on Atom.
Here is log from my console session:
ghc -rtsopts -threaded -O2 test.hs
[1 of 1] Compiling Main ( test.hs, test.o )
test +RTS -s
331160283
64,496 bytes allocated in the heap
2,024 bytes copied during GC
42,888 bytes maximum residency (1 sample(s))
22,648 bytes maximum slop
1 MB total memory in use (0 MB lost due to fragmentation)
Generation 0: 0 collections, 0 parallel, 0.00s, 0.00s elapsed
Generation 1: 1 collections, 0 parallel, 0.00s, 0.00s elapsed
Parallel GC work balance: nan (0 / 0, ideal 1)
MUT time (elapsed) GC time (elapsed)
Task 0 (worker) : 0.00s ( 6.59s) 0.00s ( 0.00s)
Task 1 (worker) : 0.00s ( 0.00s) 0.00s ( 0.00s)
Task 2 (bound) : 6.33s ( 6.59s) 0.00s ( 0.00s)
SPARKS: 2 (0 converted, 0 pruned)
INIT time 0.00s ( 0.00s elapsed)
MUT time 6.33s ( 6.59s elapsed)
GC time 0.00s ( 0.00s elapsed)
EXIT time 0.00s ( 0.00s elapsed)
Total time 6.33s ( 6.59s elapsed)
%GC time 0.0% (0.0% elapsed)
Alloc rate 10,191 bytes per MUT second
Productivity 100.0% of total user, 96.0% of total elapsed
gc_alloc_block_sync: 0
whitehole_spin: 0
gen[0].sync_large_objects: 0
gen[1].sync_large_objects: 0
test +RTS -N2 -s
331160283
72,688 bytes allocated in the heap
5,644 bytes copied during GC
28,300 bytes maximum residency (1 sample(s))
24,948 bytes maximum slop
2 MB total memory in use (0 MB lost due to fragmentation)
Generation 0: 1 collections, 0 parallel, 0.00s, 0.00s elapsed
Generation 1: 1 collections, 1 parallel, 0.00s, 0.01s elapsed
Parallel GC work balance: 1.51 (937 / 621, ideal 2)
MUT time (elapsed) GC time (elapsed)
Task 0 (worker) : 0.00s ( 9.29s) 0.00s ( 0.00s)
Task 1 (worker) : 4.53s ( 9.29s) 0.00s ( 0.00s)
Task 2 (bound) : 5.84s ( 9.29s) 0.00s ( 0.01s)
Task 3 (worker) : 0.00s ( 9.29s) 0.00s ( 0.00s)
SPARKS: 2 (1 converted, 0 pruned)
INIT time 0.00s ( 0.00s elapsed)
MUT time 10.38s ( 9.29s elapsed)
GC time 0.00s ( 0.01s elapsed)
EXIT time 0.00s ( 0.00s elapsed)
Total time 10.38s ( 9.30s elapsed)
%GC time 0.0% (0.1% elapsed)
Alloc rate 7,006 bytes per MUT second
Productivity 100.0% of total user, 111.6% of total elapsed
gc_alloc_block_sync: 0
whitehole_spin: 0
gen[0].sync_large_objects: 0
gen[1].sync_large_objects: 0
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
我认为答案是“GHC 将优化 fib 函数,使其不进行分配,并且
不进行分配的计算会给 RTS 带来问题,因为
调度程序永远不会运行并进行负载平衡(即
并行性所必需的)”,正如 Simon 在讨论中所写的那样我还发现了很好的教程。
I think answer is that "GHC will optimise the fib function so that it does no allocation, and
computations that do no allocation cause problems for the RTS because
the scheduler never gets to run and do load-balancing (which is
necessary for parallelism)" as wrote Simon in this discussion group. Also I found good tutorial.