Haskell 嵌套向量并行策略
类似于此相关问题 ,我想在向量上执行并行映射,但就我而言,我有一个嵌套向量,并且我似乎无法获得正确的评估。
这是我到目前为止所拥有的:
import qualified Data.Vector as V
import qualified Data.Vector.Unboxed as U
import Data.Vector.Strategies
import Control.DeepSeq
main = do
let res = genVVec 200 `using` parVector 2
print res
genUVec :: Int -> U.Vector Int
genUVec n = U.map (ack 2) $ U.enumFromN n 75
genVVec :: Int -> V.Vector (U.Vector Int)
genVVec n = V.map genUVec $ V.enumFromN 0 n
ack :: Int -> Int -> Int
ack 0 n = n+1
ack m 0 = ack (m-1) 1
ack m n = ack (m-1) (ack m (n-1))
instance (NFData a, U.Unbox a) => NFData (U.Vector a) where
rnf = rnf . U.toList
给出:
$ ./vectorPar +RTS -N8 -s >/dev/null
SPARKS: 200 (17 converted, 183 pruned)
Total time 1.37s ( 1.30s elapsed)
$ ./vectorPar +RTS -s >/dev/null
SPARKS: 200 (0 converted, 200 pruned)
Total time 1.25s ( 1.26s elapsed)
我还尝试修改 parVector
函数 直接在向量策略中,但我的尝试是笨拙且无效的。
我意识到 REPA 是为嵌套工作负载而设计的,但这似乎是一个足够简单的问题,我宁愿不必重写大量代码。
Similar to this related question, I would like to perform a parallel map on a Vector, but in my case I have a nested Vector, and I can't seem to get the evaluation correct.
Here is what I have so far:
import qualified Data.Vector as V
import qualified Data.Vector.Unboxed as U
import Data.Vector.Strategies
import Control.DeepSeq
main = do
let res = genVVec 200 `using` parVector 2
print res
genUVec :: Int -> U.Vector Int
genUVec n = U.map (ack 2) $ U.enumFromN n 75
genVVec :: Int -> V.Vector (U.Vector Int)
genVVec n = V.map genUVec $ V.enumFromN 0 n
ack :: Int -> Int -> Int
ack 0 n = n+1
ack m 0 = ack (m-1) 1
ack m n = ack (m-1) (ack m (n-1))
instance (NFData a, U.Unbox a) => NFData (U.Vector a) where
rnf = rnf . U.toList
gives:
$ ./vectorPar +RTS -N8 -s >/dev/null
SPARKS: 200 (17 converted, 183 pruned)
Total time 1.37s ( 1.30s elapsed)
$ ./vectorPar +RTS -s >/dev/null
SPARKS: 200 (0 converted, 200 pruned)
Total time 1.25s ( 1.26s elapsed)
I have also tried modifying the parVector
function in vector-strategies directly, but my attempts are clumsy and ineffective.
I realize REPA was designed for nested workloads, but this seems a simple enough problem, and I'd rather not have to rewrite a lot of code.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
注意:这里是向量策略的有罪作者(这是一个非常小的标题,因为这只是一个我认为其他人会发现有用的破解函数)。
您认为
parVector
是错误的,因为它允许在使用之前对火花进行 GC 处理,这似乎是正确的。 SimonM 的建议意味着我必须精确地做我试图避免的事情,以一定的成本构建一个新的向量来代替旧的向量。知道这是必要的,没有理由不将parVector
更改为更简单的定义:请注意,John L 给出的修复之所以有效,是因为它通过在收集之前强制进行计算来“击败”收集器。发生。
我将更改矢量策略库,因此这是不必要的 - 使您的原始代码正常工作。不幸的是,这将产生上述构建新向量的成本(通常是最小的)。
Note: Guilty author of vector-strategies here (which is a very small title, seeing as this was just a hacked up function I figured others would find useful).
Your observation that
parVector
is wrong in that it allows the sparks to be GCed prior to use seems to be correct. The advice by SimonM means I must do precisely what I was trying to avoid, construct a new vector, at some cost, in place of the old one. Knowing this is necessary, there is little reason not to changeparVector
to the much simpler definition of:Notice the fix given by John L only works because it "beats" the collector by forcing the computations before collection would occur.
I'll be changing the vector-strategies library so this is unnecessary - making your original code work fine. Unfortunately, this will incur the above-mentioned cost of constructing a new Vector (usually minimal).
问题似乎是
parVector
不强制评估向量的元素。每个元素仍然是一个重击,并且在矢量被消耗(通过打印)之前不会激发任何东西,这对于火花来工作来说已经太晚了。您可以通过将parVector
策略与rdeepseq
组合起来来强制评估每个元素。我还更改了您的 NFData (U.Vector a) 实例。由于
U.Vector
已拆箱,对 WHNF 的评估就足够了,并且通过列表转换强制每个元素是浪费的。事实上,如果您愿意的话,rnf
的默认定义可以正常工作。通过这两项更改,我得到以下结果
,并且运行时间减少了近 50%。我还将向量块大小调整为 20,但结果与块大小 2 非常相似。
The problem appears to be that
parVector
doesn't force evaluation of the elements of the vector. Each element remains a thunk and nothing is sparked until the vector is consumed (by being printed), which is too late for the sparks to do work. You can force evaluation of each element by composing theparVector
strategy withrdeepseq
.I also changed your
NFData (U.Vector a)
instance. Since aU.Vector
is unboxed, evaluation to WHNF is sufficient, and forcing each element via the list conversion is wasteful. In fact the default definition forrnf
works fine if you like.With these two changes, I get the following
and the runtime has been reduced by nearly 50%. I also adjusted the vector chunk size to 20, but the result is very similar to a chunk size of 2.