Haskell nested Vector parallel strategies

Posted 2024-11-25


Similar to this related question, I would like to perform a parallel map on a Vector, but in my case I have a nested Vector, and I can't seem to get the evaluation correct.

Here is what I have so far:

import qualified Data.Vector as V
import qualified Data.Vector.Unboxed as U
import Data.Vector.Strategies
import Control.Parallel.Strategies  -- for `using`
import Control.DeepSeq

main = do
  let res = genVVec 200 `using` parVector 2
  print res

genUVec :: Int -> U.Vector Int
genUVec n = U.map (ack 2) $ U.enumFromN n 75

genVVec :: Int -> V.Vector (U.Vector Int)
genVVec n = V.map genUVec $ V.enumFromN 0 n

ack :: Int -> Int -> Int
ack 0 n = n+1
ack m 0 = ack (m-1) 1
ack m n = ack (m-1) (ack m (n-1))

instance (NFData a, U.Unbox a) => NFData (U.Vector a) where
  rnf = rnf . U.toList

gives:

$ ./vectorPar +RTS -N8 -s >/dev/null
   SPARKS: 200 (17 converted, 183 pruned)
   Total time    1.37s  (  1.30s elapsed)
$ ./vectorPar +RTS -s >/dev/null
   SPARKS: 200 (0 converted, 200 pruned)
   Total time    1.25s  (  1.26s elapsed)

I have also tried modifying the parVector function in vector-strategies directly, but my attempts are clumsy and ineffective.

I realize REPA was designed for nested workloads, but this seems a simple enough problem, and I'd rather not have to rewrite a lot of code.


樱花细雨 2024-12-02 05:28:24


Note: Guilty author of vector-strategies here (which is a very small title, seeing as this was just a hacked up function I figured others would find useful).

Your observation that parVector is wrong in that it allows the sparks to be GCed prior to use seems to be correct. The advice by SimonM means I must do precisely what I was trying to avoid, construct a new vector, at some cost, in place of the old one. Knowing this is necessary, there is little reason not to change parVector to the much simpler definition of:

parVector2 :: NFData a => Int -> Strategy (V.Vector a)
parVector2 n = liftM V.fromList . parListChunk n rdeepseq . V.toList

Notice the fix given by John L only works because it "beats" the collector by forcing the computations before collection would occur.

I'll be changing the vector-strategies library so this is unnecessary - making your original code work fine. Unfortunately, this will incur the above-mentioned cost of constructing a new Vector (usually minimal).
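The list round-trip in parVector2 is easy to sanity-check in a small standalone program (a sketch assuming the vector, parallel, and deepseq packages; the strategy itself is the definition above):

```haskell
import Control.Monad (liftM)
import Control.DeepSeq (NFData)
import Control.Parallel.Strategies (Strategy, parListChunk, rdeepseq, using)
import qualified Data.Vector as V

-- The simpler strategy from the answer above: spark the work over
-- chunks of a list, then rebuild the boxed Vector from the fully
-- forced list. The rebuilt vector is what keeps the sparks alive.
parVector2 :: NFData a => Int -> Strategy (V.Vector a)
parVector2 n = liftM V.fromList . parListChunk n rdeepseq . V.toList

main :: IO ()
main = do
  -- The strategy must only force and spark the elements, never
  -- change them: summing [2,4..200] should still give 10100.
  let v = V.map (* 2) (V.enumFromN (1 :: Int) 100) `using` parVector2 10
  print (V.sum v)  -- prints 10100
```

Because parListChunk runs in the Eval monad, liftM is all that is needed to map V.fromList over its result.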

海拔太高太耀眼 2024-12-02 05:28:24


The problem appears to be that parVector doesn't force evaluation of the elements of the vector. Each element remains a thunk and nothing is sparked until the vector is consumed (by being printed), which is too late for the sparks to do work. You can force evaluation of each element by composing the parVector strategy with rdeepseq.

import qualified Data.Vector as V
import qualified Data.Vector.Unboxed as U
import Data.Vector.Strategies
import Control.DeepSeq
import Control.Parallel.Strategies

main = do
  let res = genVVec 200 `using` (rdeepseq `dot` parVector 20)
  print res

genUVec :: Int -> U.Vector Int
genUVec n = U.map (ack 2) $ U.enumFromN n 75

genVVec :: Int -> V.Vector (U.Vector Int)
genVVec n = V.map genUVec $ V.enumFromN 0 n

ack :: Int -> Int -> Int
ack 0 n = n+1
ack m 0 = ack (m-1) 1
ack m n = ack (m-1) (ack m (n-1))

instance (NFData a, U.Unbox a) => NFData (U.Vector a) where
  rnf vec = seq vec ()

instance (NFData a) => NFData (V.Vector a) where
  rnf = rnf . V.toList

I also changed your NFData (U.Vector a) instance. Since a U.Vector is unboxed, evaluation to WHNF is sufficient, and forcing each element via the list conversion is wasteful. In fact, the default definition of rnf works fine here if you prefer.

With these two changes, I get the following

SPARKS: 200 (200 converted, 0 pruned)

and the runtime has been reduced by nearly 50%. I also adjusted the vector chunk size to 20, but the result is very similar to a chunk size of 2.
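The chunk-size trade-off can be experimented with independently of vector-strategies, since parListChunk from Control.Parallel.Strategies does the same chunked sparking over plain lists (a sketch assuming the parallel package; fib is just a stand-in workload):

```haskell
import Control.Parallel.Strategies (parListChunk, rdeepseq, using)

-- A deliberately expensive function so each spark has real work to do.
fib :: Int -> Integer
fib n | n < 2     = fromIntegral n
      | otherwise = fib (n - 1) + fib (n - 2)

main :: IO ()
main = do
  -- 200 work items in chunks of 20 -> 10 sparks; a chunk size of 2
  -- would create 100 sparks from the same workload. Run with
  -- +RTS -N -s to compare converted spark counts.
  let xs = map fib (replicate 200 25) `using` parListChunk 20 rdeepseq
  print (sum xs)  -- 200 * fib 25 = 200 * 75025 = 15005000
```

Larger chunks mean fewer, coarser sparks (less scheduling overhead); smaller chunks balance load better, which is why sizes 2 and 20 perform similarly here where every element costs the same.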
