Monad Transformer 基准测试的奇怪结果。一个错误?

发布于 2024-12-14 22:52:05 字数 1993 浏览 2 评论 0原文

我做了一些 Criterion 基准测试来估计在 monad 堆栈上运行代码会损失多少性能。结果相当奇怪,我可能在我的基准测试中偶然发现了一些惰性陷阱。

基准测试告诉我,即使不使用 tell,运行 WriterT String IO 也比运行普通 IO 慢 20 倍(!)。奇怪的是,如果我将 WriterTReaderTContT 堆叠起来,速度只会慢 5 倍。这可能是我的基准测试中的一个错误。我在这里做错了什么?

基准测试

{-#LANGUAGE BangPatterns#-}
module Main where
import Criterion.Main
import Control.Monad
import Control.Monad.Writer
import Control.Monad.Reader
import Control.Monad.Cont

process :: Monad m => Int -> m Int
process = foldl (>=>) return (replicate 100000 (\(!x) -> return (x+1)))

test n = process n >> return ()

main = defaultMain [
      bench "Plain"  t0
     ,bench "Writer" t1
     ,bench "Reader" t2
     ,bench "Cont"   t3
     ,bench "RWC"    t4
    ]

t0 = test 1 :: IO ()
t1 = (runWriterT  (test 1:: WriterT String IO ()) >> return ()) :: IO ()
t2 = (runReaderT (test 1:: ReaderT String IO ()) "" >> return ()) :: IO ()
t3 = (runContT   (test 1:: ContT () IO ()) (return) >> return ()) :: IO ()
t4 = ((runWriterT . flip runReaderT "" . flip runContT return $
      (test 1 :: ContT () (ReaderT String (WriterT String IO)) ())) >> return ()) :: IO ()

结果

benchmarking Plain
mean: 1.938814 ms, lb 1.846508 ms, ub 2.052165 ms, ci 0.950
std dev: 519.7248 us, lb 428.4684 us, ub 709.3670 us, ci 0.950

benchmarking Writer
mean: 39.50431 ms, lb 38.25233 ms, ub 40.74437 ms, ci 0.950
std dev: 6.378220 ms, lb 5.738682 ms, ub 7.155760 ms, ci 0.950

benchmarking Reader
mean: 12.52823 ms, lb 12.03947 ms, ub 13.09994 ms, ci 0.950
std dev: 2.706265 ms, lb 2.324519 ms, ub 3.462641 ms, ci 0.950

benchmarking Cont
mean: 8.100272 ms, lb 7.634488 ms, ub 8.633348 ms, ci 0.950
std dev: 2.562829 ms, lb 2.281561 ms, ub 2.878463 ms, ci 0.950

benchmarking RWC
mean: 9.871992 ms, lb 9.436721 ms, ub 10.37302 ms, ci 0.950
std dev: 2.387364 ms, lb 2.136819 ms, ub 2.721750 ms, ci 0.950

I did some Criterion benchmarks to estimate how much performance I lose by running my code over a monad stack. The results were rather curious, and I have probably stumbled upon some laziness pitfall in my benchmark.

The benchmark tells me that running WriterT String IO is 20 times(!) slower than running plain IO, even when not using tell. Weirdly, if I stack WriterT with ReaderT and ContT it is just 5 times slower. This probably is a bug in my benchmark. What am I doing wrong here?

The benchmark

{-#LANGUAGE BangPatterns#-}
module Main where
import Criterion.Main
import Control.Monad
import Control.Monad.Writer
import Control.Monad.Reader
import Control.Monad.Cont

process :: Monad m => Int -> m Int
process = foldl (>=>) return (replicate 100000 (\(!x) -> return (x+1)))

test n = process n >> return ()

main = defaultMain [
      bench "Plain"  t0
     ,bench "Writer" t1
     ,bench "Reader" t2
     ,bench "Cont"   t3
     ,bench "RWC"    t4
    ]

t0 = test 1 :: IO ()
t1 = (runWriterT  (test 1:: WriterT String IO ()) >> return ()) :: IO ()
t2 = (runReaderT (test 1:: ReaderT String IO ()) "" >> return ()) :: IO ()
t3 = (runContT   (test 1:: ContT () IO ()) (return) >> return ()) :: IO ()
t4 = ((runWriterT . flip runReaderT "" . flip runContT return $
      (test 1 :: ContT () (ReaderT String (WriterT String IO)) ())) >> return ()) :: IO ()

The results

benchmarking Plain
mean: 1.938814 ms, lb 1.846508 ms, ub 2.052165 ms, ci 0.950
std dev: 519.7248 us, lb 428.4684 us, ub 709.3670 us, ci 0.950

benchmarking Writer
mean: 39.50431 ms, lb 38.25233 ms, ub 40.74437 ms, ci 0.950
std dev: 6.378220 ms, lb 5.738682 ms, ub 7.155760 ms, ci 0.950

benchmarking Reader
mean: 12.52823 ms, lb 12.03947 ms, ub 13.09994 ms, ci 0.950
std dev: 2.706265 ms, lb 2.324519 ms, ub 3.462641 ms, ci 0.950

benchmarking Cont
mean: 8.100272 ms, lb 7.634488 ms, ub 8.633348 ms, ci 0.950
std dev: 2.562829 ms, lb 2.281561 ms, ub 2.878463 ms, ci 0.950

benchmarking RWC
mean: 9.871992 ms, lb 9.436721 ms, ub 10.37302 ms, ci 0.950
std dev: 2.387364 ms, lb 2.136819 ms, ub 2.721750 ms, ci 0.950

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

枯寂 2024-12-21 22:52:05

正如您所注意到的,惰性写入器 monad 非常慢。正如丹尼尔·费舍尔(Daniel Fischer)建议的那样使用严格版本有很大帮助,但是为什么在大堆栈中使用时它会变得如此快?

为了回答这个问题,我们来看看这些变压器的实现。首先,惰性编写器 monad 转换器。

newtype WriterT w m a = WriterT { runWriterT :: m (a, w) }

instance (Monoid w, Monad m) => Monad (WriterT w m) where
    return a = WriterT $ return (a, mempty)
    m >>= k  = WriterT $ do
        ~(a, w)  <- runWriterT m
        ~(b, w') <- runWriterT (k a)
        return (b, w `mappend` w')

正如您所看到的,这做了很多事情。它运行底层 monad 的操作,进行一些模式匹配并收集写入的值。和你所期望的差不多。严格版本类似,只是没有无可辩驳的(惰性)模式。

newtype ReaderT r m a = ReaderT { runReaderT :: r -> m a }

instance (Monad m) => Monad (ReaderT r m) where
    return   = lift . return
    m >>= k  = ReaderT $ \ r -> do
        a <- runReaderT m r
        runReaderT (k a) r

阅读器变压器稍微精简一些。它分配阅读器环境并调用底层 monad 来执行操作。这里没有什么惊喜。

现在,让我们看看ContT

newtype ContT r m a = ContT { runContT :: (a -> m r) -> m r }

instance Monad (ContT r m) where
    return a = ContT ($ a)
    m >>= k  = ContT $ \c -> runContT m (\a -> runContT (k a) c)

注意到什么不同了吗? 它实际上并不使用底层 monad 中的任何函数!事实上,它甚至不需要 m 是一个 monad。这意味着根本不进行缓慢的模式匹配或附加。仅当您实际尝试从底层 monad 提升任何操作时,ContT 才会使用其绑定运算符。

instance MonadTrans (ContT r) where
    lift m = ContT (m >>=)

因此,由于您实际上并未执行任何特定于编写器的操作,因此 ContT 避免使用 WriterT 中的慢速绑定运算符。这就是为什么将 ContT 放在堆栈顶部会使其速度更快,以及为什么 ContT () IO () 的运行时间与更深层次的运行时间如此相似。堆。

As you've noticed, the lazy writer monad is quite slow. Using the strict version as Daniel Fischer suggests helps a lot, but why does it become so much faster when used in the big stack?

To answer this question, we take a look at the implementation of these transformers. First, the lazy writer monad transformer.

newtype WriterT w m a = WriterT { runWriterT :: m (a, w) }

instance (Monoid w, Monad m) => Monad (WriterT w m) where
    return a = WriterT $ return (a, mempty)
    m >>= k  = WriterT $ do
        ~(a, w)  <- runWriterT m
        ~(b, w') <- runWriterT (k a)
        return (b, w `mappend` w')

As you can see, this does quite a lot. It runs the actions of the underlying monad, does some pattern matching and gathers the written values. Pretty much what you'd expect. The strict version is similar, only without irrefutable (lazy) patterns.

newtype ReaderT r m a = ReaderT { runReaderT :: r -> m a }

instance (Monad m) => Monad (ReaderT r m) where
    return   = lift . return
    m >>= k  = ReaderT $ \ r -> do
        a <- runReaderT m r
        runReaderT (k a) r

The reader transformer is a bit leaner. It distributes the reader environment and calls upon the underlying monad to perform the actions. No surprises here.

Now, let's look at ContT.

newtype ContT r m a = ContT { runContT :: (a -> m r) -> m r }

instance Monad (ContT r m) where
    return a = ContT ($ a)
    m >>= k  = ContT $ \c -> runContT m (\a -> runContT (k a) c)

Notice anything different? It does not actually use any functions from the underlying monad! In fact, it doesn't even require m to be a monad. That means that no slow pattern matching or appends are being done at all. Only when you actually try to lift any actions from the underlying monad does ContT use its bind operator.

instance MonadTrans (ContT r) where
    lift m = ContT (m >>=)

So since you're not actually doing any writer-specific stuff, ContT avoids using the slow bind operator from WriterT. That's why having ContT on top of your stack makes it so much faster, and why the run time of the ContT () IO () is so similar to that of the deeper stack.

软糖 2024-12-21 22:52:05

Part of the extreme slowdown of Writer is that you're using the lazy writer monad, so your bang-pattern doesn't help at all there, cf. the answer to this question for a more detailed explanation (although for State, but it's the same reason here). Changing that to Control.Monad.Writer.Strict reduced the slowdown here from eight-fold to less-than-four-fold. Still the stack is faster, I haven't yet understood why.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文