Why does restructuring my data as a newtype speed up my Haskell program?

Posted 2025-02-12 12:12:53


I have a program which traverses an expression tree that does algebra on probability distributions, either sampling or computing the resulting distribution.

I have two implementations computing the distribution: one (computeDistribution) nicely reusable with monad transformers and one (simpleDistribution) where I concretize everything by hand. I would like to not concretize everything by hand, since that would be code duplication between the sampling and computing code.

I also have two data representations:

type Measure a = [(a, Rational)]
-- data Distribution a = Distribution (Measure a) deriving Show
newtype Distribution a = Distribution (Measure a) deriving Show

When I use the data version with the reusable code, computing the distribution of 20d2 (ghc -O3 program.hs; time ./program 20 > /dev/null) takes about one second, which seems way too long. Pick higher values of n at your own peril.

When I use the hand-concretized code, or I use the newtype representation with either implementation, computing 20d2 (time ./program 20 s > /dev/null) takes the blink of an eye.

Why?

How can I find out why?

My knowledge of how Haskell is executed is almost nil. I gather there's a graph of thunks in basically the same shape as the program, but that's about all I know.

I figure with newtype the representation of Distribution is the same as that of Measure, i.e. it's just a list, whereas with the data version each Distribution is kinda' like a single-field record, except with a pointer to the contained list, and so the data version has to perform more allocations. Is this true? If true, is this enough to explain the performance difference?

I'm new to working with monad transformer stacks. Consider the Let and Uniform cases in simpleDistribution — do they do the same as the walkTree-based implementation? How do I tell?

Here's my program. Note that Uniform n corresponds to rolling an n-sided die (in case the unary-ness was surprising).

Update: based on comments I simplified my program by removing everything not contributing to the performance gap. I made two semantic changes: probabilities are now denormalized and all wonky and wrong, and the simplification step is gone. But the essential shape of my program is still there. (See question edit history for the non-simplified program.)

Update 2: I made further simplifications, reducing Distribution down to the list monad with a small twist, removing everything to do with probabilities, and shortening the names. I still observe large performance differences when using data but not newtype.

import Control.Monad (liftM2)
import Control.Monad.Trans (lift)
import Control.Monad.Reader (ReaderT, runReaderT)
import System.Environment (getArgs)
import Text.Read (readMaybe)

main = do
  args <- getArgs
  let dieCount = case map readMaybe args of Just n : _ -> n; _ -> 10
  let f = if ["s"] == (take 1 $ drop 1 $ args) then fast else slow
  print $ f dieCount

fast, slow :: Int -> P Integer
fast n = walkTree n
slow n = walkTree n `runReaderT` ()

walkTree 0 = uniform
walkTree n = liftM2 (+) (walkTree 0) (walkTree $ n - 1)

data P a = P [a] deriving Show
-- newtype P a = P [a] deriving Show

class Monad m => MonadP m where uniform :: m Integer
instance MonadP P where uniform = P [1, 1]
instance MonadP p => MonadP (ReaderT env p) where uniform = lift uniform

instance Functor P where fmap f (P pxs) = P $ fmap f pxs

instance Applicative P where
  pure x = P [x]
  (P pfs) <*> (P pxs) = P $ pfs <*> pxs

instance Monad P where
  (P pxs) >>= f = P $ do
    x <- pxs
    case f x of P fxs -> fxs
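As a sanity check of the (deliberately denormalized) semantics: `uniform` always yields two identical branches of value 1, so `walkTree n` produces 2^(n+1) copies of n+1; for n = 20 that is over two million list elements in either representation. A standalone sketch of the `data` version, with the transformer stack dropped:

```haskell
import Control.Monad (liftM2)

data P a = P [a] deriving Show

instance Functor P where
  fmap f (P xs) = P (map f xs)

instance Applicative P where
  pure x = P [x]
  P fs <*> P xs = P [f x | f <- fs, x <- xs]

instance Monad P where
  P xs >>= f = P (concatMap (\x -> case f x of P ys -> ys) xs)

uniform :: P Integer
uniform = P [1, 1]  -- a "die" with two identical faces, both showing 1

walkTree :: Int -> P Integer
walkTree 0 = uniform
walkTree n = liftM2 (+) uniform (walkTree (n - 1))
```

Running `walkTree 1` gives `P [2,2,2,2]`, the four equally weighted outcomes of 2d2.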

Answer (by 忘东忘西忘不掉你, 2025-02-19 12:12:53):


How can I find out why?

This is, in general, hard.

The extreme way to do it is to look at the core code (which you can produce by running GHC with -ddump-simpl). This can get complicated really quickly, and it's basically a whole new language to learn. Your program is already big enough that I had trouble learning much from the core dump.

The other way to find out why is to just keep using GHC and asking questions and learning about GHC optimizations until you recognize certain patterns.

Why?

In short, I believe it's due to list fusion.

NOTE: I don't know for sure that this answer is correct, and it would take more time/work to verify than I'm willing to put in right now. That said, it fits the evidence.

First off, we can check whether the slowdown you're seeing is something truly fundamental or the result of a GHC optimization firing in one case but not the other, by compiling with -O0, that is, without optimizations. In this mode, both Distribution representations result in about the same (excruciatingly long) runtime. This leads me to believe that the data representation is not inherently the problem; rather, there's an optimization that's triggered with the newtype version but not with the data version.

When GHC is run in -O1 or higher, it engages certain rewrite rules to fuse different folds and maps of lists together so that it doesn't need to allocate intermediate values. (See https://markkarpov.com/tutorial/ghc-optimization-and-fusion.html#fusion for a decent tutorial on this concept as well as https://stackoverflow.com/a/38910170/14802384 which additionally has a link to a gist with all of the rewrite rules in base.) Since computeDistribution is basically just a bunch of list manipulations (which are all essentially folds), there is the potential for these to fire.
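As a concrete instance, one of the rules in base collapses consecutive maps so no intermediate list is built. The two pipelines below are semantically identical; with -O1 and above, the first is rewritten into the shape of the second (a toy illustration, not the author's code):

```haskell
-- base's "map/map" rule: map f (map g xs) ==> map (f . g) xs,
-- turning two list traversals (and one intermediate list) into one.
unfused, fused :: [Int] -> [Int]
unfused = map (+ 1) . map (* 2)  -- allocates an intermediate list at -O0
fused   = map ((+ 1) . (* 2))    -- single pass: what the rewrite rule produces
```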

The key is that with the newtype representation of Distribution, the newtype wrapper is erased during compilation, and the list operations are allowed to fuse. However, with the data representation, the wrappers are not erased, and the rewrite rules do not fire.
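One way to see the erasure concretely (hypothetical `Dist` name, not the original program): a newtype is representationally equal to the type it wraps, so `Data.Coerce.coerce` converts between them at zero runtime cost, whereas the equivalent `coerce` for a `data` declaration does not even typecheck, because the wrapper is a real heap object.

```haskell
import Data.Coerce (coerce)

-- A newtype shares its runtime representation with the wrapped list.
newtype Dist a = Dist [(a, Rational)] deriving Show

unwrap :: Dist a -> [(a, Rational)]
unwrap = coerce  -- zero-cost: no allocation, no traversal

-- With  data Dist a = Dist [(a, Rational)]  the same definition is
-- rejected: there is no Coercible instance through a data constructor.
```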

Therefore, I will make an unsubstantiated claim: If you want your data representation to be as fast as the newtype one, you will need to set up rewrite rules similar to the ones for list folding but that work over the Distribution type. This may involve writing your own special fold functions and then rewriting your Functor/Applicative/Monad instances to use them.
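A sketch of what that might look like, under the same caveat: the `dmap` helper and the rule below are hypothetical, and whether such rules actually fire would need checking with -ddump-rule-firings.

```haskell
type Measure a = [(a, Rational)]
data Distribution a = Distribution (Measure a) deriving Show

-- A dedicated map over the data-wrapped measure.
dmap :: (a -> b) -> Distribution a -> Distribution b
dmap f (Distribution xs) = Distribution [(f x, p) | (x, p) <- xs]
{-# NOINLINE dmap #-}  -- keep dmap calls visible so the rule can match

-- Fuse consecutive maps over Distribution, mirroring base's "map/map" rule.
{-# RULES
"dmap/dmap" forall f g d. dmap f (dmap g d) = dmap (f . g) d
  #-}
```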
