用于分析 Haskell 程序性能的工具

发布于 2024-09-10 01:24:29 字数 1050 浏览 3 评论 0原文

在解决一些 Project Euler Problems 以学习 Haskell 时（所以目前我完全是初学者），我遇到了问题 12。我写了这个（天真的）解决方案：

--Get Number of Divisors of n
numDivs :: Integer -> Integer
numDivs n = toInteger $ length [ x | x<-[2.. ((n `quot` 2)+1)], n `rem` x == 0] + 2

--Generate a List of Triangular Values
triaList :: [Integer]
triaList =  [foldr (+) 0 [1..n] | n <- [1..]]

--The same recursive
triaList2 = go 0 1
  where go cs n = (cs+n):go (cs+n) (n+1)

--Finds the first triangular Value with more than n Divisors
sol :: Integer -> Integer
sol n = head $ filter (\x -> numDivs(x)>n) triaList2

n=500 (sol 500) 的解决方案非常慢（现在运行了 2 个多小时），所以我想知道如何找出为什么这个解决方案如此缓慢。是否有任何命令可以告诉我大部分计算时间花在哪里，以便我知道我的 haskell 程序的哪一部分很慢？像一个简单的分析器之类的东西。

明确地说，我并不是要求更快的解决方案，而是寻求一种方法来找到该解决方案。如果你没有 Haskell 知识，你会如何开始？

我尝试编写两个 triaList 函数，但发现无法测试哪个函数更快，所以这就是我的问题开始的地方。

谢谢

原文

While solving some Project Euler Problems to learn Haskell (so currently I'm a completly beginner) I came over Problem 12. I wrote this (naive) solution:

--Get Number of Divisors of n
numDivs :: Integer -> Integer
numDivs n = toInteger $ length [ x | x<-[2.. ((n `quot` 2)+1)], n `rem` x == 0] + 2

--Generate a List of Triangular Values
triaList :: [Integer]
triaList =  [foldr (+) 0 [1..n] | n <- [1..]]

--The same recursive
triaList2 = go 0 1
  where go cs n = (cs+n):go (cs+n) (n+1)

--Finds the first triangular Value with more than n Divisors
sol :: Integer -> Integer
sol n = head $ filter (\x -> numDivs(x)>n) triaList2

This Solution for n=500 (sol 500) is extremely slow (running for more than 2 hours now), so I wondered how to find out why this solution is so slow. Are there any commands that tell me where most of the computation-time is spent so I know which part of my haskell-program is slow? Something like a simple profiler.

To make it clear, I'm not asking for a faster solution but for a way to find this solution. How would you start if you would have no haskell knowledge?

I tried to write two triaList functions but found no way to test which one is faster, so this is where my problems start.

Thanks

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

七颜 2024-09-17 01:24:29

如何找出此解决方案为何如此缓慢的原因。是否有任何命令可以告诉我大部分计算时间花在哪里，以便我知道我的 haskell 程序的哪一部分速度慢？

恰恰！ GHC 提供了许多优秀的工具，包括：

有关使用时间和空间分析的教程是现实世界 Haskell 的一部分。

GC 统计

首先，确保您使用 ghc -O2 进行编译。您可能会确保它是一个现代的 GHC（例如 GHC 6.12.x）。

我们可以做的第一件事就是检查垃圾收集是否不是问题所在。
使用 +RTS -s 运行你的程序

$ time ./A +RTS -s
./A +RTS -s 
749700
   9,961,432,992 bytes allocated in the heap
       2,463,072 bytes copied during GC
          29,200 bytes maximum residency (1 sample(s))
         187,336 bytes maximum slop
               **2 MB** total memory in use (0 MB lost due to fragmentation)

  Generation 0: 19002 collections,     0 parallel,  0.11s,  0.15s elapsed
  Generation 1:     1 collections,     0 parallel,  0.00s,  0.00s elapsed

  INIT  time    0.00s  (  0.00s elapsed)
  MUT   time   13.15s  ( 13.32s elapsed)
  GC    time    0.11s  (  0.15s elapsed)
  RP    time    0.00s  (  0.00s elapsed)
  PROF  time    0.00s  (  0.00s elapsed)
  EXIT  time    0.00s  (  0.00s elapsed)
  Total time   13.26s  ( 13.47s elapsed)

  %GC time       **0.8%**  (1.1% elapsed)

  Alloc rate    757,764,753 bytes per MUT second

  Productivity  99.2% of total user, 97.6% of total elapsed

./A +RTS -s  13.26s user 0.05s system 98% cpu 13.479 total

这已经给了我们很多信息：你只有 2M 堆，GC 占用了 0.8% 的时间。所以不用担心分配是问题。

时间配置文件

获取程序的时间配置文件非常简单：使用 -prof -auto-all 进行编译

 $ ghc -O2 --make A.hs -prof -auto-all
 [1 of 1] Compiling Main             ( A.hs, A.o )
 Linking A ...

并且，对于 N=200：

$ time ./A +RTS -p                   
749700
./A +RTS -p  13.23s user 0.06s system 98% cpu 13.547 total

创建一个文件 A.prof，包含：

    Sun Jul 18 10:08 2010 Time and Allocation Profiling Report  (Final)

       A +RTS -p -RTS

    total time  =     13.18 secs   (659 ticks @ 20 ms)
    total alloc = 4,904,116,696 bytes  (excludes profiling overheads)

COST CENTRE          MODULE         %time %alloc

numDivs            Main         100.0  100.0

指示 你的所有时间都花在了 numDivs 上，它也是你所有分配的来源。

堆配置文件

您还可以通过运行 +RTS -p -hy 来详细了解这些分配，这会创建 A.hp，您可以通过将其转换为 postscript 文件来查看该文件 (hp2ps - c A.hp），生成：

alt text

这告诉我们您的内存使用没有任何问题：它正在常量分配空间。

所以你的问题是 numDivs 的算法复杂性：

toInteger $ length [ x | x<-[2.. ((n `quot` 2)+1)], n `rem` x == 0] + 2

解决这个问题，这是你运行时间的 100%，其他一切都很简单。

优化

此表达式非常适合流融合优化，所以我会重写它
使用 Data.Vector，如下所示：

numDivs n = fromIntegral $
    2 + (U.length $
        U.filter (\x -> fromIntegral n `rem` x == 0) $
        (U.enumFromN 2 ((fromIntegral n `div` 2) + 1) :: U.Vector Int))

它应该融合到单个循环中，没有不必要的堆分配。也就是说，它比列表版本具有更好的复杂性（通过常数因子）。您可以使用 ghc-core 工具（适用于高级用户）检查优化后的中间代码。

对此进行测试，ghc -O2 --make Z.hs

$ time ./Z     
749700
./Z  3.73s user 0.01s system 99% cpu 3.753 total

因此，它在不改变算法本身的情况下将 N=150 的运行时间减少了 3.5 倍。

结论

你的问题是 numDivs。它占用了你 100% 的运行时间，而且非常复杂。考虑一下 numDivs，以及如何为每个 N 生成 [2 .. n div 2 + 1] N 次。
尝试记住这一点，因为这些值不会改变。

要衡量哪个函数更快，请考虑使用标准，它将提供有关运行时间亚微秒改进的统计可靠信息。

附录

由于 numDivs 是你运行时间的 100%，接触程序的其他部分不会有太大区别，
然而，出于教学目的，我们也可以使用流融合重写它们。

我们也可以重写TrialList，依靠fusion将其变成你在TrialList2中手写的循环，
这是一个“前缀扫描”函数（又名 scanl）：

triaList = U.scanl (+) 0 (U.enumFrom 1 top)
    where
       top = 10^6

对于 sol 来说也是如此：

sol :: Int -> Int
sol n = U.head $ U.filter (\x -> numDivs x > n) triaList

总体运行时间相同，但代码更简洁。

how to find out why this solution is so slow. Are there any commands that tell me where most of the computation-time is spend so I know which part of my haskell-program is slow?

Precisely! GHC provides many excellent tools, including:

A tutorial on using time and space profiling is part of Real World Haskell.

GC Statistics

Firstly, ensure you're compiling with ghc -O2. And you might make sure it is a modern GHC (e.g. GHC 6.12.x)

The first thing we can do is check that garbage collection isn't the problem.
Run your program with +RTS -s

$ time ./A +RTS -s
./A +RTS -s 
749700
   9,961,432,992 bytes allocated in the heap
       2,463,072 bytes copied during GC
          29,200 bytes maximum residency (1 sample(s))
         187,336 bytes maximum slop
               **2 MB** total memory in use (0 MB lost due to fragmentation)

  Generation 0: 19002 collections,     0 parallel,  0.11s,  0.15s elapsed
  Generation 1:     1 collections,     0 parallel,  0.00s,  0.00s elapsed

  INIT  time    0.00s  (  0.00s elapsed)
  MUT   time   13.15s  ( 13.32s elapsed)
  GC    time    0.11s  (  0.15s elapsed)
  RP    time    0.00s  (  0.00s elapsed)
  PROF  time    0.00s  (  0.00s elapsed)
  EXIT  time    0.00s  (  0.00s elapsed)
  Total time   13.26s  ( 13.47s elapsed)

  %GC time       **0.8%**  (1.1% elapsed)

  Alloc rate    757,764,753 bytes per MUT second

  Productivity  99.2% of total user, 97.6% of total elapsed

./A +RTS -s  13.26s user 0.05s system 98% cpu 13.479 total

Which already gives us a lot of information: you only have a 2M heap, and GC takes up 0.8% of time. So no need to worry that allocation is the problem.

Time Profiles

Getting a time profile for your program is straight forward: compile with -prof -auto-all

 $ ghc -O2 --make A.hs -prof -auto-all
 [1 of 1] Compiling Main             ( A.hs, A.o )
 Linking A ...

And, for N=200:

$ time ./A +RTS -p                   
749700
./A +RTS -p  13.23s user 0.06s system 98% cpu 13.547 total

which creates a file, A.prof, containing:

    Sun Jul 18 10:08 2010 Time and Allocation Profiling Report  (Final)

       A +RTS -p -RTS

    total time  =     13.18 secs   (659 ticks @ 20 ms)
    total alloc = 4,904,116,696 bytes  (excludes profiling overheads)

COST CENTRE          MODULE         %time %alloc

numDivs            Main         100.0  100.0

Indicating that all your time is spent in numDivs, and it is also the source of all your allocations.

Heap Profiles

You can also get a break down of those allocations, by running with +RTS -p -hy, which creates A.hp, which you can view by converting it to a postscript file (hp2ps -c A.hp), generating:

alt text

which tells us there's nothing wrong with your memory use: it is allocating in constant space.

So your problem is algorithmic complexity of numDivs:

toInteger $ length [ x | x<-[2.. ((n `quot` 2)+1)], n `rem` x == 0] + 2

Fix that, which is 100% of your running time, and everything else is easy.

Optimizations

This expression is a good candidate for the stream fusion optimization, so I'll rewrite it
to use Data.Vector, like so:

numDivs n = fromIntegral $
    2 + (U.length $
        U.filter (\x -> fromIntegral n `rem` x == 0) $
        (U.enumFromN 2 ((fromIntegral n `div` 2) + 1) :: U.Vector Int))

Which should fuse into a single loop with no unnecessary heap allocations. That is, it will have better complexity (by constant factors) than the list version. You can use the ghc-core tool (for advanced users) to inspect the intermediate code after optimization.

Testing this, ghc -O2 --make Z.hs

$ time ./Z     
749700
./Z  3.73s user 0.01s system 99% cpu 3.753 total

So it reduced running time for N=150 by 3.5x, without changing the algorithm itself.

Conclusion

Your problem is numDivs. It is 100% of your running time, and has terrible complexity. Think about numDivs, and how, for example, for each N you are generating [2 .. n div 2 + 1] N times.
Try memoizing that, since the values don't change.

To measure which of your functions is faster, consider using criterion, which will provide statistically robust information about sub-microsecond improvements in running time.

Addenda

Since numDivs is 100% of your running time, touching other parts of the program won't make much difference,
however, for pedagogical purposes, we can also rewrite those using stream fusion.

We can also rewrite trialList, and rely on fusion to turn it into the loop you write by hand in trialList2,
which is a "prefix scan" function (aka scanl):

triaList = U.scanl (+) 0 (U.enumFrom 1 top)
    where
       top = 10^6

Similarly for sol:

sol :: Int -> Int
sol n = U.head $ U.filter (\x -> numDivs x > n) triaList

With the same overall running time, but a bit cleaner code.

回复收藏 0 原文

天荒地未老 2024-09-17 01:24:29

唐斯的回答很好，没有剧透，直接解决了问题。
在这里我想推荐一个我最近写的工具。当您想要比默认的 ghc -prof -auto-all 更详细的配置文件时，它可以节省您手动编写 SCC 注释的时间。除此之外，它还色彩缤纷！

这是您给出的代码的示例（*），绿色是好的，红色是慢的：
alt text

所有时间都花在创建除数列表上。这建议您可以做一些事情：
1. 使过滤n rem x == 0更快，但由于它是内置函数，可能已经很快了。
2. 创建一个较短的列表。您已经通过仅检查 n quot 2 朝这个方向做了一些事情。
3. 完全放弃列表生成并使用一些数学来获得更快的解决方案。这是解决项目欧拉问题的常用方法。

(*) 我通过将代码放入名为 eu13.hs 的文件中并添加主函数 main = print $ sol 90 来得到这个。然后运行visual-prof -px eu13.hs eu13，结果在eu13.hs.html中。

回复收藏 0 原文

等你爱我 2024-09-17 01:24:29

Haskell 相关说明：triaList2 当然比 triaList 更快，因为后者执行了很多不必要的计算。计算 triaList 的前 n 个元素需要二次时间，但对于 triaList2 则需要线性时间。还有另一种优雅（且高效）的方法来定义三角形数字的无限惰性列表：

triaList = 1 : zipWith (+) triaList [2..]

数学相关注意事项：无需检查最多 n / 2 的所有除数，检查最多 sqrt(n) 就足够了。

Haskell related note: triaList2 is of course faster than triaList because the latter performs a lot of unnecessary computations. It will take quadratic time to compute n first elements of triaList, but linear for triaList2. There is another elegant (and efficient) way to define an infinite lazy list of triangle numbers:

triaList = 1 : zipWith (+) triaList [2..]

Math related note: there is no need to check all divisors up to n / 2, it's enough to check up to sqrt(n).

回复收藏 0 原文

为你拒绝所有暧昧 2024-09-17 01:24:29

您可以使用标志来运行程序以启用时间分析。像这样的事情：

./program +RTS -P -sprogram.stats -RTS

应该运行程序并生成一个名为program.stats的文件，其中将显示每个函数花费了多少时间。您可以在 GHC 中找到有关使用 GHC 进行分析的更多信息用户指南。对于基准测试，有 Criterion 库。我发现这个博客文章有一个有用的介绍。

You can run your program with flags to enable time profiling. Something like this:

./program +RTS -P -sprogram.stats -RTS

That should run the program and produce a file called program.stats which will have how much time was spent in each function. You can find more information about profiling with GHC in the GHC user guide. For benchmarking, there is the Criterion library. I've found this blog post has a useful introduction.

回复收藏 0 原文

~没有更多了~