在 Haskell 中进行高效的数值计算

发布于 2024-12-19 05:46:04 字数 805 浏览 8 评论 0 原文

我受到这篇名为“只有快速语言有趣”看看他在 Haskell 中提出的问题(对向量中的几百万个数字求和)并与他的结果进行比较。

我是 Haskell 新手,所以我真的不知道如何正确计时或如何有效地做到这一点,我对这个问题的第一次尝试如下。请注意,我没有在向量中使用随机数,因为我不确定如何以良好的方式进行操作。我还打印一些东西以确保完整的评估。

import System.TimeIt

import Data.Vector as V

vector :: IO (Vector Int)
vector = do
  let vec = V.replicate 3000000 10
  print $ V.length vec
  return vec

sumit :: IO ()
sumit = do
  vec <- vector
  print $ V.sum vec

time = timeIt sumit

将其加载到 GHCI 中并运行 time 告诉我,运行 300 万个数字大约需要 0.22 秒,运行 3000 万个数字大约需要 2.69 秒。

与博客作者在 Lush 中的 0.02s 和 0.18s 的结果相比,这要差很多,这让我相信这可以用更好的方式来完成。

注意:以上代码需要TimeIt包才能运行。 cabal install timeit 将为您提供它。

I was inspired by this post called "Only fast languages are interesting" to look at the problem he suggests (sum'ing a couple of million numbers from a vector) in Haskell and compare to his results.

I'm a Haskell newbie so I don't really know how to time correctly or how to do this efficiently, my first attempt at this problem was the following. Note that I'm not using random numbers in the vector as I'm not sure how to do in a good way. I'm also printing stuff in order to ensure full evaluation.

import System.TimeIt

import Data.Vector as V

vector :: IO (Vector Int)
vector = do
  let vec = V.replicate 3000000 10
  print $ V.length vec
  return vec

sumit :: IO ()
sumit = do
  vec <- vector
  print $ V.sum vec

time = timeIt sumit

Loading this up in GHCI and running time tells me that it took about 0.22s to run for 3 million numbers and 2.69s for 30 million numbers.

Compared to the blog authors results of 0.02s and 0.18s in Lush it's quite a lot worse, which leads me to believe this can be done in a better way.

Note: The above code needs the package TimeIt to run. cabal install timeit will get it for you.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(4

本宫微胖 2024-12-26 05:46:05

您的原始文件,未编译,然后在不进行优化的情况下进行编译,然后使用简单的优化标志进行编译:

$ runhaskell boxed.hs  
3000000
30000000
CPU time:   0.35s

$ ghc --make boxed.hs -o unoptimized 
$ ./unoptimized
3000000
30000000
CPU time:   0.34s



$ ghc --make -O2 boxed.hs 
$ ./boxed
3000000
30000000
CPU time:   0.09s

您的文件使用 import Qualified Data.Vector.Unboxed as V 而不是 import Qualified Data.Vector as V Int 是不可装箱的类型)--
首先不进行优化,然后使用:

$ ghc --make unboxed.hs -o unoptimized
$ ./unoptimized
3000000
30000000
CPU time:   0.27s


$ ghc --make -O2 unboxed.hs 
$ ./unboxed
3000000
30000000
CPU time:   0.04s

因此,编译,优化......并在可能的情况下使用Data.Vector.Unboxed

Your original file, uncompiled, then compiled without optimization, then compiled with a simple optimization flag:

$ runhaskell boxed.hs  
3000000
30000000
CPU time:   0.35s

$ ghc --make boxed.hs -o unoptimized 
$ ./unoptimized
3000000
30000000
CPU time:   0.34s



$ ghc --make -O2 boxed.hs 
$ ./boxed
3000000
30000000
CPU time:   0.09s

Your file with import qualified Data.Vector.Unboxed as V instead of import qualified Data.Vector as V (Int is an unboxable type) --
first without optimization then with:

$ ghc --make unboxed.hs -o unoptimized
$ ./unoptimized
3000000
30000000
CPU time:   0.27s


$ ghc --make -O2 unboxed.hs 
$ ./unboxed
3000000
30000000
CPU time:   0.04s

So, compile, optimize ... and where possible use Data.Vector.Unboxed

木森分化 2024-12-26 05:46:05

尝试使用未装箱的向量,尽管我不确定它在这种情况下是否会产生明显的差异。另请注意,这种比较有点不公平,因为向量包应该完全优化向量(这种优化称为流融合)。

Try to use an unboxed vector, although I'm not sure whether it makes a noticable difference in this case. Note also that the comparison is slightly unfair, because the vector package should optimize the vector away entirely (this optimization is called stream fusion).

故笙诉离歌 2024-12-26 05:46:05

如果您使用足够大的向量,Vector Unboxed 可能会变得不切实际。对我来说,如果向量大小>,纯(惰性)列表会更快。 50000000:

import System.TimeIt

sumit :: IO ()
sumit = print . sum $ replicate 50000000 10

main :: IO ()
main = timeIt sumit

我得到这些时间:

Unboxed Vectors
CPU time:   1.00s

List:
CPU time:   0.70s

编辑:我已经使用 Criterion 重复了基准测试并使 sumit 变得纯粹。代码和结果如下:

代码:

import Criterion.Main

sumit :: Int -> Int
sumit m = sum $ replicate m 10

main :: IO ()
main = defaultMain [bench "sumit" $ nf sumit 50000000]

结果:

warming up
estimating clock resolution...
mean is 7.248078 us (80001 iterations)
found 24509 outliers among 79999 samples (30.6%)
  6044 (7.6%) low severe
  18465 (23.1%) high severe
estimating cost of a clock call...
mean is 68.15917 ns (65 iterations)
found 7 outliers among 65 samples (10.8%)
  3 (4.6%) high mild
  4 (6.2%) high severe

benchmarking sumit
collecting 100 samples, 1 iterations each, in estimated 46.07401 s
mean: 451.0233 ms, lb 450.6641 ms, ub 451.5295 ms, ci 0.950
std dev: 2.172022 ms, lb 1.674497 ms, ub 2.841110 ms, ci 0.950

看起来print有很大的不同,正如预期的那样!

If you use big enough vectors, Vector Unboxed might become impractical. For me pure (lazy) lists are quicker, if vector size > 50000000:

import System.TimeIt

sumit :: IO ()
sumit = print . sum $ replicate 50000000 10

main :: IO ()
main = timeIt sumit

I get these times:

Unboxed Vectors
CPU time:   1.00s

List:
CPU time:   0.70s

Edit: I've repeated the benchmark using Criterion and making sumit pure. Code and results as follow:

Code:

import Criterion.Main

sumit :: Int -> Int
sumit m = sum $ replicate m 10

main :: IO ()
main = defaultMain [bench "sumit" $ nf sumit 50000000]

Results:

warming up
estimating clock resolution...
mean is 7.248078 us (80001 iterations)
found 24509 outliers among 79999 samples (30.6%)
  6044 (7.6%) low severe
  18465 (23.1%) high severe
estimating cost of a clock call...
mean is 68.15917 ns (65 iterations)
found 7 outliers among 65 samples (10.8%)
  3 (4.6%) high mild
  4 (6.2%) high severe

benchmarking sumit
collecting 100 samples, 1 iterations each, in estimated 46.07401 s
mean: 451.0233 ms, lb 450.6641 ms, ub 451.5295 ms, ci 0.950
std dev: 2.172022 ms, lb 1.674497 ms, ub 2.841110 ms, ci 0.950

It looks like print makes a big difference, as it should be expected!

太阳公公是暖光 2024-12-26 05:46:04

首先,认识到 GHCi 是一个解释器,它的设计速度并不是很快。为了获得更有用的结果,您应该在启用优化的情况下编译代码。这可以产生巨大的影响。

另外,对于 Haskell 代码的任何严格的基准测试,我建议使用 criterion。它使用各种统计技术来确保您获得可靠的测量结果。

我修改了您的代码以使用标准并删除了打印语句,以便我们不对 I/O 进行计时。

import Criterion.Main
import Data.Vector as V

vector :: IO (Vector Int)
vector = do
  let vec = V.replicate 3000000 10
  return vec

sumit :: IO Int
sumit = do
  vec <- vector
  return $ V.sum vec

main = defaultMain [bench "sumit" $ whnfIO sumit]

使用 -O2 进行编译,我在一台速度相当慢的上网本上得到了这个结果:

$ ghc --make -O2 Sum.hs
$ ./Sum 
warming up
estimating clock resolution...
mean is 56.55146 us (10001 iterations)
found 1136 outliers among 9999 samples (11.4%)
  235 (2.4%) high mild
  901 (9.0%) high severe
estimating cost of a clock call...
mean is 2.493841 us (38 iterations)
found 4 outliers among 38 samples (10.5%)
  2 (5.3%) high mild
  2 (5.3%) high severe

benchmarking sumit
collecting 100 samples, 8 iterations each, in estimated 6.180620 s
mean: 9.329556 ms, lb 9.222860 ms, ub 9.473564 ms, ci 0.950
std dev: 628.0294 us, lb 439.1394 us, ub 1.045119 ms, ci 0.950

所以我得到的平均时间刚刚超过 9 毫秒,标准偏差小于一毫秒。对于较大的测试用例,我得到了大约 100 毫秒。

使用 vector 包时启用优化尤其重要,因为它大量使用流融合,在这种情况下能够完全消除数据结构,从而改变您的程序进入一个高效、紧密的循环。

使用 -fllvm 选项尝试新的基于 LLVM 的代码生成器可能也是值得的。 这显然很好-适合数字代码

First of all, realize that GHCi is an interpreter, and it's not designed to be very fast. To get more useful results you should compile the code with optimizations enabled. This can make a huge difference.

Also, for any serious benchmarking of Haskell code, I recommend using criterion. It uses various statistical techniques to ensure that you're getting reliable measurements.

I modified your code to use criterion and removed the print statements so that we're not timing the I/O.

import Criterion.Main
import Data.Vector as V

vector :: IO (Vector Int)
vector = do
  let vec = V.replicate 3000000 10
  return vec

sumit :: IO Int
sumit = do
  vec <- vector
  return $ V.sum vec

main = defaultMain [bench "sumit" $ whnfIO sumit]

Compiling this with -O2, I get this result on a pretty slow netbook:

$ ghc --make -O2 Sum.hs
$ ./Sum 
warming up
estimating clock resolution...
mean is 56.55146 us (10001 iterations)
found 1136 outliers among 9999 samples (11.4%)
  235 (2.4%) high mild
  901 (9.0%) high severe
estimating cost of a clock call...
mean is 2.493841 us (38 iterations)
found 4 outliers among 38 samples (10.5%)
  2 (5.3%) high mild
  2 (5.3%) high severe

benchmarking sumit
collecting 100 samples, 8 iterations each, in estimated 6.180620 s
mean: 9.329556 ms, lb 9.222860 ms, ub 9.473564 ms, ci 0.950
std dev: 628.0294 us, lb 439.1394 us, ub 1.045119 ms, ci 0.950

So I'm getting an average of just over 9 ms with a standard deviation of less than a millisecond. For the larger test case, I'm getting about 100ms.

Enabling optimizations is especially important when using the vector package, as it makes heavy use of stream fusion, which in this case is able to eliminate the data structure entirely, turning your program into an efficient, tight loop.

It may also be worthwhile to experiment with the new LLVM-based code generator by using -fllvm option. It is apparently well-suited for numeric code.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文