当前位置：文江博客话题详情

如何在scala中优化这个短阶乘函数？（创建 50000 个 BigInt）

发布于 2024-12-11 13:01:16 字数 320 浏览 0 评论 0 原文

我将 scala 版本

(BigInt(1) to BigInt(50000)).reduce(_ * _)

与 python 版本

reduce(lambda x,y: x*y, range(1,50000))

进行了比较，结果发现，scala 版本比 python 版本花费的时间大约是 python 版本的 10 倍。

我猜测，一个很大的区别是 python 可以使用其原生的 long 类型，而不是为每个数字创建新的 BigInt 对象。但是scala有解决方法吗？

原文

I've compaired the scala version

(BigInt(1) to BigInt(50000)).reduce(_ * _)

to the python version

reduce(lambda x,y: x*y, range(1,50000))

and it turns out, that the scala version took about 10 times longer than the python version.

I'm guessing, a big difference is that python can use its native long type instead of creating new BigInt-objects for each number. But is there a workaround in scala?

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

我不咬妳我踢妳 2024-12-18 13:01:17

事实上，您的 Scala 代码创建了 50,000 个 BigInt 对象，这一事实不太可能在这里产生太大影响。一个更大的问题是乘法算法——Python 的 long 使用 Karatsuba 乘法和Java 的 BigInteger（BigInt 只是包装）没有。

最简单的解决方法可能是切换到更好的任意精度数学库，例如 JScience 的：

import org.jscience.mathematics.number.LargeInteger

(1 to 50000).foldLeft(LargeInteger.ONE)(_ times _)

这比我的机器上的Python解决方案。

更新：我已经使用一些快速基准测试代码。 google.com/p/caliper/" rel="nofollow noreferrer">Caliper 响应 Luigi Plingi 的回答，这在我的（四核）机器上给出了以下结果：

              benchmark   ms linear runtime
         BigIntFoldLeft 4774 ==============================
             BigIntFold 4739 =============================
           BigIntReduce 4769 =============================
      BigIntFoldLeftPar 4642 =============================
          BigIntFoldPar  500 ===
        BigIntReducePar  499 ===
   LargeIntegerFoldLeft 3042 ===================
       LargeIntegerFold 3003 ==================
     LargeIntegerReduce 3018 ==================
LargeIntegerFoldLeftPar 3038 ===================
    LargeIntegerFoldPar  246 =
  LargeIntegerReducePar  260 =

我看不出 reduce 和fold 他所做的，但寓意很明确：如果您可以使用 Scala 2.9 的并行集合，它们会给您带来巨大的改进，但切换到 LargeInteger 会有所帮助以及。

The fact that your Scala code creates 50,000 BigInt objects is unlikely to be making much of a difference here. A bigger issue is the multiplication algorithm—Python's long uses Karatsuba multiplication and Java's BigInteger (which BigInt just wraps) doesn't.

The easiest workaround is probably to switch to a better arbitrary precision math library, like JScience's:

import org.jscience.mathematics.number.LargeInteger

(1 to 50000).foldLeft(LargeInteger.ONE)(_ times _)

This is faster than the Python solution on my machine.

Update: I've written some quick benchmarking code using Caliper in response to Luigi Plingi's answer, which gives the following results on my (quad core) machine:

              benchmark   ms linear runtime
         BigIntFoldLeft 4774 ==============================
             BigIntFold 4739 =============================
           BigIntReduce 4769 =============================
      BigIntFoldLeftPar 4642 =============================
          BigIntFoldPar  500 ===
        BigIntReducePar  499 ===
   LargeIntegerFoldLeft 3042 ===================
       LargeIntegerFold 3003 ==================
     LargeIntegerReduce 3018 ==================
LargeIntegerFoldLeftPar 3038 ===================
    LargeIntegerFoldPar  246 =
  LargeIntegerReducePar  260 =

I don't see the difference between reduce and fold that he does, but the moral is clear: if you can use Scala 2.9's parallel collections, they'll give you a huge improvement, but switching to LargeInteger helps as well.

回复收藏 0 原文

愛放△進行李 2024-12-18 13:01:17

我的机器上的 Python：

def func():
  start= time.clock()
  reduce(lambda x,y: x*y, range(1,50000))
  end= time.clock()
  t = (end-start) * 1000
  print t

给出了 1219 毫秒

Scala：

def timed[T](f: => T) = {
  val t0 = System.currentTimeMillis
  val r = f
  val t1 = System.currentTimeMillis
  println("Took: "+(t1 - t0)+" ms")
  r
}

timed { (BigInt(1) to BigInt(50000)).reduce(_ * _) }
4251 ms

timed { (BigInt(1) to BigInt(50000)).fold(BigInt(1))(_ * _) }
4224 ms

timed { (BigInt(1) to BigInt(50000)).par.reduce(_ * _) }
2083 ms

timed { (BigInt(1) to BigInt(50000)).par.fold(BigInt(1))(_ * _) }
689 ms

// using org.jscience.mathematics.number.LargeInteger from Travis's answer
timed { val a = (1 to 50000).foldLeft(LargeInteger.ONE)(_ times _) }
3327 ms

timed { val a = (1 to 50000).map(LargeInteger.valueOf(_)).par.fold(
                                          LargeInteger.ONE)(_ times _) }
361 ms

这 689 毫秒和 361 毫秒是在几次热身运行之后。它们都在大约 1000 毫秒开始，但似乎预热的时间不同。并行集合似乎比非并行集合预热得更多：非并行操作与第一次运行相比并没有显着减少。

.par（意思是，使用并行集合）似乎比 reduce 更快地加快了 fold 的速度。我只有 2 个核心，但更多数量的核心应该会带来更大的性能提升。

因此，通过实验，优化此函数的方法是

a) 使用 fold 而不是 reduce

b) 使用并行集合

update:
受到将计算分解为更小的块可以加快速度这一观察结果的启发，我设法让他在我的机器上以 215 ms 运行，这比标准并行算法提高了 40%。（使用 BigInt，需要 615 毫秒。）此外，它不使用并行集合，但以某种方式使用 90% CPU（与 BigInt 不同）。

  import org.jscience.mathematics.number.LargeInteger

  def fact(n: Int) = {
    def loop(seq: Seq[LargeInteger]): LargeInteger = seq.length match {
      case 0 => throw new IllegalArgumentException
      case 1 => seq.head
      case _ => loop {
        val (a, b) = seq.splitAt(seq.length / 2)
        a.zipAll(b, LargeInteger.ONE, LargeInteger.ONE).map(i => i._1 times i._2)
      } 
    }
    loop((1 to n).map(LargeInteger.valueOf(_)).toIndexedSeq)
  }

Python on my machine:

def func():
  start= time.clock()
  reduce(lambda x,y: x*y, range(1,50000))
  end= time.clock()
  t = (end-start) * 1000
  print t

gives 1219 ms

Scala:

def timed[T](f: => T) = {
  val t0 = System.currentTimeMillis
  val r = f
  val t1 = System.currentTimeMillis
  println("Took: "+(t1 - t0)+" ms")
  r
}

timed { (BigInt(1) to BigInt(50000)).reduce(_ * _) }
4251 ms

timed { (BigInt(1) to BigInt(50000)).fold(BigInt(1))(_ * _) }
4224 ms

timed { (BigInt(1) to BigInt(50000)).par.reduce(_ * _) }
2083 ms

timed { (BigInt(1) to BigInt(50000)).par.fold(BigInt(1))(_ * _) }
689 ms

// using org.jscience.mathematics.number.LargeInteger from Travis's answer
timed { val a = (1 to 50000).foldLeft(LargeInteger.ONE)(_ times _) }
3327 ms

timed { val a = (1 to 50000).map(LargeInteger.valueOf(_)).par.fold(
                                          LargeInteger.ONE)(_ times _) }
361 ms

This 689 ms and 361 ms were after a few warmup runs. They both started at about 1000 ms, but seem to warm up by different amounts. The parallel collections seem to warm up significantly more than the non-parallel: the non-parallel operations did not reduce significantly from their first runs.

The .par (meaning, use parallel collections) seemed to speed up fold more than reduce. I only have 2 cores, but a greater number of cores should see a bigger performance gain.

So, experimentally, the way to optimize this function is

a) Use fold rather than reduce

b) Use parallel collections

update:
Inspired by the observation that breaking the calculation down into smaller chunks speeds things up, I managed to get he following to run in 215 ms on my machine, which is a 40% improvement on the standard parallelized algorithm. (Using BigInt, it takes 615 ms.) Also, it doesn't use parallel collections, but somehow uses 90% CPU (unlike for BigInt).

  import org.jscience.mathematics.number.LargeInteger

  def fact(n: Int) = {
    def loop(seq: Seq[LargeInteger]): LargeInteger = seq.length match {
      case 0 => throw new IllegalArgumentException
      case 1 => seq.head
      case _ => loop {
        val (a, b) = seq.splitAt(seq.length / 2)
        a.zipAll(b, LargeInteger.ONE, LargeInteger.ONE).map(i => i._1 times i._2)
      } 
    }
    loop((1 to n).map(LargeInteger.valueOf(_)).toIndexedSeq)
  }

回复收藏 0 原文

澜川若宁 2024-12-18 13:01:17

这里的另一个技巧可能是尝试 reduceLeft 和 reduceRight 来看看哪个最快。在您的示例中，我的 reduceRight 执行速度要快得多：

scala> timed { (BigInt(1) to BigInt(50000)).reduceLeft(_ * _) }
Took: 4605 ms

scala> timed { (BigInt(1) to BigInt(50000)).reduceRight(_ * _) }
Took: 2004 ms

foldLeft 和 foldRight 之间的差异相同。猜猜从树的哪一侧开始减少很重要:)

Another trick here could be to try both reduceLeft and reduceRight to see what is fastest. On your example I get a much faster execution of reduceRight:

scala> timed { (BigInt(1) to BigInt(50000)).reduceLeft(_ * _) }
Took: 4605 ms

scala> timed { (BigInt(1) to BigInt(50000)).reduceRight(_ * _) }
Took: 2004 ms

Same difference between foldLeft and foldRight. Guess it matters what side of the tree you start reducing from :)

回复收藏 0 原文

绝影如岚 2024-12-18 13:01:17

在 Scala 中计算阶乘的最有效方法是使用分而治之的策略：

def fact(n: Int): BigInt = rangeProduct(1, n)

private def rangeProduct(n1: Long, n2: Long): BigInt = n2 - n1 match {
  case 0 => BigInt(n1)
  case 1 => BigInt(n1 * n2)
  case 2 => BigInt(n1 * (n1 + 1)) * n2
  case 3 => BigInt(n1 * (n1 + 1)) * ((n2 - 1) * n2)
  case _ => 
    val nm = (n1 + n2) >> 1
    rangeProduct(n1, nm) * rangeProduct(nm + 1, n2)
}

此外，为了获得更快的速度，请使用最新版本的 JDK 和以下 JVM 选项：

-server -XX:+TieredCompilation

下面是 Intel(R) Core(TM) i7-2640M CPU @ 2.80GHz 的结果（最大 3.50GHz）、RAM 12Gb DDR3-1333、Windows 7 sp1、Oracle JDK 1.8.0_25-b18 64 位：

(BigInt(1) to BigInt(100000)).product took: 3,806 ms with 26.4 % of CPU usage
(BigInt(1) to BigInt(100000)).reduce(_ * _) took: 3,728 ms with 25.4 % of CPU usage
(BigInt(1) to BigInt(100000)).reduceLeft(_ * _) took: 3,510 ms with 25.1 % of CPU usage
(BigInt(1) to BigInt(100000)).reduceRight(_ * _) took: 4,056 ms with 25.5 % of CPU usage
(BigInt(1) to BigInt(100000)).fold(BigInt(1))(_ * _) took: 3,697 ms with 25.5 % of CPU usage
(BigInt(1) to BigInt(100000)).par.product took: 406 ms with 66.3 % of CPU usage
(BigInt(1) to BigInt(100000)).par.reduce(_ * _) took: 296 ms with 71.1 % of CPU usage
(BigInt(1) to BigInt(100000)).par.reduceLeft(_ * _) took: 3,495 ms with 25.3 % of CPU usage
(BigInt(1) to BigInt(100000)).par.reduceRight(_ * _) took: 3,900 ms with 25.5 % of CPU usage
(BigInt(1) to BigInt(100000)).par.fold(BigInt(1))(_ * _) took: 327 ms with 56.1 % of CPU usage
fact(100000) took: 203 ms with 28.3 % of CPU usage

顺便说一句，要提高大于 20000 的数字的阶乘计算效率，请使用以下实现 Schönhage-Strassen 算法或等待直到它被合并到 JDK 9 并且 Scala 将能够使用它

Most efficient way to calculate factorial in Scala is using of divide and conquer strategy:

def fact(n: Int): BigInt = rangeProduct(1, n)

private def rangeProduct(n1: Long, n2: Long): BigInt = n2 - n1 match {
  case 0 => BigInt(n1)
  case 1 => BigInt(n1 * n2)
  case 2 => BigInt(n1 * (n1 + 1)) * n2
  case 3 => BigInt(n1 * (n1 + 1)) * ((n2 - 1) * n2)
  case _ => 
    val nm = (n1 + n2) >> 1
    rangeProduct(n1, nm) * rangeProduct(nm + 1, n2)
}

Also to get more speed use latest version of JDK and following JVM options:

-server -XX:+TieredCompilation

Bellow are results for Intel(R) Core(TM) i7-2640M CPU @ 2.80GHz (max 3.50GHz), RAM 12Gb DDR3-1333, Windows 7 sp1, Oracle JDK 1.8.0_25-b18 64-bit:

(BigInt(1) to BigInt(100000)).product took: 3,806 ms with 26.4 % of CPU usage
(BigInt(1) to BigInt(100000)).reduce(_ * _) took: 3,728 ms with 25.4 % of CPU usage
(BigInt(1) to BigInt(100000)).reduceLeft(_ * _) took: 3,510 ms with 25.1 % of CPU usage
(BigInt(1) to BigInt(100000)).reduceRight(_ * _) took: 4,056 ms with 25.5 % of CPU usage
(BigInt(1) to BigInt(100000)).fold(BigInt(1))(_ * _) took: 3,697 ms with 25.5 % of CPU usage
(BigInt(1) to BigInt(100000)).par.product took: 406 ms with 66.3 % of CPU usage
(BigInt(1) to BigInt(100000)).par.reduce(_ * _) took: 296 ms with 71.1 % of CPU usage
(BigInt(1) to BigInt(100000)).par.reduceLeft(_ * _) took: 3,495 ms with 25.3 % of CPU usage
(BigInt(1) to BigInt(100000)).par.reduceRight(_ * _) took: 3,900 ms with 25.5 % of CPU usage
(BigInt(1) to BigInt(100000)).par.fold(BigInt(1))(_ * _) took: 327 ms with 56.1 % of CPU usage
fact(100000) took: 203 ms with 28.3 % of CPU usage

BTW to improve efficience of factorial calculation for numbers that greater than 20000 use following implementation of Schönhage-Strassen algorithm or wait until it will be merged to JDK 9 and Scala will be able to use it

回复收藏 0 原文

~没有更多了~

关于作者

情话难免假

暂无简介

0 文章

0 评论

23 人气

关注发私信

友情链接

文江博客

如何在scala中优化这个短阶乘函数？（创建 50000 个 BigInt）

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

评论（4）

关于作者

相关话题

热门标签

推荐作者

已经忘了多久

15867725375

LonelySnow

走过海棠暮

轻许诺言

信馬由缰

友情链接

如何在scala中优化这个短阶乘函数？ （创建 50000 个 BigInt）

如果你对这篇内容有疑问，欢迎到本站社区发帖提问 参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

评论（4）

关于作者

相关话题

热门标签

推荐作者

已经忘了多久

15867725375

LonelySnow

走过海棠暮

轻许诺言

信馬由缰

友情链接

如何在scala中优化这个短阶乘函数？（创建 50000 个 BigInt）

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。