递归与累加器风格的性能
我们有两个函数来计算给定数字的阶乘。第一个 !
使用累加器样式。第二个事实,使用自然递归。
(define (! n0)
(local (;; accumulator is the product of all natural numbers in [n0, n)
(define (!-a n accumulator)
(cond
[(zero? n) accumulator]
[else (!-a (sub1 n) (* n accumulator))])))
(!-a n0 1)))
在
(define (fact n)
(cond
[(= 0 n) 1]
[else (* n (fact (- n 1)))]))
第 31 节的底部,HtDP 指出自然递归版本是通常与累加器版本一样快(如果不是更快的话),但没有说明原因。我对此做了一些阅读,似乎答案是“尾调用优化/消除”,但是维基百科的文章似乎与 HtDP 的说法不一致,至少在性能方面是如此。为什么会这样呢?
在工作中,递归方式速度更快。在家里,蓄能器风格速度更快。是否没有通用的启发法来指导选择通常首选哪种风格?我知道累加器风格的内存效率更高,但如果我们将讨论仅限于性能,那么至少对我来说,不清楚哪个是更好的选择。
我对此进行了进一步思考,并且不得不支持维基百科关于累加器式递归在一般情况下的优越性的文章。它不仅减少了堆栈/堆空间的使用,而且内存访问总是落后于寄存器访问,并且现在多核的出现只会变得更加明显。尽管如此,HtDP 证明在所有情况下都需要进行实际测试。
We have two functions that compute the factorial of a given number. The first one, !
, uses an accumulator style. The second, fact
, uses natural recursion.
(define (! n0)
(local (;; accumulator is the product of all natural numbers in [n0, n)
(define (!-a n accumulator)
(cond
[(zero? n) accumulator]
[else (!-a (sub1 n) (* n accumulator))])))
(!-a n0 1)))
and
(define (fact n)
(cond
[(= 0 n) 1]
[else (* n (fact (- n 1)))]))
At the bottom of Section 31, HtDP states that the naturally recursive version is often as fast if not faster than the accumulator version, but does not state the reasons why. I did some reading on this and it seems that the answer is 'tail call optimization/elimination', but the Wikipedia article seems to be at odds with what HtDP says, at least with respect to performance. Why is this so?
At work, the recursive style is faster. At home, the accumulator style is faster. Is there no general heuristic to guide a choice as to which style is generally preferred? I understand that the accumulator-style is more memory-efficient, but if we restrict the discussion to performance alone, it is unclear, at least to me, which is the better choice.
I've thought about this a little further, and would have to side with the Wikipedia article on the superiority of the accumulator-style recursion in the general case. Not only does it reduce usage of stack/heap space, memory access is always going to be behind register access and can only be made more evident now that multicore is here. Still, HtDP proves that actual testing is required in all cases.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(4)
答案将取决于 Racket 系统的细节。这是我的看法。
自然递归版本和累加器版本之间有两个主要区别。首先,累加器版本以允许尾部调用优化的方式编写。这有助于使累加器版本更快,因为需要创建的堆栈帧更少。但这与 HtDP 中讨论的内容以及您在工作计算机上看到的内容相反。
另一个区别是乘法的顺序。自然递归版本将 1 到 20 的数字按升序相乘,即
累加器版本将相同的数字按降序相乘,即
数学上,这些是相同的,两个阶乘函数将给出相同的结果。尽管如此,这种差异可能很重要。特别地,在任何中间乘法中,后一计算的中间结果将大于前一计算的中间结果。
20 的阶乘是一个很大的数字。它不适合 32 位整数。这意味着球拍将需要使用任意长度的整数(“bignum”)来表示答案以及一些中间结果。任意精度算术(包括涉及 bignum 的乘法)比固定精度算术慢。
由于累加器版本中的中间结果总是大于自然递归版本,因此累加器版本将需要比递归版本更早的 bignum。简而言之,虽然两个版本都需要相同数量的乘法,但累加器版本需要更多的任意精度乘法。这使得累加器版本变慢。显然,算法的额外成本超过了减少堆栈帧数量所节省的成本。
那么为什么同样的趋势不会出现在您的家用计算机上呢?你说它是 Intel iMac,所以它可能是 64 位系统。趁20岁!是一个很大的数字,与 64 位整数相比,它很小,因此您的家用计算机不会执行任何任意精度的算术,并且顺序并不重要。 HtDP 已经足够老了,它可以使用 32 位系统,就像工作计算机上的 Windows XP 一样。
对于探索差异更有用的是计算数字列表
或累加器版本的乘积的函数。然后,您可以按升序或降序输入数字,无论您使用的是自然递归还是基于累加器的方法。
The answer will depend on the details of the Racket system. Here's my take on it.
There are two major differences between the naturally recursive version and the accumulator version. First, the accumulator version is written in a fashion that allows tail-call optimization. This helps make the accumulator version faster, as fewer stack frames need to be created. But that is the opposite of what is discussed in HtDP and that you've seen on your work computer.
The other difference is the order of multiplication. The naturally recursive version multiples the numbers from 1 to 20 in ascending order, that is
The accumulator version multiplies the same numbers in descending order, that is
Mathematically, these are the same, and the two factorial functions will give the same result. Nonetheless, this difference can matter. In particular, at any intermediate multiplication, the intermediate result will be larger for the latter calculation than for the former calculation.
The factorial of 20 is a big number. It won't fit in a 32 bit integer. That means that racket will need to use an arbitrary length integer (a "bignum") to represent the answer, and some of the intermediate results. Arbitrary precision arithmetic, including multiplication involving bignums, is slower than fixed precision arithmetic.
Since the intermediate results in the accumulator version are always larger than for the naturally recursive version, the accumulator version will require a bignum earlier than the recursive version. In short, while both versions require the same number of multiplications, the accumulator version requires more arbitrary precision multiplications. This makes the accumulator version slower. Apparently, the additional cost of the arithmetic outweighs the saving from reducing the number of stack frames.
So why wouldn't the same trend show up on your home computer? You said it was an Intel iMac, so it is probably a 64 bit system. While 20! is a big number, it is small compared to what will fit in a 64 bit integer, so your home computer isn't doing any arbitrary precision arithmetic and the order doesn't matter. HtDP is old enough that it would have used a 32 bit system, as would Windows XP on your work computer.
More useful to explore the differences would be a function that calculates the product of a list of numbers, either
or an accumulator version. You could then feed in the numbers in either ascending or descending order, independent of whether you're using a naturally recursive or accumulator-based approach.
我不知道 Racket 编译器的内部结构,但我会推测。
尾部调用通常比普通调用更昂贵(在 .NET 中也是如此,最多慢 7 倍),但在某些情况下,可以消除尾部调用,它最终会成为
while(1) { .. . }
C 风格的循环,因此不需要进行额外的调用,只需简单的本地跳转,有效地消除了过程应用程序的开销。I don't know the inners of the Racket compiler, but I'll speculate.
Tail calls are generally more expensive than normal calls (this is true in .NET, up to 7 times slower), but in some cases the tail call can be eliminated and it ends up being a
while(1) { ... }
C-style loop, hence there will be no extra calls to make, just a simply local jump, effectively eliminating procedure application overhead.一个好的编译器会将递归 fac 转变为尾递归 fac。所以编译后的代码应该没有区别。
A good compiler would turn the recursive fac into a tail recursive one. So there should be no difference in compiled code.
上面有很多优点。我喜欢分析应该做什么和为什么不应该做什么。这就是欧拉计划成功的基础。从fixnums出发太早可能会出现问题。
数字序列可以从大到小相乘,反之亦然。我们还有“do”命令,可以直接且类似地进行迭代。
Many good points above. I love the analysis of what should be versus why it didn't. That is the stuff of which Project Euler success is made. Too early a trip from fixnums can be problematic.
The sequence of numbers can be multiplied from large to small or vice versa. We also have the 'do' command that does iteration directly and similarly.