F# 与 OCaml:堆栈溢出
我最近发现了一个关于F# for Python程序员的演示,看完之后,我决定实现我自己解决了“蚂蚁难题”。
有一只蚂蚁可以在平面网格上走动。蚂蚁一次可以向左、向右、向上或向下移动一格。也就是说,蚂蚁可以从单元格 (x, y) 前往单元格 (x+1, y)、(x-1, y)、(x, y+1) 和 (x, y-1)。 x 和 y 坐标的数字之和大于 25 的点是蚂蚁无法到达的。例如,点(59,79)是不可访问的,因为5 + 9 + 7 + 9 = 30,大于25。问题是:如果从(1000, 1000)开始,蚂蚁可以访问多少个点,包括 (1000, 1000) 本身?
我用 30 行 OCaml 首先,并尝试了一下:
$ ocamlopt -unsafe -rectypes -inline 1000 -o puzzle ant.ml
$ time ./puzzle
Points: 148848
real 0m0.143s
user 0m0.127s
sys 0m0.013s
Neat,我的结果与 leonardo 的 D 和 C++ 实现。与 Leonardo 的 C++ 实现相比,OCaml 版本的运行速度大约比 C++ 慢 2 倍。鉴于 Leonardo 使用队列来删除递归,这没关系。
然后我 将代码翻译为 F# ...这就是我得到的:
Thanassis@HOME /g/Tmp/ant.fsharp
$ /g/Program\ Files/FSharp-2.0.0.0/bin/fsc.exe ant.fs
Microsoft (R) F# 2.0 Compiler build 2.0.0.0
Copyright (c) Microsoft Corporation. All Rights Reserved.
Thanassis@HOME /g/Tmp/ant.fsharp
$ ./ant.exe
Process is terminated due to StackOverflowException.
Quit
Thanassis@HOME /g/Tmp/ant.fsharp
$ /g/Program\ Files/Microsoft\ F#/v4.0/Fsc.exe ant.fs
Microsoft (R) F# 2.0 Compiler build 4.0.30319.1
Copyright (c) Microsoft Corporation. All Rights Reserved.
Thanassis@HOME /g/Tmp/ant.fsharp
$ ./ant.exe
Process is terminated due to StackOverflowException
堆栈溢出...两个版本我的机器里有 F# 的... 出于好奇,我然后取出生成的二进制文件(ant.exe)并在 Arch Linux/Mono 下运行它:
$ mono -V | head -1
Mono JIT compiler version 2.10.5 (tarball Fri Sep 9 06:34:36 UTC 2011)
$ time mono ./ant.exe
Points: 148848
real 1m24.298s
user 0m0.567s
sys 0m0.027s
令人惊讶的是,它在 Mono 2.10.5 下运行(即没有堆栈溢出) - 但需要 84 秒,即慢了 587 倍比 OCaml - 哎呀。
所以这个程序......
- 在 OCaml 下运行良好,
- 在 .NET/F# 下根本不起作用
- ,但在 Mono/F# 下非常慢。
为什么?
编辑:奇怪的现象仍在继续 - 使用“--optimize+ --checked-”使问题消失,但仅在 ArchLinux/Mono 下;在Windows XP和Windows 7/64位下,即使是优化版本的二进制堆栈也会溢出。
最终编辑:我自己找到了答案 - 见下文。
I recently found a presentation about F# for Python programmers, and after watching it, I decided to implement a solution to the "ant puzzle" on my own.
There is an ant that can walk around on a planar grid. The ant can move one space at a time left, right, up or down. That is, from the cell (x, y) the ant can go to cells (x+1, y), (x-1, y), (x, y+1), and (x, y-1). Points where the sum of the digits of the x and y coordinates are greater than 25 are inaccessible to the ant. For example, the point (59,79) is inaccessible because 5 + 9 + 7 + 9 = 30, which is greater than 25. The question is: How many points can the ant access if it starts at (1000, 1000), including (1000, 1000) itself?
I implemented my solution in 30 lines of OCaml first, and tried it out:
$ ocamlopt -unsafe -rectypes -inline 1000 -o puzzle ant.ml
$ time ./puzzle
Points: 148848
real 0m0.143s
user 0m0.127s
sys 0m0.013s
Neat, my result is the same as that of leonardo's implementation, in D and C++. Comparing to Leonardo's C++ implementation, the OCaml version runs approx 2 times slower than C++. Which is OK, given that Leonardo used a queue to remove recursion.
I then translated the code to F# ... and here's what I got:
Thanassis@HOME /g/Tmp/ant.fsharp
$ /g/Program\ Files/FSharp-2.0.0.0/bin/fsc.exe ant.fs
Microsoft (R) F# 2.0 Compiler build 2.0.0.0
Copyright (c) Microsoft Corporation. All Rights Reserved.
Thanassis@HOME /g/Tmp/ant.fsharp
$ ./ant.exe
Process is terminated due to StackOverflowException.
Quit
Thanassis@HOME /g/Tmp/ant.fsharp
$ /g/Program\ Files/Microsoft\ F#/v4.0/Fsc.exe ant.fs
Microsoft (R) F# 2.0 Compiler build 4.0.30319.1
Copyright (c) Microsoft Corporation. All Rights Reserved.
Thanassis@HOME /g/Tmp/ant.fsharp
$ ./ant.exe
Process is terminated due to StackOverflowException
Stack overflow... with both versions of F# I have in my machine...
Out of curiosity, I then took the generated binary (ant.exe) and run it under Arch Linux/Mono:
$ mono -V | head -1
Mono JIT compiler version 2.10.5 (tarball Fri Sep 9 06:34:36 UTC 2011)
$ time mono ./ant.exe
Points: 148848
real 1m24.298s
user 0m0.567s
sys 0m0.027s
Surprisingly, it runs under Mono 2.10.5 (i.e. no stack overflow) - but it takes 84 seconds, i.e. 587 times slower than OCaml - oops.
So this program...
- runs fine under OCaml
- doesn't work at all under .NET/F#
- works, but is very slow, under Mono/F#.
Why?
EDIT: Weirdness continues - Using "--optimize+ --checked-" makes the problem disappear, but only under ArchLinux/Mono ; under Windows XP and Windows 7/64bit, even the optimized version of the binary stack overflows.
Final EDIT: I found out the answer myself - see below.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
执行摘要:
然后是时候移植到 F# 了。
然后我发布到 Stack Overflow - 但有些人决定关闭这个问题(叹气)。
是时候检查堆栈大小了:在 Windows 下,另一篇 SO 文章指出,它默认设置为1MB。在 Linux 下,“uname -s”和 测试程序的编译清楚地表明它是8MB。
这解释了为什么该程序在 Linux 下运行而不是在 Windows 下运行(该程序使用了超过 1MB 的堆栈)。它没有解释为什么优化版本在 Mono 下比非优化版本运行得更好:0.5 秒 vs 84 秒(尽管 --optimize+ 似乎是默认设置的,请参阅 Keith 的评论“Expert F#”)提炼)。可能与 Mono 的垃圾收集器有关,它在第一个版本中不知何故被推向了极端。
Linux/OCaml 和 Linux/Mono/F# 执行时间之间的差异(0.14 vs 0.5)是因为我测量它的简单方法:“time ./binary ...”也测量启动时间,这对于 Mono 来说很重要/.NET(嗯,对于这个简单的小问题很重要)。
无论如何,为了一劳永逸地解决这个问题,我 编写了一个尾递归版本 - 最后的递归调用函数的值被转换为循环(因此,不需要使用堆栈 - 至少在理论上)。
新版本在Windows下也运行良好,并在0.5秒内完成。
所以,这个故事的寓意是:
PS Jon Harrop 博士的一些额外意见:
...您很幸运 OCaml 也没有溢出。
您已经发现实际堆栈大小因平台而异。
同一问题的另一个方面是不同的语言实现
以不同的速率占用堆栈空间并具有不同的性能
存在深堆栈时的特征。 OCaml、Mono 和 .NET
都使用不同的数据表示和 GC 算法,从而影响
这些结果... (a) OCaml 使用标记整数来区分指针,
给出紧凑的堆栈帧,并将遍历堆栈上的所有内容
寻找指点。标签本质上传达了足够的信息
让 OCaml 运行时能够遍历堆 (b) Mono 处理单词
在堆栈上保守地作为指针:如果作为指针,一个单词将指向
到堆分配的块中,则该块被认为是可访问的。
(c) 我不知道 .NET 的算法,但如果它吃掉堆栈我不会感到惊讶
space 更快,并且仍然遍历堆栈上的每个单词(当然
如果不相关的线程有一个
深栈!)...此外,您使用堆分配的元组意味着您将
快速填充苗圃一代(例如 gen0),因此,
导致 GC 经常遍历这些深栈...
Executive summary:
It was then time to port to F#.
I then posted to Stack Overflow - but some people decided to close the question (sigh).
It was time to check the stack size: Under Windows, another SO post pointed out that it is set by default to 1MB. Under Linux, "uname -s" and a compilation of a test program clearly showed that it is 8MB.
This explained why the program worked under Linux and not under Windows (the program used more than 1MB of stack). It didn't explain why the optimized version run so much better under Mono than the non-optimized one: 0.5 seconds vs 84 seconds (even though the --optimize+ appears to be set by default, see comment by Keith with "Expert F#" extract). Probably has to do with the garbage collector of Mono, which was somehow driven to extremes by the 1st version.
The difference between Linux/OCaml and Linux/Mono/F# execution times (0.14 vs 0.5) is because of the simple way I measured it: "time ./binary ..." measures the startup time as well, which is significant for Mono/.NET (well, significant for this simple little problem).
Anyway, to solve this once and for all, I wrote a tail-recursive version - where the recursive call at the end of the function is transformed into a loop (and hence, no stack usage is necessary - at least in theory).
The new version run fine under Windows as well, and finished in 0.5 seconds.
So, moral of the story:
P.S. Some additional input from Dr. Jon Harrop:
...you were just lucky that OCaml didn't overflow as well.
You already identified that actual stack sizes vary between platforms.
Another facet of the same issue is that different language implementations
eat stack space at different rates and have different performance
characteristics in the presence of deep stacks. OCaml, Mono and .NET
all use different data representations and GC algorithms that impact
these results... (a) OCaml uses tagged integers to distinguish pointers,
giving compact stack frames, and will traverse everything on the stack
looking for pointers. The tagging essentially conveys just enough information
for the OCaml run time to be able to traverse the heap (b) Mono treats words
on the stack conservatively as pointers: if, as a pointer, a word would point
into a heap-allocated block then that block is considered to be reachable.
(c) I do not know .NET's algorithm but I wouldn't be surprised if it ate stack
space faster and still traversed every word on the stack (it certainly
suffers pathological performance from the GC if an unrelated thread has a
deep stack!)... Moreover, your use of heap-allocated tuples means you'll
be filling the nursery generation (e.g. gen0) quickly and, therefore,
causing the GC to traverse those deep stacks often...
让我尝试总结一下答案。
有 3 点需要说明:
有效 堆栈溢出异常是递归 vall 的结果,这是很常见的。如果调用位于尾部位置,编译器可以识别它并应用尾部调用优化,因此递归调用将不会占用堆栈空间。
尾部调用优化可能发生在 F#、CRL 或两者中:
CLR 尾部优化1
F# 递归(更通用)2
F# 尾部调用 3
“在 Windows 上失败,在 Linux 上失败”的正确解释是,正如其他人所说,两个操作系统上默认保留的堆栈空间。或者更好的是,两个操作系统下的编译器使用的保留堆栈空间。默认情况下,VC++仅保留1MB的堆栈空间。 CLR(可能)是用 VC++ 编译的,因此它有这个限制。保留的堆栈空间可以在编译时增加,但我不确定是否可以在编译的可执行文件上修改它。
编辑:事实证明这是可以完成的(请参阅此博客文章 http://www.bluebytesoftware.com/blog/2006/07/04/ModifyingStackReserveAndCommitSizesOnExistingBinaries.aspx)
我不会推荐它,但在极端情况下至少是可能的。
OCaml 版本可能可以工作,因为它是在 Linux 下运行的。
然而,在 Windows 下测试 OCaml 版本也会很有趣。我知道 OCaml 编译器在尾部调用优化方面比 F# 更积极。它甚至可以从原始代码中提取尾部递归函数吗?
我对“--optimize+”的猜测是,它仍然会导致代码重复出现,因此它在 Windows 下仍然会失败,但会通过使可执行文件运行得更快来缓解问题。
最后,最终的解决方案是使用尾递归(通过重写代码或依靠积极的编译器优化);这是避免递归函数堆栈溢出问题的好方法。
Let me try to summarize the answer.
There are 3 points to be made:
It is very common that a Stack Overflow exception is the result of a recursive vall. If the call is in tail position, the compiler may recognize it and apply tail call optimization, therefore the recursive call(s) will not take up stack space.
Tail call optimization may happen in F#, in the CRL, or in both:
CLR tail optimization1
F# recursion (more general) 2
F# tail calls 3
The correct explanation for "fails on windows, not in linux" is, as other said, the default reserved stack space on the two OS. Or better, the reserved stack space used by the compilers under the two OSes. By default, VC++ reserves only 1MB of stack space. The CLR is (likely) compiled with VC++, so it has this limitation. Reserved stack space can be increased at compile time, but I'm not sure if it can be modified on compiled executables.
EDIT: turns out that it can be done (see this blog post http://www.bluebytesoftware.com/blog/2006/07/04/ModifyingStackReserveAndCommitSizesOnExistingBinaries.aspx)
I would not recommend it, but in extreme situations at least it is possible.
OCaml version may work because it was run under Linux.
However, it would be interesting to test also the OCaml version under Windows. I know that the OCaml compiler is more aggressive at tail-call optimization than F#.. could it even extract a tail recursive function from your original code?
My guess about "--optimize+" is that it will still cause the code to recur, hence it will still fail under Windows, but will mitigate the problem by making the executable run faster.
Finally, the definitive solution is to use tail recursion (by rewriting the code or by relying on aggressive compiler optimization); it is a good way to avoid stack overflow problem with recursive functions.