F# 与 OCaml：堆栈溢出

发布于 2024-12-06 02:06:17 字数 2280 浏览 2 评论 0原文

我最近发现了一个关于F# for Python程序员的演示，看完之后，我决定实现我自己解决了“蚂蚁难题”。

有一只蚂蚁可以在平面网格上走动。蚂蚁一次可以向左、向右、向上或向下移动一格。也就是说，蚂蚁可以从单元格 (x, y) 前往单元格 (x+1, y)、(x-1, y)、(x, y+1) 和 (x, y-1)。 x 和 y 坐标的数字之和大于 25 的点是蚂蚁无法到达的。例如，点(59,79)是不可访问的，因为5 + 9 + 7 + 9 = 30，大于25。问题是：如果从(1000, 1000)开始，蚂蚁可以访问多少个点，包括 (1000, 1000) 本身？

我用 30 行 OCaml 首先，并尝试了一下：

$ ocamlopt -unsafe -rectypes -inline 1000 -o puzzle ant.ml
$ time ./puzzle
Points: 148848

real    0m0.143s
user    0m0.127s
sys     0m0.013s

Neat，我的结果与 leonardo 的 D 和 C++ 实现。与 Leonardo 的 C++ 实现相比，OCaml 版本的运行速度大约比 C++ 慢 2 倍。鉴于 Leonardo 使用队列来删除递归，这没关系。

然后我将代码翻译为 F# ...这就是我得到的：

Thanassis@HOME /g/Tmp/ant.fsharp
$ /g/Program\ Files/FSharp-2.0.0.0/bin/fsc.exe ant.fs
Microsoft (R) F# 2.0 Compiler build 2.0.0.0
Copyright (c) Microsoft Corporation. All Rights Reserved.

Thanassis@HOME /g/Tmp/ant.fsharp
$ ./ant.exe

Process is terminated due to StackOverflowException.
Quit

Thanassis@HOME /g/Tmp/ant.fsharp
$ /g/Program\ Files/Microsoft\ F#/v4.0/Fsc.exe ant.fs
Microsoft (R) F# 2.0 Compiler build 4.0.30319.1
Copyright (c) Microsoft Corporation. All Rights Reserved.

Thanassis@HOME /g/Tmp/ant.fsharp
$ ./ant.exe

Process is terminated due to StackOverflowException

堆栈溢出...两个版本我的机器里有 F# 的... 出于好奇，我然后取出生成的二进制文件（ant.exe）并在 Arch Linux/Mono 下运行它：

$ mono -V | head -1
Mono JIT compiler version 2.10.5 (tarball Fri Sep  9 06:34:36 UTC 2011)

$ time mono ./ant.exe
Points: 148848

real    1m24.298s
user    0m0.567s
sys     0m0.027s

令人惊讶的是，它在 Mono 2.10.5 下运行（即没有堆栈溢出） - 但需要 84 秒，即慢了 587 倍比 OCaml - 哎呀。

所以这个程序......

在 OCaml 下运行良好，
在 .NET/F# 下根本不起作用
，但在 Mono/F# 下非常慢。

为什么？

编辑：奇怪的现象仍在继续 - 使用“--optimize+ --checked-”使问题消失，但仅在 ArchLinux/Mono 下；在Windows XP和Windows 7/64位下，即使是优化版本的二进制堆栈也会溢出。

最终编辑：我自己找到了答案 - 见下文。

原文

I recently found a presentation about F# for Python programmers, and after watching it, I decided to implement a solution to the "ant puzzle" on my own.

There is an ant that can walk around on a planar grid. The ant can move one space at a time left, right, up or down. That is, from the cell (x, y) the ant can go to cells (x+1, y), (x-1, y), (x, y+1), and (x, y-1). Points where the sum of the digits of the x and y coordinates are greater than 25 are inaccessible to the ant. For example, the point (59,79) is inaccessible because 5 + 9 + 7 + 9 = 30, which is greater than 25. The question is: How many points can the ant access if it starts at (1000, 1000), including (1000, 1000) itself?

I implemented my solution in 30 lines of OCaml first, and tried it out:

$ ocamlopt -unsafe -rectypes -inline 1000 -o puzzle ant.ml
$ time ./puzzle
Points: 148848

real    0m0.143s
user    0m0.127s
sys     0m0.013s

Neat, my result is the same as that of leonardo's implementation, in D and C++. Comparing to Leonardo's C++ implementation, the OCaml version runs approx 2 times slower than C++. Which is OK, given that Leonardo used a queue to remove recursion.

I then translated the code to F# ... and here's what I got:

Thanassis@HOME /g/Tmp/ant.fsharp
$ /g/Program\ Files/FSharp-2.0.0.0/bin/fsc.exe ant.fs
Microsoft (R) F# 2.0 Compiler build 2.0.0.0
Copyright (c) Microsoft Corporation. All Rights Reserved.

Thanassis@HOME /g/Tmp/ant.fsharp
$ ./ant.exe

Process is terminated due to StackOverflowException.
Quit

Thanassis@HOME /g/Tmp/ant.fsharp
$ /g/Program\ Files/Microsoft\ F#/v4.0/Fsc.exe ant.fs
Microsoft (R) F# 2.0 Compiler build 4.0.30319.1
Copyright (c) Microsoft Corporation. All Rights Reserved.

Thanassis@HOME /g/Tmp/ant.fsharp
$ ./ant.exe

Process is terminated due to StackOverflowException

Stack overflow... with both versions of F# I have in my machine...
Out of curiosity, I then took the generated binary (ant.exe) and run it under Arch Linux/Mono:

$ mono -V | head -1
Mono JIT compiler version 2.10.5 (tarball Fri Sep  9 06:34:36 UTC 2011)

$ time mono ./ant.exe
Points: 148848

real    1m24.298s
user    0m0.567s
sys     0m0.027s

Surprisingly, it runs under Mono 2.10.5 (i.e. no stack overflow) - but it takes 84 seconds, i.e. 587 times slower than OCaml - oops.

So this program...

runs fine under OCaml
doesn't work at all under .NET/F#
works, but is very slow, under Mono/F#.

Why?

EDIT: Weirdness continues - Using "--optimize+ --checked-" makes the problem disappear, but only under ArchLinux/Mono ; under Windows XP and Windows 7/64bit, even the optimized version of the binary stack overflows.

Final EDIT: I found out the answer myself - see below.

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

简单气质女生网名 2024-12-13 02:06:18

执行摘要：

我编写了一个算法的简单实现......它不是尾递归的。
我在Linux下用OCaml编译了它。
效果很好，0.14秒就完成了。

然后是时候移植到 F# 了。

我将代码（直接翻译）翻译为F#。
我在 Windows 下编译并运行它 - 我遇到了堆栈溢出。
我在 Linux 下获取了二进制文件，并在 Mono 下运行它。
它有效，但运行速度非常慢（84 秒）。

然后我发布到 Stack Overflow - 但有些人决定关闭这个问题（叹气）。

我尝试使用 --optimize+ --checked- 进行编译
二进制文件在 Windows 下仍然堆栈溢出...
...但在 Linux/Mono 下运行良好（并在 0.5 秒内完成）。

是时候检查堆栈大小了：在 Windows 下，另一篇 SO 文章指出，它默认设置为1MB。在 Linux 下，“uname -s”和测试程序的编译清楚地表明它是8MB。

这解释了为什么该程序在 Linux 下运行而不是在 Windows 下运行（该程序使用了超过 1MB 的堆栈）。它没有解释为什么优化版本在 Mono 下比非优化版本运行得更好：0.5 秒 vs 84 秒（尽管 --optimize+ 似乎是默认设置的，请参阅 Keith 的评论“Expert F#”）提炼）。可能与 Mono 的垃圾收集器有关，它在第一个版本中不知何故被推向了极端。

Linux/OCaml 和 Linux/Mono/F# 执行时间之间的差异（0.14 vs 0.5）是因为我测量它的简单方法：“time ./binary ...”也测量启动时间，这对于 Mono 来说很重要/.NET（嗯，对于这个简单的小问题很重要）。

无论如何，为了一劳永逸地解决这个问题，我编写了一个尾递归版本 - 最后的递归调用函数的值被转换为循环（因此，不需要使用堆栈 - 至少在理论上）。

新版本在Windows下也运行良好，并在0.5秒内完成。

所以，这个故事的寓意是：

小心你的堆栈使用情况，特别是当你使用大量堆栈并在 Windows 下运行时。使用带有 /STACK 选项的 EDITBIN 来将二进制文件设置为更大的堆栈大小，或者更好的是，以不依赖于使用太多堆栈的方式编写代码。
OCaml 在尾递归消除方面可能比 F# 更好 - 或者它的垃圾收集器在这个特定问题上做得更好。
不要对...粗鲁的人关闭你的 Stack Overflow 问题感到绝望，好人最终会抵消它们 - 如果问题真的很好:-)

PS Jon Harrop 博士的一些额外意见：

...您很幸运 OCaml 也没有溢出。
您已经发现实际堆栈大小因平台而异。
同一问题的另一个方面是不同的语言实现
以不同的速率占用堆栈空间并具有不同的性能
存在深堆栈时的特征。 OCaml、Mono 和 .NET
都使用不同的数据表示和 GC 算法，从而影响
这些结果... (a) OCaml 使用标记整数来区分指针，
给出紧凑的堆栈帧，并将遍历堆栈上的所有内容
寻找指点。标签本质上传达了足够的信息
让 OCaml 运行时能够遍历堆 (b) Mono 处理单词
在堆栈上保守地作为指针：如果作为指针，一个单词将指向
到堆分配的块中，则该块被认为是可访问的。
(c) 我不知道 .NET 的算法，但如果它吃掉堆栈我不会感到惊讶
space 更快，并且仍然遍历堆栈上的每个单词（当然
如果不相关的线程有一个
深栈！）...此外，您使用堆分配的元组意味着您将
快速填充苗圃一代（例如 gen0），因此，
导致 GC 经常遍历这些深栈...

Executive summary:

I wrote a simple implementation of an algorithm... that wasn't tail-recursive.
I compiled it with OCaml under Linux.
It worked fine, and finished in 0.14 seconds.

It was then time to port to F#.

I translated the code (direct translation) to F#.
I compiled under Windows, and run it - I got a stack overflow.
I took the binary under Linux, and run it under Mono.
It worked, but run very slowly (84 seconds).

I then posted to Stack Overflow - but some people decided to close the question (sigh).

I tried compiling with --optimize+ --checked-
The binary still stack overflowed under Windows...
...but run fine (and finished in 0.5 seconds) under Linux/Mono.

It was time to check the stack size: Under Windows, another SO post pointed out that it is set by default to 1MB. Under Linux, "uname -s" and a compilation of a test program clearly showed that it is 8MB.

This explained why the program worked under Linux and not under Windows (the program used more than 1MB of stack). It didn't explain why the optimized version run so much better under Mono than the non-optimized one: 0.5 seconds vs 84 seconds (even though the --optimize+ appears to be set by default, see comment by Keith with "Expert F#" extract). Probably has to do with the garbage collector of Mono, which was somehow driven to extremes by the 1st version.

The difference between Linux/OCaml and Linux/Mono/F# execution times (0.14 vs 0.5) is because of the simple way I measured it: "time ./binary ..." measures the startup time as well, which is significant for Mono/.NET (well, significant for this simple little problem).

Anyway, to solve this once and for all, I wrote a tail-recursive version - where the recursive call at the end of the function is transformed into a loop (and hence, no stack usage is necessary - at least in theory).

The new version run fine under Windows as well, and finished in 0.5 seconds.

So, moral of the story:

Beware of your stack usage, especially if you use lots of it and run under Windows. Use EDITBIN with the /STACK option to set your binaries to larger stack sizes, or better yet, write your code in a manner that doesn't depend on using too much stack.
OCaml may be better at tail-recursion elimination than F# - or it's garbage collector is doing a better job at this particular problem.
Don't despair about ...rude people closing your Stack Overflow questions, good people will counteract them in the end - if the questions are really good :-)

P.S. Some additional input from Dr. Jon Harrop:

...you were just lucky that OCaml didn't overflow as well.
You already identified that actual stack sizes vary between platforms.
Another facet of the same issue is that different language implementations
eat stack space at different rates and have different performance
characteristics in the presence of deep stacks. OCaml, Mono and .NET
all use different data representations and GC algorithms that impact
these results... (a) OCaml uses tagged integers to distinguish pointers,
giving compact stack frames, and will traverse everything on the stack
looking for pointers. The tagging essentially conveys just enough information
for the OCaml run time to be able to traverse the heap (b) Mono treats words
on the stack conservatively as pointers: if, as a pointer, a word would point
into a heap-allocated block then that block is considered to be reachable.
(c) I do not know .NET's algorithm but I wouldn't be surprised if it ate stack
space faster and still traversed every word on the stack (it certainly
suffers pathological performance from the GC if an unrelated thread has a
deep stack!)... Moreover, your use of heap-allocated tuples means you'll
be filling the nursery generation (e.g. gen0) quickly and, therefore,
causing the GC to traverse those deep stacks often...

回复收藏 0 原文

蘸点软妹酱 2024-12-13 02:06:18

让我尝试总结一下答案。

有 3 点需要说明：

问题：堆栈溢出发生在递归函数上
它只发生在 Windows 下：在 Linux 上，对于检查的问题大小，它
在 OCaml 中工作相同（或相似）的代码
，针对问题优化 + 编译器标志检查大小，

有效堆栈溢出异常是递归 vall 的结果，这是很常见的。如果调用位于尾部位置，编译器可以识别它并应用尾部调用优化，因此递归调用将不会占用堆栈空间。
尾部调用优化可能发生在 F#、CRL 或两者中：

CLR 尾部优化1

F# 递归（更通用）2

F# 尾部调用 3

“在 Windows 上失败，在 Linux 上失败”的正确解释是，正如其他人所说，两个操作系统上默认保留的堆栈空间。或者更好的是，两个操作系统下的编译器使用的保留堆栈空间。默认情况下，VC++仅保留1MB的堆栈空间。 CLR（可能）是用 VC++ 编译的，因此它有这个限制。保留的堆栈空间可以在编译时增加，但我不确定是否可以在编译的可执行文件上修改它。

编辑：事实证明这是可以完成的（请参阅此博客文章 http://www.bluebytesoftware.com/blog/2006/07/04/ModifyingStackReserveAndCommitSizesOnExistingBinaries.aspx）
我不会推荐它，但在极端情况下至少是可能的。

OCaml 版本可能可以工作，因为它是在 Linux 下运行的。
然而，在 Windows 下测试 OCaml 版本也会很有趣。我知道 OCaml 编译器在尾部调用优化方面比 F# 更积极。它甚至可以从原始代码中提取尾部递归函数吗？

我对“--optimize+”的猜测是，它仍然会导致代码重复出现，因此它在 Windows 下仍然会失败，但会通过使可执行文件运行得更快来缓解问题。

最后，最终的解决方案是使用尾递归（通过重写代码或依靠积极的编译器优化）；这是避免递归函数堆栈溢出问题的好方法。

Let me try to summarize the answer.

There are 3 points to be made:

problem: stack overflow happens on a recursive function
it happens only under windows: on linux, for the problem size examined, it works
same (or similar) code in OCaml works
optimize+ compiler flag, for the problem size examined, works

It is very common that a Stack Overflow exception is the result of a recursive vall. If the call is in tail position, the compiler may recognize it and apply tail call optimization, therefore the recursive call(s) will not take up stack space.
Tail call optimization may happen in F#, in the CRL, or in both:

CLR tail optimization1

F# recursion (more general) 2

F# tail calls 3

The correct explanation for "fails on windows, not in linux" is, as other said, the default reserved stack space on the two OS. Or better, the reserved stack space used by the compilers under the two OSes. By default, VC++ reserves only 1MB of stack space. The CLR is (likely) compiled with VC++, so it has this limitation. Reserved stack space can be increased at compile time, but I'm not sure if it can be modified on compiled executables.

EDIT: turns out that it can be done (see this blog post http://www.bluebytesoftware.com/blog/2006/07/04/ModifyingStackReserveAndCommitSizesOnExistingBinaries.aspx)
I would not recommend it, but in extreme situations at least it is possible.

OCaml version may work because it was run under Linux.
However, it would be interesting to test also the OCaml version under Windows. I know that the OCaml compiler is more aggressive at tail-call optimization than F#.. could it even extract a tail recursive function from your original code?

My guess about "--optimize+" is that it will still cause the code to recur, hence it will still fail under Windows, but will mitigate the problem by making the executable run faster.

Finally, the definitive solution is to use tail recursion (by rewriting the code or by relying on aggressive compiler optimization); it is a good way to avoid stack overflow problem with recursive functions.

回复收藏 0 原文

~没有更多了~