在 ocaml 中生成大字母序列时堆栈溢出
给定一个字母表 ["a"; “b”; "c"]
我想将所有长度为 25 的序列转储到文件中。 (字母可以按顺序重复;这不是排列。)问题是,当我尝试使用以下代码时,我在求值期间出现堆栈溢出(循环递归?):
let addAlphabetToPrefix alphabet prefix =
List.map (function letter -> (prefix ^ letter)) alphabet;;
let rec generateWords alphabet counter words =
if counter > 25 then
words
else
let newWords = List.flatten(List.map (function word -> addAlphabetToPrefix alphabet word) words) in
generateWords alphabet (counter + 1) newWords;;
generateWords ["a"; "b"; "c"] 0 [""];; (* Produces a stack overflow. *)
有没有更好的方法这样做?我正在考虑先生成整个列表,然后将整个列表转储到文件中,但是我是否必须重复生成部分列表然后转储?做一些懒惰的事情会有帮助吗?
到底为什么会发生堆栈溢出? AFAICT,我的 generateWords
函数是尾递归的。我生成的 words
列表是否太大而无法装入内存?
Given an alphabet ["a"; "b"; "c"]
I want to dump all sequences of length 25 to a file. (Letters can repeat in a sequence; it's not a permutation.) The problem is, I get a Stack overflow during evaluation (looping recursion?)
when I try using the following code:
let addAlphabetToPrefix alphabet prefix =
List.map (function letter -> (prefix ^ letter)) alphabet;;
let rec generateWords alphabet counter words =
if counter > 25 then
words
else
let newWords = List.flatten(List.map (function word -> addAlphabetToPrefix alphabet word) words) in
generateWords alphabet (counter + 1) newWords;;
generateWords ["a"; "b"; "c"] 0 [""];; (* Produces a stack overflow. *)
Is there a better way of doing this? I was thinking of generating the entire list first, and then dumping the entire list to a file, but do I have to repeatedly generate partials lists and then dump? Would making something lazy help?
Why exactly is a stack overflow occurring? AFAICT, my generateWords
function is tail-recursive. Is the problem that the words
list I'm generating is getting too big to fit into memory?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
您的函数正在被编译为尾部调用。我从线性化代码中确认了;从本机编译器
ocamlopt[.opt]
中的-dlinear
选项获取。事实上,你的堆正在呈指数级增长,而 25 个单词在这种方法中是不可持续的。尝试使用 11 效果很好(并且是我可以处理的最高值)。
是的,有更好的方法可以做到这一点。您可以通过查找 按字典顺序排列的组合索引或使用格雷码(同一页)。这些只需要存储一个单词,可以并行运行,并且永远不会导致分段错误——尽管使用索引方法可能会溢出,在这种情况下,您可以切换到大整数,但会牺牲速度,或者格雷码(可能难以并行化,具体取决于格雷码)。
Your functions are being compiled as tailcalls. I confirmed from the linearized code; obtained from the
-dlinear
option in the native compiler,ocamlopt[.opt]
.The fact of the matter is, your heap is growing exponentially, and 25 words is unsustainable in this method. Trying with 11 works fine (and is the highest I could deal with).
Yes, there is a better way to do this. You can generate the combinations by looking up the index of the combination in lexicographical order or using grey codes (same page). These would only require storage for one word, can be run in parallel, and will never cause a segmentation fault --you might overflow the using the index method though, in which case you can switch to the big integers but will sacrifice speed, or grey codes (which may be difficult to parallelize, depending on the grey code).
OCaml 优化尾递归,因此您的代码应该可以工作,但不幸的是,标准库的
List.map
函数不是尾递归的。当您的列表变得相当大时,堆栈溢出可能会发生在这些调用之一中。Batteries Included 和 Jane Street 的核心库都提供
map
的尾递归版本。尝试其中之一,看看是否可以解决问题。OCaml optimizes tail recursion, so your code should work, except: the standard library's
List.map
function is, unfortunately, not tail-recursive. The stack overflow is potentially occurring in one of those calls, as your lists get rather large.Batteries Included and Jane Street's Core library both provide tail-recursive versions of
map
. Try one of those and see if it fixes the problem.