Haskell stack overflow
I'm writing a genetic algorithm to generate the string "helloworld", but the evolve function creates a stack overflow when n is 10,000 or more.

```haskell
module Genetics where

import Data.List (sortBy)
import System.Random (randomRIO)
import Control.Monad (foldM)

class Gene g where
  -- How ideal is the gene, from 0.0 to 1.0?
  fitness :: g -> Float

  -- How does a gene mutate?
  mutate :: g -> IO g

  -- How many species will be explored?
  species :: [g] -> Int

orderFitness :: (Gene g) => [g] -> [g]
orderFitness = reverse . sortBy (\a b -> compare (fitness a) (fitness b))

compete :: (Gene g) => [g] -> IO [g]
compete pool = do
  let s = species pool
  variants <- (mapM (mapM mutate) . map (replicate s)) pool
  let pool' = (map head . map orderFitness) variants
  return pool'

evolve :: (Gene g) => Int -> [g] -> IO [g]
evolve 0 pool = return pool
evolve n pool = do
  pool' <- compete pool
  evolve (n - 1) pool'
```

With `species pool = 8`, a pool of 8 genes replicates into 8 groups. Each group mutates, and the fittest member of each group is selected for further evolution (back to 8 genes).
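The post doesn't show a concrete `Gene` instance. A minimal, hypothetical one for the "helloworld" target might look like the following (the `target`, character-match scoring, and single-character `mutate` are all invented here for illustration, not from the original code):

```haskell
import System.Random (randomRIO)

target :: String
target = "helloworld"

-- Fraction of positions that already match the target.
-- (Assumes the gene has the same length as the target.)
fitness :: String -> Float
fitness g = matches / fromIntegral (length target)
  where matches = fromIntegral . length . filter id $ zipWith (==) g target

-- Replace one randomly chosen character with a random lowercase letter.
mutate :: String -> IO String
mutate g = do
  i <- randomRIO (0, length g - 1)
  c <- randomRIO ('a', 'z')
  return $ take i g ++ [c] ++ drop (i + 1) g

main :: IO ()
main = do
  print (fitness "helloworld")  -- 1.0
  print (fitness "helloxxxxx")  -- 0.5
  m <- mutate "helloworld"
  print (length m)              -- 10 (mutation preserves length)
```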
If you're interested in performance, I'd use a fast random number generator. Secondly, compete looks very suspicious, as it is entirely lazy despite building some potentially large structures. Try rewriting it to be a bit stricter, using the deepseq hammer. None of this stuff needs to be in IO, though (separate issue); something like the Rand monad may be more appropriate.
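A sketch of the stricter compete suggested above, using `force` from the deepseq package. The `Gene` class is copied from the question; the toy `Int` instance (mutation just increments, deterministic) is invented here purely so the sketch compiles and runs:

```haskell
import Control.DeepSeq (NFData, force)
import Data.List (sortBy)

class Gene g where
  fitness :: g -> Float
  mutate  :: g -> IO g
  species :: [g] -> Int

orderFitness :: (Gene g) => [g] -> [g]
orderFitness = reverse . sortBy (\a b -> compare (fitness a) (fitness b))

-- Force the new pool to normal form before returning it, so thunks
-- from mapM mutate can't pile up across evolve's iterations.
compete :: (Gene g, NFData g) => [g] -> IO [g]
compete pool = do
  let s = species pool
  variants <- mapM (mapM mutate) (map (replicate s) pool)
  let pool' = force (map (head . orderFitness) variants)
  return pool'

-- Toy instance: a gene is an Int; "mutating" increments it.
instance Gene Int where
  fitness n = fromIntegral n / 100
  mutate n  = return (n + 1)
  species _ = 4

evolve :: (Gene g, NFData g) => Int -> [g] -> IO [g]
evolve 0 pool = return pool
evolve n pool = compete pool >>= evolve (n - 1)

main :: IO ()
main = evolve 10000 [0 :: Int, 1, 2] >>= print  -- [10000,10001,10002]
```

With the lazy version, each iteration returns unevaluated thunks; with `force`, each generation is fully evaluated before the next begins, so the depth of pending work stays constant.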
Thanks to Don's deepseq suggestion, I was able to narrow the problem down to mapM mutate, which made too many thunks. The new version has mutate', which uses seq to prevent thunking. (Code is on GitHub.)
Instead of using (map head . map orderFitness), where orderFitness is a sortBy, you could use maximumBy and a single map. That doesn't save too much (you are going from O(n log n) to O(n), and might get another factor of two from eliminating the double map), but it is at least somewhat simpler and more efficient. You would also get rid of a call to reverse.

I doubt this fixes the problem without a deepseq, but it should be an improvement nonetheless.

Edit: if the standard library and GHC were perfect, then head . sortBy would produce identical code to maximumBy, and map head . map sortBy would produce identical code to map (head . sortBy); sadly, neither of these things is likely to be true in practice. sortBy will tend to do a bunch of extra memory allocation because it is a divide-and-conquer algorithm. Combining maps is an optimization you sometimes get, but should not count on.

More importantly, using maximumBy is more declarative. It is easier to see what the code does and how long it will take. It should also be easier to take advantage of in optimizations, because we know what the goal is, not just how we get it.
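The suggested rewrite, with a toy fitness so it runs (scoring an Int gene by its own value is an assumption made here, not from the post):

```haskell
import Data.List (maximumBy)
import Data.Ord (comparing)

-- Toy fitness: score an Int gene by its value.
fitness :: Int -> Float
fitness = fromIntegral

-- Before: (map head . map orderFitness) sorts each group, O(n log n)
-- per group, plus a reverse.
-- After: one O(n) pass per group, no sort and no reverse.
selectFittest :: [[Int]] -> [Int]
selectFittest = map (maximumBy (comparing fitness))

main :: IO ()
main = print (selectFittest [[3, 1, 2], [9, 7, 8]])  -- [3,9]
```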