Haskell 堆栈溢出

发布于 2024-11-06 15:35:46 字数 1092 浏览 1 评论 0原文

我正在编写一个遗传算法来生成字符串“helloworld”。但当 n 为 10,000 或更大时,evolve 函数会产生堆栈溢出。

module Genetics where

import Data.List (sortBy)
import Random (randomRIO)
import Control.Monad (foldM)

class Gene g where
    -- How ideal is the gene from 0.0 to 1.0?
    fitness :: g -> Float

    -- How does a gene mutate?
    mutate :: g -> IO g

    -- How many species will be explored?
    species :: [g] -> Int

orderFitness :: (Gene g) => [g] -> [g]
orderFitness = reverse . sortBy (\a b -> compare (fitness a) (fitness b))

compete :: (Gene g) => [g] -> IO [g]
compete pool = do
    let s = species pool
    variants <- (mapM (mapM mutate) . map (replicate s)) pool
    let pool' = (map head . map orderFitness) variants
    return pool'

evolve :: (Gene g) => Int -> [g] -> IO [g]
evolve 0 pool = return pool
evolve n pool = do
    pool' <- compete pool
    evolve (n - 1) pool'

species pool = 8时,8个基因的池复制到8个组。每个群体都会发生突变,并选择每个群体中最适应的进行进一步进化(回到 8 个基因)。

GitHub

I'm writing a genetic algorithm to generate the string "helloworld". But the evolve function creates a stack overflow when n is 10,000 or more.

module Genetics where

import Data.List (sortBy)
import Random (randomRIO)
import Control.Monad (foldM)

class Gene g where
    -- How ideal is the gene from 0.0 to 1.0?
    fitness :: g -> Float

    -- How does a gene mutate?
    mutate :: g -> IO g

    -- How many species will be explored?
    species :: [g] -> Int

orderFitness :: (Gene g) => [g] -> [g]
orderFitness = reverse . sortBy (\a b -> compare (fitness a) (fitness b))

compete :: (Gene g) => [g] -> IO [g]
compete pool = do
    let s = species pool
    variants <- (mapM (mapM mutate) . map (replicate s)) pool
    let pool' = (map head . map orderFitness) variants
    return pool'

evolve :: (Gene g) => Int -> [g] -> IO [g]
evolve 0 pool = return pool
evolve n pool = do
    pool' <- compete pool
    evolve (n - 1) pool'

With species pool = 8, a pool of 8 genes replicates to 8 groups. Each group mutates, and the fittest of each group is selected for further evolution (back to 8 genes).

GitHub

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

鱼忆七猫命九 2024-11-13 15:35:46

如果您对性能感兴趣,我会使用快速随机数生成器,例如:

其次,compete 看起来非常可疑,因为它完全是懒惰的,尽管构建了一些潜在的大型结构。尝试使用 deepseq 锤子将其重写得更严格:

import Control.DeepSeq    

compete :: (Gene g, NFData g) => [g] -> IO [g]
compete pool = do
    let s = species pool
    variants <- (mapM (mapM mutate) . map (replicate s)) pool
    let pool' = (map head . map orderFitness) variants
    pool' `deepseq` return pool'

这些东西都不需要不过,在 IO 中(单独的问题)。像 Rand monad 这样的东西可能更合适

If you're interested in performance, I'd use a fast random number generator, such as:

Secondly, compete looks very suspicious, as it is entirely lazy, despite building some potentially large structures. Try rewriting it to be a bit stricter, using the deepseq hammer:

import Control.DeepSeq    

compete :: (Gene g, NFData g) => [g] -> IO [g]
compete pool = do
    let s = species pool
    variants <- (mapM (mapM mutate) . map (replicate s)) pool
    let pool' = (map head . map orderFitness) variants
    pool' `deepseq` return pool'

None of this stuff needs to be in IO, though, (separate issue). Something like the Rand monad may be more appropriate.

涫野音 2024-11-13 15:35:46

感谢 Don 的 deepseq 建议,我能够将问题范围缩小到 mapM mutate,这导致了太多的重击。新版本有 mutate',它使用 seq 来防止 thunking。

module Genetics where

import Data.List (maximumBy)
import Random (randomRIO)

class Gene g where
    -- How ideal is the gene from 0.0 to 1.0?
    fitness :: g -> Float

    -- How does a gene mutate?
    mutate :: g -> IO g

    -- How many species will be explored in each round?
    species :: [g] -> Int

best :: (Gene g) => [g] -> g
best = maximumBy (\a b -> compare (fitness a) (fitness b))

-- Prevents stack overflow
mutate' :: (Gene g) => g -> IO g
mutate' gene = do
    gene' <- mutate gene
    gene' `seq` return gene'

drift :: (Gene g) => [[g]] -> IO [[g]]
drift = mapM (mapM mutate')

compete :: (Gene g) => [g] -> IO [g]
compete pool = do
    let islands = map (replicate (species pool)) pool
    islands' <- drift islands
    let representatives = map best islands'
    return representatives

evolve :: (Gene g) => Int -> [g] -> IO [g]
evolve 0 pool = return pool
evolve n pool = compete pool >>= evolve (n - 1)

GitHub

Thanks to Don's deepseq suggestion, I was able to narrow the problem to mapM mutate which made too many thunks. The new version has mutate', which uses seq to prevent thunking.

module Genetics where

import Data.List (maximumBy)
import Random (randomRIO)

class Gene g where
    -- How ideal is the gene from 0.0 to 1.0?
    fitness :: g -> Float

    -- How does a gene mutate?
    mutate :: g -> IO g

    -- How many species will be explored in each round?
    species :: [g] -> Int

best :: (Gene g) => [g] -> g
best = maximumBy (\a b -> compare (fitness a) (fitness b))

-- Prevents stack overflow
mutate' :: (Gene g) => g -> IO g
mutate' gene = do
    gene' <- mutate gene
    gene' `seq` return gene'

drift :: (Gene g) => [[g]] -> IO [[g]]
drift = mapM (mapM mutate')

compete :: (Gene g) => [g] -> IO [g]
compete pool = do
    let islands = map (replicate (species pool)) pool
    islands' <- drift islands
    let representatives = map best islands'
    return representatives

evolve :: (Gene g) => Int -> [g] -> IO [g]
evolve 0 pool = return pool
evolve n pool = compete pool >>= evolve (n - 1)

GitHub

看透却不说透 2024-11-13 15:35:46

您可以使用 maximumBy 和一个 (map head .map orderFitness)(其中 orderFitness 是一个 sortBy),而不是使用 (map head .map orderFitness) 地图。这并不会节省太多(因为您将从 O(n log n) 变为 O(n),并且可能会通过消除双映射而获得另一个两倍),但至少更简单、更高效。您还可以摆脱对反向的调用。

我怀疑这可以在没有 deepseq 的情况下解决问题,但它仍然应该是一个改进。

编辑:如果标准库和 GHC 是完美的,那么 head 。 sortBy 将生成与 maximumBy 和 map head 相同的代码。 map sortBy 会生成与 map (head .sortBy) 相同的代码,遗憾的是,这些事情在实践中都不太可能是真的。 sortBy 往往会进行大量额外的内存分配,因为它是一种分而治之的算法。组合地图是一种有时会得到的优化,但不应指望。

更重要的是,使用 maximumBy 更具声明性。更容易看出代码的作用以及需要多长时间。在优化中也应该更容易利用,因为我们知道目标是什么,而不仅仅是如何实现它。

Instead of using (map head . map orderFitness) where orderFitness is a sortBy you could use maximumBy and a single map. That doesn't save too much (since you are going from an O(n log n) to O(n) and might be getting another factor of two from eliminating the double map), but is at the least somewhat simpler and more efficient. You would also get rid of a call to reverse.

I doubt this fixes the problem without a deepseq, but it should be an improvement none the less.

Edit: if the standard library and GHC were perfect, then head . sortBy would produce identical code to maximumBy and map head . map sortBy would produce identical code to map (head . sortBy) sadly neither of these things is likely to be true in practice. sortBy will tend to do a bunch of extra memory allocation because it is a divide and conquer algorithm. Combining maps is an optimization you sometimes get, but should not count on.

More importantly, using maximumBy is more declarative. It is easier to see what the code does and how long it will take. It should also be easier to take advantage of in optimizations, because we know what the goal is, not just how we get it.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文