Haskell stack overflow
I'm writing a genetic algorithm to generate the string "helloworld", but the evolve function creates a stack overflow when n is 10,000 or more.

```haskell
module Genetics where

import Data.List (sortBy)
import System.Random (randomRIO)
import Control.Monad (foldM)

class Gene g where
  -- How ideal is the gene, from 0.0 to 1.0?
  fitness :: g -> Float

  -- How does a gene mutate?
  mutate :: g -> IO g

  -- How many species will be explored?
  species :: [g] -> Int

orderFitness :: (Gene g) => [g] -> [g]
orderFitness = reverse . sortBy (\a b -> compare (fitness a) (fitness b))

compete :: (Gene g) => [g] -> IO [g]
compete pool = do
  let s = species pool
  variants <- (mapM (mapM mutate) . map (replicate s)) pool
  let pool' = (map head . map orderFitness) variants
  return pool'

evolve :: (Gene g) => Int -> [g] -> IO [g]
evolve 0 pool = return pool
evolve n pool = do
  pool' <- compete pool
  evolve (n - 1) pool'
```

With `species pool = 8`, a pool of 8 genes replicates into 8 groups. Each group mutates, and the fittest member of each group is selected for further evolution (back to 8 genes).
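The post doesn't show a concrete `Gene` instance. A minimal, hypothetical one for the "helloworld" target might look like the following (the `target`, character-match scoring, and single-character `mutate` are all invented here for illustration, not from the original code):

```haskell
import System.Random (randomRIO)

target :: String
target = "helloworld"

-- Fraction of positions that already match the target.
-- (Assumes the gene has the same length as the target.)
fitness :: String -> Float
fitness g = matches / fromIntegral (length target)
  where matches = fromIntegral . length . filter id $ zipWith (==) g target

-- Replace one randomly chosen character with a random lowercase letter.
mutate :: String -> IO String
mutate g = do
  i <- randomRIO (0, length g - 1)
  c <- randomRIO ('a', 'z')
  return $ take i g ++ [c] ++ drop (i + 1) g

main :: IO ()
main = do
  print (fitness "helloworld")  -- 1.0
  print (fitness "helloxxxxx")  -- 0.5
  m <- mutate "helloworld"
  print (length m)              -- 10 (mutation preserves length)
```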
If you're interested in performance, I'd use a fast random number generator. Secondly, compete looks very suspicious, as it is entirely lazy despite building some potentially large structures. Try rewriting it to be a bit stricter, using the deepseq hammer. None of this stuff needs to be in IO, though (separate issue); something like the Rand monad may be more appropriate.
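A sketch of the stricter compete suggested above, using `force` from the deepseq package. The `Gene` class is copied from the question; the toy `Int` instance (mutation just increments, deterministic) is invented here purely so the sketch compiles and runs:

```haskell
import Control.DeepSeq (NFData, force)
import Data.List (sortBy)

class Gene g where
  fitness :: g -> Float
  mutate  :: g -> IO g
  species :: [g] -> Int

orderFitness :: (Gene g) => [g] -> [g]
orderFitness = reverse . sortBy (\a b -> compare (fitness a) (fitness b))

-- Force the new pool to normal form before returning it, so thunks
-- from mapM mutate can't pile up across evolve's iterations.
compete :: (Gene g, NFData g) => [g] -> IO [g]
compete pool = do
  let s = species pool
  variants <- mapM (mapM mutate) (map (replicate s) pool)
  let pool' = force (map (head . orderFitness) variants)
  return pool'

-- Toy instance: a gene is an Int; "mutating" increments it.
instance Gene Int where
  fitness n = fromIntegral n / 100
  mutate n  = return (n + 1)
  species _ = 4

evolve :: (Gene g, NFData g) => Int -> [g] -> IO [g]
evolve 0 pool = return pool
evolve n pool = compete pool >>= evolve (n - 1)

main :: IO ()
main = evolve 10000 [0 :: Int, 1, 2] >>= print  -- [10000,10001,10002]
```

With the lazy version, each iteration returns unevaluated thunks; with `force`, each generation is fully evaluated before the next begins, so the depth of pending work stays constant.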
Thanks to Don's deepseq suggestion, I was able to narrow the problem down to mapM mutate, which made too many thunks. The new version has mutate', which uses seq to prevent thunking. (Code is on GitHub.)
Instead of using (map head . map orderFitness), where orderFitness is a sortBy, you could use maximumBy and a single map. That doesn't save too much (you are going from O(n log n) to O(n), and might get another factor of two from eliminating the double map), but it is at least somewhat simpler and more efficient. You would also get rid of a call to reverse.

I doubt this fixes the problem without a deepseq, but it should be an improvement nonetheless.

Edit: if the standard library and GHC were perfect, then head . sortBy would produce identical code to maximumBy, and map head . map sortBy would produce identical code to map (head . sortBy); sadly, neither of these things is likely to be true in practice. sortBy will tend to do a bunch of extra memory allocation because it is a divide-and-conquer algorithm. Combining maps is an optimization you sometimes get, but should not count on.

More importantly, using maximumBy is more declarative. It is easier to see what the code does and how long it will take. It should also be easier to take advantage of in optimizations, because we know what the goal is, not just how we get it.
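The suggested rewrite, with a toy fitness so it runs (scoring an Int gene by its own value is an assumption made here, not from the post):

```haskell
import Data.List (maximumBy)
import Data.Ord (comparing)

-- Toy fitness: score an Int gene by its value.
fitness :: Int -> Float
fitness = fromIntegral

-- Before: (map head . map orderFitness) sorts each group, O(n log n)
-- per group, plus a reverse.
-- After: one O(n) pass per group, no sort and no reverse.
selectFittest :: [[Int]] -> [Int]
selectFittest = map (maximumBy (comparing fitness))

main :: IO ()
main = print (selectFittest [[3, 1, 2], [9, 7, 8]])  -- [3,9]
```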