Initial genetic programming parameters

Posted on 2024-08-31 09:35:06, 130 words, 7 views, 0 comments

I did a little GP (note: very little) work in college and have been playing around with it recently. My question is about the initial run settings (population size, number of generations, min/max depth of trees, min/max depth of initial trees, percentages to use for different reproduction operations, etc.). What is the normal practice for setting these parameters? What papers/sites do people use as a good guide?

Comments (5)

小嗲 2024-09-07 09:35:06

You'll find that this depends very much on your problem domain - in particular the nature of the fitness function, your implementation DSL etc.

Some personal experience:

  • Large population sizes seem to work better when you have a noisy fitness function. I think this is because the growth of sub-groups in the population over successive generations acts to give more sampling of the fitness function. I typically use 100 for less noisy/deterministic functions and 1000+ for noisy ones.
  • For the number of generations, it is best to measure improvements in the fitness function and stop when it meets your target criteria. I normally run a few hundred generations and see what kind of answers are coming out; if it is showing no improvement then you probably have an issue elsewhere.
  • Tree depth requirements are really dependent on your DSL. I sometimes try to do an implementation without explicit limits but penalise or eliminate programs that run too long (which is probably what you really care about...). I've also found total node counts of ~1000 to be quite useful hard limits.
  • Percentages for different mutation / recombination operators don't seem to matter all that much. As long as you have a comprehensive set of mutations, any reasonably balanced distribution will usually work. I think the reason for this is that you are basically doing a search for favourable improvements, so the main objective is just to make sure the trial improvements are reasonably well distributed across all the possibilities.
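
As a rough illustration of the rules of thumb above, here is a minimal Python sketch of a settings object and a stopping check. The numbers (100 / 1000+ individuals, a few hundred generations, ~1000 nodes) come from this answer; everything else (`GPConfig`, `patience`, the operator names and their weights) is made up for the example and not taken from any particular GP library.

```python
from dataclasses import dataclass, field
import random


@dataclass
class GPConfig:
    # Starting values from the rules of thumb above; tune them per problem.
    population_size: int = 100   # ~100 for deterministic fitness, 1000+ for noisy
    max_generations: int = 300   # "a few hundred", then inspect the answers
    max_nodes: int = 1000        # hard cap on total nodes per program
    patience: int = 50           # hypothetical: generations allowed without improvement
    # Hypothetical operator names with a roughly balanced distribution.
    operator_weights: dict = field(default_factory=lambda: {
        "crossover": 0.4,
        "subtree_mutation": 0.3,
        "point_mutation": 0.2,
        "reproduction": 0.1,
    })


def config_for(noisy_fitness: bool) -> GPConfig:
    """Pick the population size depending on how noisy the fitness function is."""
    return GPConfig(population_size=1000 if noisy_fitness else 100)


def pick_operator(cfg: GPConfig, rng: random.Random) -> str:
    """Draw one operator name according to the configured weights."""
    names = list(cfg.operator_weights)
    weights = list(cfg.operator_weights.values())
    return rng.choices(names, weights=weights, k=1)[0]


def should_stop(best_per_generation, target, cfg: GPConfig) -> bool:
    """Assumes higher fitness is better.

    Stops when the target is met, the generation budget is used up, or the
    best fitness has not improved during the last `patience` generations.
    """
    if not best_per_generation:
        return False
    if best_per_generation[-1] >= target:
        return True
    if len(best_per_generation) >= cfg.max_generations:
        return True
    if len(best_per_generation) > cfg.patience:
        recent_best = max(best_per_generation[-cfg.patience:])
        earlier_best = max(best_per_generation[:-cfg.patience])
        return recent_best <= earlier_best
    return False
```

You would call `should_stop` once per generation with the history of best fitness values. A stop caused by stagnation rather than by reaching the target is the "you probably have an issue elsewhere" case.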
沫雨熙 2024-09-07 09:35:06

Why don't you try using a genetic algorithm to optimise these parameters for you? :)

Any problem in computer science can be
solved with another layer of
indirection (except for too many
layers of indirection.)

-David J. Wheeler
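
To make the suggestion a bit more concrete, here is a toy sketch of such a "meta" search in Python. It is only an illustration under assumptions: the inner problem is OneMax (maximise the number of 1 bits) rather than a real GP task, the outer loop is a very small truncation-selection search over `(pop_size, mutation_rate)`, and all function names and ranges are invented for the example.

```python
import random


def run_inner_ga(pop_size, mutation_rate, rng, genome_len=20, generations=15):
    """Toy inner GA on OneMax; returns the best fitness in the final population."""
    pop = [[rng.randint(0, 1) for _ in range(genome_len)] for _ in range(pop_size)]

    def tournament(population):
        a, b = rng.choice(population), rng.choice(population)
        return a if sum(a) >= sum(b) else b

    for _ in range(generations):
        next_pop = []
        for _ in range(pop_size):
            p1, p2 = tournament(pop), tournament(pop)
            cut = rng.randrange(1, genome_len)          # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            next_pop.append(child)
        pop = next_pop
    return max(sum(ind) for ind in pop)


def score(params, rng, repeats=3):
    """Average several runs, because a single GA run is a noisy measurement."""
    pop_size, mutation_rate = params
    runs = [run_inner_ga(pop_size, mutation_rate, rng) for _ in range(repeats)]
    return sum(runs) / repeats


def mutate_params(params, rng):
    """Perturb a parameter setting while keeping it inside sane bounds."""
    pop_size, mutation_rate = params
    pop_size = max(4, int(pop_size * rng.uniform(0.7, 1.4)))
    mutation_rate = min(0.5, max(0.001, mutation_rate * rng.uniform(0.5, 2.0)))
    return pop_size, mutation_rate


def meta_search(rng, meta_pop=6, meta_generations=4):
    """Outer search: keep the better half of the settings, mutate them, repeat."""
    candidates = [(rng.randint(4, 40), rng.uniform(0.001, 0.3)) for _ in range(meta_pop)]
    best = None
    for _ in range(meta_generations):
        scored = sorted(((score(c, rng), c) for c in candidates), reverse=True)
        if best is None or scored[0][0] > best[0]:
            best = scored[0]
        survivors = [c for _, c in scored[: meta_pop // 2]]
        candidates = survivors + [mutate_params(c, rng) for c in survivors]
    return best  # (average best fitness, (pop_size, mutation_rate))
```

Note that this scoring ignores cost, so it will tend to drift towards larger populations; a real setup would probably penalise the number of fitness evaluations. And of course the outer search has parameters of its own, which is where the second half of the quote comes in.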

似梦非梦 2024-09-07 09:35:06

When I started looking into Genetic Algorithms I had the same question.

I wanted to collect data by varying parameters on a very simple problem, and to link given operators and parameter values (such as mutation rates, etc.) to the results obtained as a function of population size and so on.

Once I started getting into GA a bit more I then realized that given the enormous number of variables this is a huge task, and generalization is extremely difficult.

Talking from my (limited) experience: if you decide to simplify the problem, fix the way crossover and selection are implemented, and just play with population size and mutation rate (implemented in a given way) to try to come up with general results, you'll soon realize that too many variables are still in play. At the end of the day, the number of generations after which you will statistically get a decent result (whichever way you want to define decent) still obviously depends primarily on the problem you're solving, and consequently on the genome size (representing the same problem in different ways will obviously lead to different results in terms of the effect of given GA parameters!).

It is certainly possible to draft a set of guidelines - as the (rare but good) literature proves - but you will be able to generalize the results effectively in statistical terms only when the problem at hand can be encoded in the exact same way and the fitness is evaluated in a somehow equivalent way (which more often than not means you're dealing with a very similar problem).
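
For what it's worth, the kind of experiment described above could be sketched roughly like this in Python: repeat each parameter combination several times and record the mean and spread of the number of generations needed. The `toy_run` function is only a stand-in (a simple keep-the-best mutation search on a 16-bit OneMax problem, not a full GA), and the grid values and names are assumptions for illustration.

```python
import itertools
import random
import statistics


def sweep(run_once, grid, repeats, seed=0):
    """Call run_once(rng, **params) `repeats` times for every grid point.

    Returns {parameter values: (mean, sample standard deviation)}.
    `repeats` must be at least 2 for the standard deviation to be defined.
    """
    rng = random.Random(seed)
    names = list(grid)
    results = {}
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        samples = [run_once(rng, **params) for _ in range(repeats)]
        results[values] = (statistics.mean(samples), statistics.stdev(samples))
    return results


def toy_run(rng, pop_size, mutation_rate, genome_len=16, max_generations=500):
    """Stand-in for a real run: generations until OneMax is solved (capped)."""
    best = [rng.randint(0, 1) for _ in range(genome_len)]
    for generation in range(1, max_generations + 1):
        offspring = [
            [1 - g if rng.random() < mutation_rate else g for g in best]
            for _ in range(pop_size)
        ]
        best = max(offspring + [best], key=sum)   # keep the fittest seen so far
        if sum(best) == genome_len:
            return generation
    return max_generations


if __name__ == "__main__":
    grid = {"pop_size": [5, 20, 80], "mutation_rate": [0.02, 0.1, 0.3]}
    for values, (mean, spread) in sweep(toy_run, grid, repeats=10).items():
        print(values, f"mean={mean:.1f}", f"stdev={spread:.1f}")
```

The table you get out of this only describes this encoding of this problem, which is exactly the generalization issue: change the genome size or the representation and the numbers shift.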

自在安然 2024-09-07 09:35:06

Take a look at Koza's voluminous tomes on these matters.

深空失忆 2024-09-07 09:35:06

There are very different schools of thought even within the GP community - some regard populations in the (low) thousands as sufficient, whereas Koza and others often don't deem it worthwhile to start a GP run with fewer than a million individuals in the GP population ;-)

As mentioned before it depends on your personal taste and experiences, resources and probably the GP system used!

Cheers,
Jan
