基因表达编程和笛卡尔遗传编程之间的区别

发布于 2024-09-14 01:45:00 字数 190 浏览 4 评论 0原文

进化计算中相当烦人的一点是,稍微不同和重叠的概念往往会选择截然不同的名称。我最近的困惑是,基因表达编程似乎与笛卡尔遗传编程非常相似。

  1. (如何)这些是根本不同的概念吗?
  2. 我读到,GP 指令的间接编码是一种有效的技术(GEP 和 CGP 都这样做)。是否已经就间接编码已经过时的经典树基 GP 达成某种共识?

Something pretty annoying in evolutionary computing is that mildly different and overlapping concepts tend to pick dramatically different names. My latest confusion because of this is that gene-expression-programming seems very similar to cartesian-genetic-programming.

  1. (how) Are these fundamentally different concepts?
  2. I've read that indirect encoding of GP instructions is an effective technique ( both GEP and CGP do that ). Has there been reached some sort of consensus that indirect encoding has outdated classic tree bases GP?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

筱果果 2024-09-21 01:45:00

嗯,基因表达编程 (GEP) 和笛卡尔遗传编程(CGP 或我所认为的经典遗传编程)之间似乎存在一些差异,但这种差异可能比实际情况更夸张。应该是。请注意,我从未使用过 GEP,因此我的所有评论均基于我使用 CGP 的经验。

在 CGP 中,基因型和表型之间没有区别,换句话说,如果您查看 CGP 的“基因”,那么您也在查看它们的表达。这里没有编码,即表达树就是基因本身。

GEP 中,基因型被表达为表型,所以如果你正在查看基因,你会不太清楚表情会是什么样子。 GP 的“发明者”Cândida Ferreira 写了一篇非常好的论文 还有一些其他资源试图对整个概念进行简短的概述。

Ferriera 表示,好处是“显而易见的”,但我确实没有看到任何东西可以使 GEP 比 CGP 更好。显然,GEP 是多基因的,这意味着多个基因参与一个性状的表达(即表达树)。无论如何,适应度是在表达树上计算的,因此 GEP 似乎没有采取任何措施来增加适应度。作者声称 GEP 提高了达到适应度的速度(即在更少的代数内),但坦率地说,您可以看到 CGP 的巨大性能变化,只需采用不同的选择算法、不同的锦标赛结构、拆分 每

选择:

  • 随机
  • 轮盘赌
  • top-n
  • 取一半
  • 等。

比赛频率:

  • 纪元
  • 一次 每个数据实例
  • 每一代

一次。 比赛结构:

  • 取 3 个,杀死 1 个,然后用另外两个的子代替换它。
  • 按适应度对锦标赛中的所有个体进行排序,杀死下半部分并用上半部分的后代替换它(其中较低的适应度较差,上部的适应度较好)。
  • 从比赛中随机挑选个体进行交配并杀死多余的个体。

部落
一个群体可以分为彼此独立发展的部落:

  • 迁移 - 周期性地,来自一个部落的个体会迁移到另一个部落
  • 部落在逻辑上是分开的,因此它们就像在不同的地方运行的自己的独立群体环境。

多元化健身
将多样性纳入适应度,计算有多少个体具有相同的适应度值(因此可能具有相同的表型),并按比例值惩罚他们的适应度:具有相同适应度值的个体越多,对它们的惩罚就越大那些人。这样,具有独特表型的标本将受到鼓励,因此种群停滞的情况就会少得多。

这些只是可以极大地影响 CGP 性能的一些因素,当我说“很大”时,我的意思是它与 Ferriera 的性能处于相同的顺序或更高。因此,如果 Ferriera 没有过多地修改这些想法,那么她可能会看到 CGP 的性能会慢得多……尤其是如果她没有采取任何措施来对抗停滞的话。因此,在阅读 GEP 的性能统计数据时我会非常小心,因为有时人们无法考虑到所有可用的“优化”。

Well, it seems that there is some difference between gene expression programming (GEP) and cartesian genetic programming (CGP or what I view as classic genetic programming), but the difference might be more hyped up than it really ought to be. Please note that I have never used GEP, so all of my comments are based on my experience with CGP.

In CGP there is no distinction between genotype and a phenotype, in other words- if you're looking at the "genes" of a CGP you're also looking at their expression. There is no encoding here, i.e. the expression tree is the gene itself.

In GEP the genotype is expressed into a phenotype, so if you're looking at the genes you will not readily know what the expression is going to look like. The "inventor" of GP, Cândida Ferreira, has written a really good paper and there are some other resources which try to give a shorter overview of the whole concept.

Ferriera says that the benefits are "obvious," but I really don't see anything that would necessarily make GEP better than CGP. Apparently GEP is multigenic, which means that multiple genes are involved in the expression of a trait (i.e. an expression tree). In any case, the fitness is calculated on the expressed tree, so it doesn't seem like GEP is doing anything to increase the fitness. What the author claims is that GEP increases the speed at which the fitness is reached (i.e. in fewer generations), but frankly speaking you can see dramatic performance shifts from a CGP just by having a different selection algorithm, a different tournament structure, splitting the population into tribes, migrating specimens between tribes, including diversity into the fitness, etc.

Selection:

  • random
  • roulette wheel
  • top-n
  • take half
  • etc.

Tournament Frequency:

  • once per epoch
  • once per every data instance
  • once per generation.

Tournament Structure:

  • Take 3, kill 1 and replace it with the child of the other two.
  • Sort all individuals in the tournament by fitness, kill the lower half and replace it with the offspring of the upper half (where lower is worse fitness and upper is better fitness).
  • Randomly pick individuals from the tournament to mate and kill the excess individuals.

Tribes
A population can be split into tribes that evolve independently of each-other:

  • Migration- periodically, individual(s) from a tribe would be moved to another tribe
  • The tribes are logically separated so that they're like their own separate populations running in separate environments.

Diversity Fitness
Incorporate diversity into the fitness, where you count how many individuals have the same fitness value (thus are likely to have the same phenotype) and you penalize their fitness by a proportionate value: the more individuals with the same fitness value, the more penalty for those individuals. This way specimens with unique phenotypes will be encouraged, therefore there will be much less stagnation of the population.

Those are just some of the things that can greatly affect the performance of a CGP, and when I say greatly I mean that it's in the same order or greater than Ferriera's performance. So if Ferriera didn't tinker with those ideas too much, then she could have seen much slower performance of the CGPs... especially if she didn't do anything to combat stagnation. So I would be careful when reading performance statistics on GEP, because sometimes people fail to account for all of the "optimizations" available out there.

等数载,海棠开 2024-09-21 01:45:00

这些答案似乎有些混乱,必须澄清。笛卡尔 GP 不同于经典 GP(又名基于树的 GP)和 GEP。尽管他们共享许多概念并从相同的生物机制中获得灵感,但个体的表现(解决方案)却有所不同。

CGP中,表示(基因型和表型之间的映射)是间接的,换句话说,并不是所有的CGP 基因组中的基因将在表型组中表达(GEP 和许多其他概念中也存在这一概念)。基因型可以编码在节点网格或阵列中,并且生成的程序图仅是活动节点的表达。

GEP 中,表示也是间接的,同样,并非所有基因都会在表型中表达。这种情况下的表示与treeGP或CGP有很大不同,但基因型也表达为程序树。在我看来,GEP 是一种更优雅的表示,更容易实现,但也存在一些缺陷,例如:您必须找到特定于问题的适当的尾部和头部大小,mnltigenic 版本有点像表达式树之间的强制粘合,最后它有太多的膨胀

无论哪种表示在某些特定问题领域可能比另一种更好,它们都是通用的,只要您可以对其进行编码,就可以应用于任何领域。

There seems to be some confusion in these answers that must be clarified. Cartesian GP is different from classic GP (aka tree-based GP), and GEP. Even though they share many concepts and take inspiration from the same biological mechanisms, the representation of the individuals (the solutions) varies.

In CGPthe representation (mapping between genotype and phenotype) is indirect, in other words, not all of the genes in a CGP genome will be expressed in the phenome (a concept also found in GEP and many others). The genotypes can be coded in a grid or array of nodes, and the resulting program graph is the expression of active nodes only.

In GEP the representation is also indirect, and similarly not all genes will be expressed in the phenotype. The representation in this case is much different from treeGP or CGP, but the genotypes are also expressed into a program tree. In my opinion GEP is a more elegant representation, easier to implement, but also suffers from some defects like: you have to find the appropriate tail and head size which is problem specific, the mnltigenic version is a bit of a forced glue between expression trees, and finally it has too much bloat.

Independently of which representation may be better than the other in some specific problem domain, they are general purpose, can be applied to any domain as long as you can encode it.

情仇皆在手 2024-09-21 01:45:00

一般来说,GEP 比 GP 更简单。假设您在程序中允许使用以下节点:常量、变量、+、-、*、/、if、...
对于每个具有 GP 的此类节点,您必须创建以下操作:
- 随机化
- 变异
- 交叉
- 可能还有其他遗传运算符

在 GEP 中,每个此类节点只需要实现一个操作:反序列化,它采用数字数组(如 C 或 Java 中的 double),然后返回节点。它类似于 Java 或 Python 等语言中的对象反序列化(不同之处在于编程语言中的反序列化使用字节数组,这里我们有数字数组)。即使是这种“反序列化”操作也不必由程序员实现:它可以通过通用算法来实现,就像在 Java 或 Python 反序列化中所做的那样。

从一个角度来看,这种简单性可能会使搜索最佳解决方案不太成功,但从另一方面来看:需要程序员做的工作更少,更简单的算法可能执行得更快(更容易优化,更多的代码和数据适合CPU缓存,等等) 。所以我想说 GEP 稍微好一点,但当然,明确的答案取决于问题,对于许多问题来说,情况可能恰恰相反。

In general, GEP is simpler from GP. Let's say you allow the following nodes in your program: constants, variables, +, -, *, /, if, ...
For each of such nodes with GP you must create the following operations:
- randomize
- mutate
- crossover
- and probably other genetic operators as well

In GEP for each of such nodes only one operation is needed to be implemented: deserialize, which takes array of numbers (like double in C or Java), and returns the node. It resembles object deserialization in languages like Java or Python (the difference is that deserialization in programming languages uses byte arrays, where here we have arrays of numbers). Even this 'deserialize' operation doesn't have to be implemented by the programmer: it can be implemented by a generic algorithm, just like it's done in Java or Python deserialization.

This simplicity from one point of view may make searching of best solution less successful, but from other side: requires less work from programmer and simpler algorithms may execute faster (easier to optimize, more code and data fits in CPU cache, and so on). So I would say that GEP is slightly better, but of course the definite answer depends on problem, and for many problems the opposite may be true.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文