我必须做一个使用遗传算法解决子集和问题的项目。不幸的是,在编码算法时我发现了一个大问题......
我的算法:
- 只要没有找到解决方案并且步数小于步数:
- 计算概率,然后对每个染色体进行分布函数
- 执行选择(轮盘赌)
- 选择要交叉的 n 条染色体
- 执行交叉(交叉点是随机选择的)
- 选择 m 条染色体进行突变
- 执行突变
- 如果找到解决方案,则停止
(算法取自《遗传算法 + 数据结构 = 进化程序》第二章”)
诸如群体大小、数据量、数据收集范围、步骤数、突变数(一步中)、交叉数(一步中)等变量在程序选项中严格设置。
问题是,在群体中经过一定数量(相对较小的)步骤后,所有染色体都是相同的。该问题说明了该图:
http://imageshack.us/m/96/7693/wykresb.png
我做错了什么?如何修复它?
提前致谢。
编辑:
在这里您可以找到我的应用程序的日志:
http://paste.pocoo.org/show/391318/
我认为轮盘赌不是最好的解决方案(正如 deong 所说)。突变也需要改进。
I have to do a project which using a genetic algorithm solves the subset sum problem. Unfortunately, when coding the algorithm I found a big problem ...
My algorithm:
- as long as no solution was found and the number of steps is smaller than steps do:
- calculate the probability and then distribution function for each chromosome
- perform selection (roulette)
- select n chromosomes to be crossed
- perform the crossing (the crossing point is selected randomly)
- select m chromosomes to mutation
- perform mutations
- if you found a solution then stop
(Algorithm was taken from the book "Genetic Algorithms + Data Structures = Evolution Programs, Chapter 2 ")
Variables such as population size, amount of data, scope of data collection, number of steps, the number of mutations (in step), the number of crossings (in a step) is set rigidly in the program options.
The problem is that after a certain (relatively small) number of steps in the population all the chromosomes are identical. The problem illustrates this graph:
http://imageshack.us/m/96/7693/wykresb.png
What I'm doing wrong? How to fix it?
Thanks in advance.
Edit:
Here You can find logs from my app:
http://paste.pocoo.org/show/391318/
I think that roulette is not the best solution (as deong said). Mutations also need to improve.
发布评论
评论(3)
这就是(潜在的)问题。当然,免责声明是您可能只是有一个有缺陷的程序。
轮盘赌的选择实在是太糟糕了。问题在于,在跑步的早期,适应度值的分布是随机的。相比之下,你有一些糟糕的解决方案,也有一些还算合理的解决方案。您不会期望其中任何一个都非常好,但您会期望其中一些比其他更好。
轮盘赌选择利用这些概率的相对差异并将其放大。如果人口规模为 100,并且其中一个人的健康状况比其他人好五倍,那么该人被选中的频率就会是其他人的五倍。在典型的温和突变率下,你很快就会陷入这样一种情况:你两次选择同一个个体进行重组,产生一些新的相同后代,进行非常小的改变(也许),然后将它们放回种群中。因为你还处于运行早期,大多数解决方案仍然很糟糕,所以如果你确实有一个高于平均水平的解决方案,你可以将其选择为五个高于平均水平的解决方案,培育它们以获得十个高于平均水平的解决方案,然后开始整个过程再来一次。如果您在设计运算符集时不非常小心,那么这些解决方案可以很快接管整个群体,即使算法只知道它们比它所见过的真正蹩脚的解决方案更好。
解决方案是使用更好的选择运算符。二元锦标赛选择速度更快、更容易编码,并且施加的选择压力更容易忍受。还有排名偏向选择,它根据适应度排名而不是绝对差异来按比例进行选择。
编辑:这并不是说您不能使用比例选择。只是它很容易过早收敛,为了有效地使用它,您通常必须考虑到这一点来构建一整套运算符。
Here's (potentially) the problem. Disclaimer is of course that you may just have a buggy program.
Roulette wheel selection is just terrible. The problem is that early on in a run, the distribution of fitness values is random. You have some awful solutions and some that are reasonable OK in comparison. You don't expect any of them to be very good, but you would expect some of them to be much better than others.
Roulette wheel selection takes these relative differences in probabilities and amplifies them. If you have a population size of 100, and one individual has a fitness five times better than any others, it will be selected five times as often. With typically mild mutation rates, you end up quickly in a situation where you choose the same individual twice for recombination, produce some new identical offspring, make very minor changes (maybe), and then put them back into the population. Because you're still early in the run, most solutions are still bad, so where you did have one above average solution, you selected it into five above average solutions, bred them to get ten above average solutions, and then started the process all over again. These solutions can very quickly take over the entire population if you aren't really careful with designing your set of operators, even though all the algorithm knows is that they're better than the really crappy solutions it has otherwise seen.
The solution is to use a better selection operator. Binary tournament selection is faster, easier to code, and applies a much more tolerable selection pressure. There is also rank-biased selection which selects proportionally by fitness ranking rather than absolute differences.
Edit: This isn't to say you can't use proportional selection. Just that it is very prone to premature convergence and to use it effectively, you typically have to build an entire set of operators with that in mind.
我以前也遇到过类似的问题,我希望它和你的一样。
首先,你需要检查(使用任何测量指标)染色体 A 是否比染色体 B 更好。这让你对群体的染色体有严格的顺序,并且能够对您的人口进行排序。
然后,当你产生一条新的染色体(通过突变或交叉)时,你可能会产生一条已经存在于你的群体中的染色体。确保不要将其包含在您的人口列表中。
换句话说,确保您的列表始终包含不同的染色体,并且始终按从最好到最差的顺序排序!
笔记:
我使用的遗传算法通常是这样的(这是最通用也是最常用的算法):
将它们添加到 Pop 列表中;
方法;
来自波普。
I had a similar problem before, I wish it s the same as your's
First, you need to check (using any measuring metric) if chromosome A is better than chromosome B. This let you have a strict order of the chromosomes of your population and be able to sort your population.
Then, when you produce a new chromosome (either by mutation or crossover) you may be producing a chromosome that already exist in your population. Make sure not to include this in your population list.
In other words, make sure your list always contains different chromosomes and always sorted from best to worst !
Note:
The genetic algorithms I work with are usually like this (this is the most general algorithm and most used):
add them to list Pop;
methods;
from Pop.
应用遗传算法时,算法可能会陷入局部最优。然而,人们对全局最优(或者更确切地说是这种最优的近似)感兴趣。
可以通过以下方式避免局部最优:
此外,杀死克隆可能很有用。这意味着您在每次迭代后“快速”查看您的群体,并且不允许克隆。我所说的“快速”是指您只需寻找近似的克隆,因为检查精确的克隆需要 O(m*n^2),其中 n 是种群大小,m 是染色体的大小。
这种方法帮助我解决了另一个我也面临克隆的问题。
希望这有帮助,
Christian
编辑
如果您能发布您的交叉函数,那就太好了。最好不是代码,而是纯英文文本。交叉函数是遗传算法的关键部分。
When applying genetic algorithms, it can happen that the algorithm gets stuck at local optima. However, one is interested in the global optimum (or rather an approximation to such an optimum).
Local optima can be avoided by:
Moreover, it may be useful to kill clones. That means you "quickly" look at your population after each iteration and do not allow for clones. By quickly I mean that you just look for approximate clones, because checking for exact clones would take O(m*n^2), where n is your population size and m is the size of a chromosome.
This method helped me in a different problem where I was facing clones as well.
Hope this helped,
Christian
EDIT
It would also be nice if you could post your cross-over function. Preferably not as code, but in plain english text. The cross-over function is the critical part of a genetic algorithm.