遗传算法 (GA) 和遗传编程 (GP) 是有趣的研究领域。
我想知道您使用 GA/GP 解决的具体问题以及您使用的库/框架(如果您没有自己开发)。
问题:
- 您使用 GA/GP 解决了哪些问题?
- 您使用了哪些库/框架?
我正在寻找第一手经验,所以除非你有,否则请不要回答。
Genetic algorithms (GA) and genetic programming (GP) are interesting areas of research.
I'd like to know about specific problems you have solved using GA/GP and what libraries/frameworks you used if you didn't roll your own.
Questions:
- What problems have you used GA/GP to solve?
- What libraries/frameworks did you use?
I'm looking for first-hand experiences, so please do not answer unless you have that.
发布评论
评论(30)
足球小费。我构建了一个 GA 系统来预测 AFL(澳式足球)每周的比赛结果。
几年前,我对标准工作足球池感到厌倦,每个人都只是上网并从媒体上的一些专家那里挑选。所以,我认为击败一群广播新闻专业的学生不会太难,对吧?我的第一个想法是从梅西评级获取结果,然后在赛季结束时透露我获胜后的策略名声和荣耀。然而,由于我从未发现梅西不跟踪 AFL 的原因。我内心的愤世嫉俗者认为这是因为每场 AFL 比赛的结果基本上都变成了随机机会,但我对最近规则变化的抱怨属于不同的论坛。
该系统基本上考虑了进攻强度、防守强度、主场优势、每周的进步(或缺乏)以及每一项的变化速度。这为每个球队在整个赛季创建了一组多项式方程。可以计算给定日期每场比赛的获胜者和得分。目标是找到与过去所有比赛结果最接近的一组系数,并使用该组来预测接下来几周的比赛。
在实践中,系统会找到能够准确预测 90% 以上过去比赛结果的解决方案。然后,它会成功挑选出下周(即不在训练集中的那一周)的大约 60-80% 的游戏。
结果:略高于中间位置。没有重大现金奖励,也没有可以用来击败维加斯的系统。不过这很有趣。
我从头开始构建一切,没有使用任何框架。
Football Tipping. I built a GA system to predict the week to week outcome of games in the AFL (Aussie Rules Football).
A few years ago I got bored of the standard work football pool, everybody was just going online and taking the picks from some pundit in the press. So, I figured it couldn't be too hard to beat a bunch of broadcast journalism majors, right? My first thought was to take the results from Massey Ratings and then reveal at the end of the season my strategy after winning fame and glory. However, for reasons I've never discovered Massey does not track AFL. The cynic in me believes it is because the outcome of each AFL game has basically become random chance, but my complaints of recent rule changes belong in a different forum.
The system basically considered offensive strength, defensive strength, home field advantage, week to week improvement (or lack thereof) and velocity of changes to each of these. This created a set of polynomial equations for each team over the season. The winner and score for each match for a given date could be computed. The goal was to find the set of coefficients that most closely matched the outcome of all past games and use that set to predict the upcoming weeks game.
In practice, the system would find solutions that accurately predicted over 90% of past game outcomes. It would then successfully pick about 60-80% of games for the upcoming week (that is the week not in the training set).
The result: just above middle of the pack. No major cash prize nor a system that I could use to beat Vegas. It was fun though.
I built everything from scratch, no framework used.
以及一些常见问题,例如旅行推销员和 Roger Alsing 的蒙娜丽莎程序,我还编写了进化数独求解器(这需要我更多的原创想法,而不仅仅是重新实现别人的想法)。有更可靠的算法来解决数独,但进化方法效果相当好。
在过去的几天里,在看到这篇文章后,我一直在玩一个进化程序来寻找扑克的“冷套牌” 在 Reddit 上。目前还不太令人满意,但我想我可以改进它。
我有我自己的框架,用于进化算法。
As well as some of the common problems, like the Travelling Salesman and a variation on Roger Alsing's Mona Lisa program, I've also written an evolutionary Sudoku solver (which required a bit more original thought on my part, rather than just re-implementing somebody else's idea). There are more reliable algorithms for solving Sudokus but the evolutionary approach works fairly well.
In the last few days I've been playing around with an evolutionary program to find "cold decks" for poker after seeing this article on Reddit. It's not quite satisfactory at the moment but I think I can improve it.
I have my own framework that I use for evolutionary algorithms.
我为我公司 1992 年为货运行业开发的 3D 激光表面轮廓系统开发了自制 GA。
该系统依赖于 3 维三角测量并使用定制激光线扫描仪、512x512 相机(具有定制捕获硬件)。相机和激光之间的距离永远不会精确,而且相机的焦点也不会出现在您期望的 256,256 位置!
尝试使用标准几何形状和模拟退火式方程求解来计算校准参数是一场噩梦。
遗传算法在一个晚上就完成了,我创建了一个校准立方体来测试它。我知道立方体的尺寸非常准确,因此我的想法是,我的 GA 可以为每个扫描单元制定一组自定义三角测量参数,以克服生产差异。
这一招奏效了。至少可以说我很惊讶!在大约 10 代之内,我的“虚拟”立方体(从原始扫描生成并从校准参数重新创建)实际上看起来像一个立方体!经过大约 50 代之后,我得到了所需的校准。
I developed a home brew GA for a 3D laser surface profile system my company developed for the freight industry back in 1992.
The system relied upon 3 dimensional triangulation and used a custom laser line scanner, a 512x512 camera (with custom capture hw). The distance between the camera and laser was never going to be precise and the focal point of the cameras were not to be found in the 256,256 position that you expected it to be!
It was a nightmare to try and work out the calibration parameters using standard geometry and simulated annealing style equation solving.
The Genetic algorithm was whipped up in an evening and I created a calibration cube to test it on. I knew the cube dimensions to high accuracy and thus the idea was that my GA could evolve a set of custom triangulation parameters for each scanning unit that would overcome production variations.
The trick worked a treat. I was flabbergasted to say the least! Within around 10 generations my 'virtual' cube (generated from the raw scan and recreated from the calibration parameters) actually looked like a cube! After around 50 generations I had the calibration I needed.
当您计划粉刷房屋时,通常很难获得准确的颜色组合。通常,您心里有某种颜色,但供应商向您展示的不是其中一种颜色。
昨天,我的GA研究员教授提到了一个发生在德国的真实故事(抱歉,我没有更多的参考资料,是的,如果有人需要的话我可以找到)。这个人(让我们称他为颜色人)曾经挨家挨户地帮助人们找到确切的颜色代码(在RGB),这将最接近客户的想法。他是这样做的:
彩色人过去常常随身携带一个使用遗传算法的软件程序。他通常从 4 种不同的颜色开始——每种颜色都编码为编码染色体(其解码值将是 RGB 值)。消费者从 4 种颜色中选择一种(最接近他/她心目中的颜色)。然后,程序会将最大的适应度分配给该个体,并使用突变/交叉转移到下一代。重复上述步骤,直到消费者找到确切的颜色,然后颜色专家告诉他 RGB 组合!
通过将最大适合度分配给接近消费者心目中的颜色,颜色专家的计划增加了收敛到消费者心目中确切颜色的机会。我发现这很有趣!
现在我已经得到了 -1,如果您计划得到更多 -1,请。阐明这样做的原因!
Its often difficult to get an exact color combination when you are planning to paint your house. Often, you have some color in mind, but it is not one of the colors, the vendor shows you.
Yesterday, my Prof. who is a GA researcher mentioned about a true story in Germany (sorry, I have no further references, yes, I can find it out if any one requests to). This guy (let's call him the color guy) used to go from door-door to help people to find the exact color code (in RGB) that would be the closet to what the customer had in mind. Here is how he would do it:
The color guy used to carry with him a software program which used GA. He used to start with 4 different colors- each coded as a coded Chromosome (whose decoded value would be a RGB value). The consumer picks 1 of the 4 colors (Which is the closest to which he/she has in mind). The program would then assign the maximum fitness to that individual and move onto the next generation using mutation/crossover. The above steps would be repeated till the consumer had found the exact color and then color guy used to tell him the RGB combination!
By assigning maximum fitness to the color closes to what the consumer have in mind, the color guy's program is increasing the chances to converge to the color, the consumer has in mind exactly. I found it pretty fun!
Now that I have got a -1, if you are planning for more -1's, pls. elucidate the reason for doing so!
几周前,我提出了一个SO解决方案,使用遗传算法来解决图形布局问题。这是一个约束优化问题的例子。
同样在机器学习领域,我从头开始用 c/c++ 实现了一个基于 GA 的分类规则框架。
我还在示例项目中使用 GA 来训练人工神经网络 (ANN)使用著名的反向传播算法。
此外,作为我研究生研究的一部分,我在训练隐马尔可夫模型 作为基于 EM 的 Baum-Welch 算法的附加方法(再次在 c/c++ 中)。
A couple of weeks ago, I suggested a solution on SO using genetic algorithms to solve a problem of graph layout. It is an example of a constrained optimization problem.
Also in the area of machine learning, I implemented a GA-based classification rules framework in c/c++ from scratch.
I've also used GA in a sample project for training artificial neural networks (ANN) as opposed to using the famous backpropagation algorithm.
In addition, and as part of my graduate research, I've used GA in training Hidden Markov Models as an additional approach to the EM-based Baum-Welch algorithm (in c/c++ again).
作为我本科计算机科学学位的一部分,我们被分配的问题是为 Jikes 研究虚拟机找到最佳的 jvm 标志。这是使用 Dicappo 基准套件进行评估的,该套件将时间返回到控制台。我编写了一个分布式逻辑算法,它切换这些标志以提高基准测试套件的运行时间,尽管需要运行几天才能补偿影响结果的硬件抖动。唯一的问题是我没有正确了解编译器理论(这是作业的目的)。
我本可以使用现有的默认标志来播种初始群体,但有趣的是,该算法发现了与 O3 优化级别非常相似的配置(但实际上在许多测试中更快)。
编辑:我还用 Python 编写了自己的遗传算法框架来完成作业,并仅使用 popen 命令来运行各种基准测试,尽管如果不是评估作业,我会查看 pyEvolve。
As part of my undergraduate CompSci degree, we were assigned the problem of finding optimal jvm flags for the Jikes research virtual machine. This was evaluated using the Dicappo benchmark suite which returns a time to the console. I wrote a distributed gentic alogirthm that switched these flags to improve the runtime of the benchmark suite, although it took days to run to compensate for hardware jitter affecting the results. The only problem was I didn't properly learn about the compiler theory (which was the intent of the assignment).
I could have seeded the initial population with the exisiting default flags, but what was interesting was that the algorithm found a very similar configuration to the O3 optimisation level (but was actually faster in many tests).
Edit: Also I wrote my own genetic algorithm framework in Python for the assignment, and just used the popen commands to run the various benchmarks, although if it wasn't an assessed assignment I would have looked at pyEvolve.
首先,乔纳森·科扎(Jonathan Koza)(亚马逊)的“基因编程”几乎是关于遗传和进化算法/编程技术的书,有很多例子。我强烈建议您检查一下。
至于我自己对遗传算法的使用,我使用(自制的)遗传算法来演化出用于对象收集/销毁场景的群体算法(实际目的可能是清除雷区)。这是论文的链接。我所做的最有趣的部分是多阶段适应度函数,这是必要的,因为简单的适应度函数没有为遗传算法提供足够的信息来充分区分群体成员。
First off, "Genetic Programming" by Jonathan Koza (on amazon) is pretty much THE book on genetic and evolutionary algorithm/programming techniques, with many examples. I highly suggest checking it out.
As for my own use of a genetic algorithm, I used a (home grown) genetic algorithm to evolve a swarm algorithm for an object collection/destruction scenario (practical purpose could have been clearing a minefield). Here is a link to the paper. The most interesting part of what I did was the multi-staged fitness function, which was a necessity since the simple fitness functions did not provide enough information for the genetic algorithm to sufficiently differentiate between members of the population.
我所在的团队正在研究使用进化计算 (EC) 自动修复现有程序中的错误。我们已成功修复了现实软件项目中的许多实际错误(请参阅此项目的主页)。
我们有两种 EC 修复技术的应用。
第一个(通过项目页面提供的代码和复制信息)发展了从现有 C 程序解析的抽象语法树,并使用我们自己的自定义 EC 引擎在 Ocaml 中实现。
第二个(通过项目页面提供的代码和复制信息)是我个人对该项目的贡献,它改进了从用多种编程语言编写的程序编译的 x86 程序集或 Java 字节代码。该应用程序是在 Clojure 中实现的,并且还使用自己定制的 EC 引擎。
进化计算的一个优点是该技术的简单性使得您可以轻松编写自己的自定义实现。有关遗传编程的免费优秀介绍性文本,请参阅遗传编程实地指南。
I am part of a team investigating the use of Evolutionary Computation (EC) to automatically fix bugs in existing programs. We have successfully repaired a number of real bugs in real world software projects (see this project's homepage).
We have two applications of this EC repair technique.
The first (code and reproduction information available through the project page) evolves the abstract syntax trees parsed from existing C programs and is implemented in Ocaml using our own custom EC engine.
The second (code and reproduction information available through the project page), my personal contribution to the project, evolves the x86 assembly or Java byte code compiled from programs written in a number of programming languages. This application is implemented in Clojure and also uses its own custom built EC engine.
One nice aspect of Evolutionary Computation is the simplicity of the technique makes it possible to write your own custom implementations without too much difficulty. For a good freely available introductory text on Genetic Programming see the Field Guide to Genetic Programming.
我和一位同事正在研究一种使用我们公司要求的各种标准将货物装载到卡车上的解决方案。我一直在研究遗传算法解决方案,而他则使用带有积极修剪的分支定界法。我们仍在实施该解决方案,但到目前为止,我们已经取得了良好的成果。
A coworker and I are working on a solution for loading freight onto trucks using the various criteria our company requires. I've been working on a Genetic Algorithm solution while he is using a Branch And Bound with aggressive pruning. We are still in the process of implementing this solution but so far, we have been getting good results.
几年前,我使用 ga 来优化 ASR(自动语音识别)语法,以获得更好的识别率。我从相当简单的选择列表开始(GA 正在测试每个插槽的可能术语的组合),然后逐步发展到更开放和更复杂的语法。通过在一种语音距离函数下测量术语/序列之间的间隔来确定适合度。我还尝试对语法进行弱等价变体,以找到编译为更紧凑表示的语法(最后我采用了直接算法,它大大增加了我们可以在应用程序中使用的“语言”的大小) 。
最近,我将它们用作默认假设来测试各种算法生成的解决方案的质量。这主要涉及分类和不同类型的拟合问题(即创建一个“规则”来解释审阅者对数据集所做的一组选择)。
Several years ago I used ga's to optimize asr (automatic speech recognition) grammars for better recognition rates. I started with fairly simple lists of choices (where the ga was testing combinations of possible terms for each slot) and worked my way up to more open and complex grammars. Fitness was determined by measuring separation between terms/sequences under a kind of phonetic distance function. I also experimented with making weakly equivalent variations on a grammar to find one that compiled to a more compact representation (in the end I went with a direct algorithm, and it drastically increased the size of the "language" that we could use in applications).
More recently I have used them as a default hypothesis against which to test the quality of solutions generated from various algorithms. This has largely involved categorization and different kinds of fitting problems (i.e. create a "rule" that explains a set of choices made by reviewers over a dataset(s)).
我做了一个完整的GA框架,名为“GALAB”,解决了很多问题:
I made a complete GA framework named "GALAB", to solve many problems:
我曾经使用 GA 来优化内存地址的哈希函数。地址的页面大小为 4K 或 8K,因此它们在地址的位模式中显示出一定的可预测性(最低有效位全部为零;中间位定期递增,等等)。原始哈希函数是“块状”的 - 它倾向于聚集命中每三个哈希桶上。改进后的算法具有近乎完美的分布。
I once used a GA to optimize a hash function for memory addresses. The addresses were 4K or 8K page sizes, so they showed some predictability in the bit pattern of the address (least significant bits all zero; middle bits incrementing regularly, etc.) The original hash function was "chunky" - it tended to cluster hits on every third hash bucket. The improved algorithm had a nearly perfect distribution.
我构建了一个简单的 GA,用于在播放音乐时从频谱中提取有用的模式。输出用于驱动 winamp 插件中的图形效果。
我将一些 GA 调整为频谱的不同部分以及不同的 BPM 限制,因此它们不会趋向于相同的模式。每个群体前 4 名的输出被发送到渲染引擎。
一个有趣的副作用是,人群的平均健康度是音乐变化的一个很好的指标,尽管通常需要 4-5 秒才能弄清楚。
I built a simple GA for extracting useful patterns out of the frequency spectrum of music as it was being played. The output was used to drive graphical effects in a winamp plugin.
I had a few GAs tuned to different parts of the spectrum as well as different BPM limits, so they didn't tend to converge towards the same pattern. The outputs from the top 4 from each population were sent to the rendering engine.
An interesting side effect was that the average fitness across the population was a good indicator for changes in the music, although it generally took 4-5 seconds to figure it out.
我不知道家庭作业是否重要……
在学习期间,我们推出了自己的程序来解决旅行商问题。
我们的想法是对几个标准(映射问题的难度、性能等)进行比较,我们还使用了其他技术,例如 模拟退火。
它工作得很好,但是我们花了一段时间才理解如何正确地进行“再现”阶段:将手头的问题建模为适合遗传编程的东西,这确实让我觉得最困难的部分......
这是一门有趣的课程,因为我们还涉足神经网络等。
我想知道是否有人在“生产”代码中使用过这种编程。
I don't know if homework counts...
During my studies we rolled our own program to solve the Traveling Salesman problem.
The idea was to make a comparison on several criteria (difficulty to map the problem, performance, etc) and we also used other techniques such as Simulated annealing.
It worked pretty well, but it took us a while to understand how to do the 'reproduction' phase correctly: modeling the problem at hand into something suitable for Genetic programming really struck me as the hardest part...
It was an interesting course since we also dabbled with neural networks and the like.
I'd like to know if anyone used this kind of programming in 'production' code.
我使用简单的遗传算法来优化表示为二进制字符串的波的信噪比。通过以某种方式翻转几百万代的比特,我能够产生一种变换,从而导致该波具有更高的信噪比。该算法也可以是“模拟退火”,但在本例中没有使用。从本质上讲,遗传算法很简单,这与我见过的用例一样简单,因此我没有使用生成和选择的框架 - 仅使用随机种子和信噪比手头的功能。
I used a simple genetic algorithm to optimize the signal to noise ratio of a wave that was represented as a binary string. By flipping the the bits certain ways over several million generations I was able to produce a transform that resulted in a higher signal to noise ratio of that wave. The algorithm could have also been "Simulated Annealing" but was not used in this case. At their core, genetic algorithms are simple, and this was about as simple of a use case that I have seen, so I didn't use a framework for generation creation and selection - only a random seed and the Signal-to-Noise Ratio function at hand.
作为我论文的一部分,我为多目标优化算法 mPOEMS(具有进化改进步骤的多目标原型优化)编写了一个通用的 java 框架,这是一个使用进化概念的 GA。它的通用性在于,所有与问题无关的部分都已与问题相关的部分分离,并且提供了一个接口来仅添加与问题相关的部分来使用该框架。这样想使用该算法就不必从零开始,工作起来也方便很多。
您可以在此处找到代码。
使用该算法找到的解决方案已在科学著作中与最先进的算法 SPEA-2 和 NSGA 进行了比较,并且已证明:
该算法的性能相当甚至更好,具体取决于您用来衡量性能的指标,特别是取决于您正在寻找的优化问题。
您可以找到它此处。
另外,作为我的论文和工作证明的一部分,我将该框架应用于项目组合管理中发现的项目选择问题。它是关于选择为公司增加最大价值、最支持公司战略或支持任何其他任意目标的项目。例如,从特定类别中选择一定数量的项目,或最大化项目协同效应,......
我的论文将该框架应用于项目选择问题:
http://www.ub.tuwien.ac.at/dipl/2008 /AC05038968.pdf
之后,我在财富 500 强之一的投资组合管理部门工作,他们使用商业软件,该软件也将遗传算法应用于项目选择问题/投资组合优化。
更多资源:
框架的文档:
http://thomaskremmel.com/mpoems/mpoems_in_java_documentation.pdf
mPOEMS 演示文稿:
http://portal.acm.org/itation.cfm?id=1792634.1792653
实际上,只要有一点热情,每个人都可以轻松地将通用框架的代码应用于任意多目标优化问题。
As part of my thesis I wrote a generic java framework for the multi-objective optimisation algorithm mPOEMS (Multiobjective prototype optimization with evolved improvement steps), which is a GA using evolutionary concepts. It is generic in a way that all problem-independent parts have been separated from the problem-dependent parts, and an interface is povided to use the framework with only adding the problem-dependent parts. Thus one who wants to use the algorithm does not have to begin from zero, and it facilitates work a lot.
You can find the code here.
The solutions which you can find with this algorithm have been compared in a scientific work with state-of-the-art algorithms SPEA-2 and NSGA, and it has been proven that
the algorithm performes comparable or even better, depending on the metrics you take to measure the performance, and especially depending on the optimization-problem you are looking on.
You can find it here.
Also as part of my thesis and proof of work I applied this framework to the project selection problem found in portfolio management. It is about selecting the projects which add the most value to the company, support most the strategy of the company or support any other arbitrary goal. E.g. selection of a certain number of projects from a specific category, or maximization of project synergies, ...
My thesis which applies this framework to the project selection problem:
http://www.ub.tuwien.ac.at/dipl/2008/AC05038968.pdf
After that I worked in a portfolio management department in one of the fortune 500, where they used a commercial software which also applied a GA to the project selection problem / portfolio optimization.
Further resources:
The documentation of the framework:
http://thomaskremmel.com/mpoems/mpoems_in_java_documentation.pdf
mPOEMS presentation paper:
http://portal.acm.org/citation.cfm?id=1792634.1792653
Actually with a bit of enthusiasm everybody could easily adapt the code of the generic framework to an arbitrary multi-objective optimisation problem.
在工作中我遇到了以下问题:给定 M 个任务和 N 个 DSP,将任务分配给 DSP 的最佳方式是什么? “最佳”被定义为“最小化负载最多的 DSP 的负载”。任务有不同类型,并且不同的任务类型根据分配位置的不同会产生不同的性能影响,因此我将作业到 DSP 的分配集编码为“DNA 字符串”,然后使用遗传算法来“繁殖”我能想到的最好的分配字符串。
它工作得相当好(比我以前的方法要好得多,我以前的方法是评估每种可能的组合......对于不平凡的问题规模,它需要数年才能完成!),唯一的问题是没有办法告诉是否达到最优解。你只能决定当前的“尽力而为”是否足够好,或者让它运行更长时间,看看是否可以做得更好。
At work I had the following problem: given M tasks and N DSPs, what was the best way to assign tasks to DSPs? "Best" was defined as "minimizing the load of the most loaded DSP". There were different types of tasks, and various task types had various performance ramifications depending on where they were assigned, so I encoded the set of job-to-DSP assignments as a "DNA string" and then used a genetic algorithm to "breed" the best assignment string I could.
It worked fairly well (much better than my previous method, which was to evaluate every possible combination... on non-trivial problem sizes, it would have taken years to complete!), the only problem was that there was no way to tell if the optimal solution had been reached or not. You could only decide if the current "best effort" was good enough, or let it run longer to see if it could do better.
codechef.com 上有一场竞赛(顺便说一下,这是一个很棒的网站,每月一次的编程竞赛),参赛者需要解决一个无法解决的数独(一个人应该尽可能接近并尽可能少地出现错误的列/行/等)。
我要做的就是首先生成一个完美的数独,然后覆盖已给出的字段。在此基础上,我使用遗传编程来改进我的解决方案。
在这种情况下,我想不出确定性的方法,因为数独是 300x300,搜索会花费太长时间。
There was an competition on codechef.com (great site by the way, monthly programming competitions) where one was supposed to solve an unsolveable sudoku (one should come as close as possible with as few wrong collumns/rows/etc as possible).
What I would do, was to first generate a perfect sudoku and then override the fields, that have been given. From this pretty good basis on I used genetic programming to improve my solution.
I couldn't think of a deterministic approach in this case, because the sudoku was 300x300 and search would've taken too long.
在学校的一次研讨会上,我们开发了一个基于音乐模式生成音乐的应用程序。该程序是用 Java 构建的,输出是包含歌曲的 MIDI 文件。我们使用独特的 GA 方法来生成音乐。我认为这个程序对于探索新的作品很有用。
In a seminar in the school, we develop an application to generate music based in the musical mode. The program was build in Java and the output was a midi file with the song. We using distincts aproachs of GA to generate the music. I think this program can be useful to explore new compositions.
在本科阶段,我们使用 NERO(神经网络和遗传算法的组合)来教游戏中的机器人做出智能决策。太酷了。
in undergrad, we used NERO (a combination of neural network and genetic algorithm) to teach in-game robots to make intelligent decisions. It was pretty cool.
我开发了一种基于多线程摆动的机器人导航模拟,通过一组食物源和矿山的随机网格地形,并开发了一种基于遗传算法的策略,用于探索机器人行为的优化和机器人染色体的适者生存基因。这是通过每个迭代周期的图表和映射来完成的。
从那时起,我就养成了更多的游戏行为。我最近为自己构建的一个示例应用程序是一种遗传算法,用于解决英国路线查找中的旅行推销员问题,考虑到出发和目标状态以及一个/多个连接点、延误、取消、建筑工程、高峰时间、公众罢工,考虑最快与最便宜的路线。然后为某一天的路线提供均衡的建议。
一般来说,我的策略是使用基于 POJO 的基因表示,然后应用特定的接口实现来进行选择、突变、交叉策略和标准点。然后,根据我需要用作启发式测量的策略和标准,我的适应度函数基本上变得相当复杂。
我还研究了使用系统突变周期将遗传算法应用于代码内的自动化测试,其中算法理解逻辑并尝试确定错误报告以及代码修复建议。基本上,这是一种优化代码并提供改进建议的方法,以及自动发现新程序代码的方法。我还尝试将遗传算法应用于音乐制作以及其他应用。
一般来说,我发现进化策略就像大多数元启发式/全局优化策略一样,它们一开始学习起来很慢,但随着解决方案越来越接近目标状态,并且只要你的适应度函数和启发式方法很好地结合起来,它们就会开始学习搜索空间内的融合。
I developed a multithreaded swing based simulation of robot navigation through a set of randomized grid terrain of food sources and mines and developed a genetic algorithm based strategy of exploring the optimization of robotic behavior and survival of fittest genes for a robotic chromosome. This was done using charting and mapping of each iteration cycle.
Since, then I have developed even more game behavior. An example application I built recently for myself was a genetic algorithm for solving the traveling sales man problem in route finding in UK taking into account start and goal states as well as one/multiple connection points, delays, cancellations, construction works, rush hour, public strikes, consideration between fastest vs cheapest routes. Then providing a balanced recommendation for the route to take on a given day.
Generally, my strategy is to use POJO based representaton of genes then I apply specific interface implementations for selection, mutation, crossover strategies, and the criteria point. My fitness function then basically becomes a quite complex based on the strategy and criteria I need to apply as a heuristic measure.
I have also looked into applying genetic algorithm into automated testing within code using systematic mutation cycles where the algorithm understands the logic and tries to ascertain a bug report with recommendations for code fixes. Basically, a way to optimize my code and provide recommendations for improvement as well as a way of automating the discovery of new programmatic code. I have also tried to apply genetic algorithms to music production amongst other applications.
Generally, I find evolutionary strategies like most metaheuristic/global optimization strategies, they are slow to learn at first but start to pick up as the solutions become closer and closer to goal state and as long as your fitness function and heuristics are well aligned to produce that convergence within your search space.
我曾经尝试制作一个完全基于遗传编程的围棋电脑棋手。每个程序都将被视为一系列动作的评估函数。不过,即使是在相当小的 3x4 板上,生成的程序也不是很好。
我使用 Perl,并自己编写所有代码。今天我会做不同的事情。
I once tried to make a computer player for the game of Go, exclusively based on genetic programming. Each program would be treated as an evaluation function for a sequence of moves. The programs produced weren't very good though, even on a rather diminuitive 3x4 board.
I used Perl, and coded everything myself. I would do things differently today.
读完盲人钟表匠后,我对道金斯说他开发的帕斯卡程序感兴趣可以随着时间进化的生物体模型。我很有兴趣使用 Swarm 编写自己的代码。我没有制作他所做的所有奇特的生物图形,但我的“染色体”控制着影响生物体生存能力的特征。他们生活在一个简单的世界里,可以与彼此和他们的环境进行斗争。
生物体的生存或死亡部分是由于偶然性,但也取决于它们适应当地环境的效率、它们消耗营养物质和食物的能力。他们的繁殖有多成功。这很有趣,但也向我的妻子证明我是一个极客。
After reading The Blind Watchmaker, I was interested in the pascal program Dawkins said he had developed to create models of organisms that could evolve over time. I was interested enough to write my own using Swarm. I didn't make all the fancy critter graphics he did, but my 'chromosomes' controlled traits which affected organisms ability to survive. They lived in a simple world and could slug it out against each other and their environment.
Organisms lived or died partly due to chance, but also based on how effectively they adapted to their local environments, how well they consumed nutrients & how successfully they reproduced. It was fun, but also more proof to my wife that I am a geek.
不久前,我推出了 GA 来改进实际上的图像处理内核,以消除哈勃太空望远镜 (HST) 图像中的宇宙射线痕迹。标准方法是使用哈勃望远镜进行多次曝光,并仅保留所有图像中相同的内容。由于 HST 时间非常宝贵,我是一个天文学爱好者,最近参加了进化计算大会,我考虑使用 GA 来清理单次曝光。
这些个体以树的形式存在,以 3x3 像素区域作为输入,执行一些计算,并就是否以及如何修改中心像素做出决定。通过将输出与以传统方式(即堆叠曝光)清理的图像进行比较来判断适合度。
它确实起到了一定的作用,但还不足以保证放弃原来的方法。如果我没有受到论文时间的限制,我可能会扩展算法可用的遗传部分箱。我很确定我可以显着改进它。
使用的库:如果我没记错的话,IRAF 和 cfitsio 用于天文图像数据处理和 I/O。
It was a while ago, but I rolled a GA to evolve what were in effect image processing kernels to remove cosmic ray traces from Hubble Space Telescope (HST) images. The standard approach is to take multiple exposures with the Hubble and keep only the stuff that is the same in all the images. Since HST time is so valuable, I'm an astronomy buff, and had recently attended the Congress on Evolutionary Computation, I thought about using a GA to clean up single exposures.
The individuals were in the form of trees that took a 3x3 pixel area as input, performed some calculations, and produced a decision about whether and how to modify the center pixel. Fitness was judged by comparing the output with an image cleaned up in the traditional way (i.e. stacking exposures).
It actually sort of worked, but not well enough to warrant foregoing the original approach. If I hadn't been time-constrained by my thesis, I might have expanded the genetic parts bin available to the algorithm. I'm pretty sure I could have improved it significantly.
Libraries used: If I recall correctly, IRAF and cfitsio for astronomical image data processing and I/O.
我年轻时就尝试过 GA。我用 Python 编写了一个模拟器,其工作原理如下。
这些基因编码了神经网络的权重。
神经网络的输入是检测触摸的“天线”。较高的值表示非常接近,0 表示不接触。
输出到两个“轮子”。如果两个轮子都向前,那家伙就向前。如果轮子方向相反,那家伙就会转向。输出的强度决定了车轮转动的速度。
一个简单的迷宫就生成了。这真的很简单——甚至很愚蠢。屏幕底部有起点,顶部有一个球门,中间有四堵墙。每面墙都有一个随机的空间,所以总有一条路。
我一开始就随机选择了一些人(我认为他们是虫子)。一旦有人达到了目标,或者达到了时间限制,体能就会被计算出来。它与当时距球门的距离成反比。
然后我将它们配对并“培育”它们以创造下一代。被选择繁殖的概率与其适应度成正比。有时,这意味着如果一个动物具有非常高的相对适应度,就会反复与自己繁殖。
我以为他们会养成“左墙拥抱”的行为,但他们似乎总是遵循一些不太理想的行为。在每次实验中,虫子都会聚集成螺旋状。它们会向外盘旋,直到碰到右侧的墙壁。他们会遵循这一点,然后当他们到达间隙时,他们会螺旋下降(远离间隙)并绕圈。他们通常会向左转 270 度,然后进入缺口。这将使他们穿过大部分墙壁,并且常常到达目标。
我添加的一项功能是将颜色向量放入基因中以跟踪个体之间的相关性。几代之后,它们的颜色都会相同,这告诉我我应该有更好的繁殖策略。
我试图让他们制定更好的策略。我使神经网络变得复杂——添加了内存和所有东西。这没有帮助。我总是看到同样的策略。
我尝试了各种方法,例如建立单独的基因库,仅在 100 代后才重新组合。但没有什么能促使他们采取更好的策略。也许这是不可能的。
另一件有趣的事情是绘制一段时间内的健身情况图表。有明确的模式,比如最大适应度在上升之前先下降。我从未见过进化论书籍谈论过这种可能性。
I experimented with GA in my youth. I wrote a simulator in Python that worked as follows.
The genes encoded the weights of a neural network.
The neural network's inputs were "antennae" that detected touches. Higher values meant very close and 0 meant not touching.
The outputs were to two "wheels". If both wheels went forward, the guy went forward. If the wheels were in opposite directions, the guy turned. The strength of the output determined the speed of the wheel turning.
A simple maze was generated. It was really simple--stupid even. There was the start at the bottom of the screen and a goal at the top, with four walls in between. Each wall had a space taken out randomly, so there was always a path.
I started random guys (I thought of them as bugs) at the start. As soon as one guy reached the goal, or a time limit was reached, the fitness was calculated. It was inversely proportional to the distance to the goal at that time.
I then paired them off and "bred" them to create the next generation. The probability of being chosen to be bred was proportional to its fitness. Sometimes this meant that one was bred with itself repeatedly if it had a very high relative fitness.
I thought they would develop a "left wall hugging" behavior, but they always seemed to follow something less optimal. In every experiment, the bugs converged to a spiral pattern. They would spiral outward until they touched a wall to the right. They'd follow that, then when they got to the gap, they'd spiral down (away from the gap) and around. They would make a 270 degree turn to the left, then usually enter the gap. This would get them through a majority of the walls, and often to the goal.
One feature I added was to put in a color vector into the genes to track relatedness between individuals. After a few generations, they'd all be the same color, which tell me I should have a better breeding strategy.
I tried to get them to develop a better strategy. I complicated the neural net--adding a memory and everything. It didn't help. I always saw the same strategy.
I tried various things like having separate gene pools that only recombined after 100 generations. But nothing would push them to a better strategy. Maybe it was impossible.
Another interesting thing is graphing the fitness over time. There were definite patterns, like the maximum fitness going down before it would go up. I have never seen an evolution book talk about that possibility.
不是家庭作业。
我作为专业程序员的第一份工作(1995 年)是为 S&P500 期货编写基于遗传算法的自动交易系统。该应用程序是用 Visual Basic 3 [!] 编写的,我不知道当时我是如何做的,因为 VB3 甚至没有类。
该应用程序从一组随机生成的固定长度字符串(“基因”部分)开始,每个字符串对应于标准普尔 500 指数期货每分钟价格数据中的特定形状,以及具体订单(买入或卖出)以及止损和止盈金额。每个字符串(或“基因”)的利润表现均通过 3 年的历史数据进行评估;每当指定的“形状”与历史数据匹配时,我就会假设相应的买入或卖出订单并评估交易结果。我补充了一个警告,即每个基因都以固定金额开始,因此可能会破产并完全从基因库中删除。
每次对种群进行评估后,幸存者都会被随机杂交(通过混合来自两个亲本的片段),其中一个基因被选为亲本的可能性与其产生的利润成正比。我还添加了点突变的可能性,让事情变得更有趣。经过几百代之后,我最终得到了一组基因,可以将 5000 美元变成平均约 10000 美元,而且没有死亡/破产的可能性(当然,根据历史数据)。
不幸的是,我从未有机会实际使用这个系统,因为我的老板在不到 3 个月的时间里以传统方式交易就损失了近 100,000 美元,并且他失去了继续该项目的意愿。回想起来,我认为该系统会赚取巨额利润 - 不是因为我一定做对了任何事情,而是因为我产生的基因群体恰好偏向买入订单(而不是卖出订单)大约 5: 1 比率。正如我们 20/20 事后所知,1995 年之后市场略有上涨。
Not homework.
My first job as a professional programmer (1995) was writing a genetic-algorithm based automated trading system for S&P500 futures. The application was written in Visual Basic 3 [!] and I have no idea how I did anything back then, since VB3 didn't even have classes.
The application started with a population of randomly-generated fixed-length strings (the "gene" part), each of which corresponded to a specific shape in the minute-by-minute price data of the S&P500 futures, as well as a specific order (buy or sell) and stop-loss and stop-profit amounts. Each string (or "gene") had its profit performance evaluated by a run through 3 years of historical data; whenever the specified "shape" matched the historical data, I assumed the corresponding buy or sell order and evaluated the trade's result. I added the caveat that each gene started with a fixed amount of money and could thus potentially go broke and be removed from the gene pool entirely.
After each evaluation of a population, the survivors were cross-bred randomly (by just mixing bits from two parents), with the likelihood of a gene being selected as a parent being proportional to the profit it produced. I also added the possibility of point mutations to spice things up a bit. After a few hundred generations of this, I ended up with a population of genes that could turn $5000 into an average of about $10000 with no chance of death/brokeness (on the historical data, of course).
Unfortunately, I never got the chance to use this system live, since my boss lost close to $100,000 in less than 3 months trading the traditional way, and he lost his willingness to continue with the project. In retrospect, I think the system would have made huge profits - not because I was necessarily doing anything right, but because the population of genes that I produced happened to be biased towards buy orders (as opposed to sell orders) by about a 5:1 ratio. And as we know with our 20/20 hindsight, the market went up a bit after 1995.
我制作了一些生活在这个小世界里的小动物。他们有一个神经网络大脑,它接收来自世界的一些输入,输出是其他动作中运动的向量。他们的大脑就是“基因”。
该计划从一群具有随机大脑的随机生物开始。输入和输出神经元是静态的,但中间的神经元却不是。
环境中充满了食物和危险。食物会增加能量,当你有足够的能量时,你就可以交配。危险会减少能量,如果能量为0,他们就会死亡。
最终,这些生物进化到能够在世界各地移动、寻找食物并躲避危险。
然后我决定做一个小实验。我给了生物大脑一个称为“嘴”的输出神经元和一个称为“耳朵”的输入神经元。重新开始,惊讶地发现它们进化到最大化空间,每个生物都会留在各自的部分(食物是随机放置的)。他们学会了互相合作,而不是互相妨碍。总有例外。
然后我尝试了一些有趣的事情。我死去的生物会变成食物。试着猜猜发生了什么!进化出了两种类型的生物,一种是成群攻击的,另一种是高度回避的。
那么这里的教训是什么?沟通意味着合作。一旦你引入了一种伤害他人意味着你会得到一些东西的元素,那么合作就会被破坏。
我想知道这对自由市场和资本主义体系有何影响。我的意思是,如果企业可以伤害他们的竞争并逃脱惩罚,那么很明显他们会尽其所能来伤害竞争。
编辑:
我用 C++ 编写的,没有使用任何框架。编写了我自己的神经网络和遗传算法代码。埃里克,谢谢你说这是合理的。人们通常不相信 GA 的力量(尽管其局限性很明显),直到他们亲自使用它。 GA 很简单,但并不简单。
对于怀疑者来说,神经网络已被证明能够模拟任何函数,只要它们有不止一层。遗传算法是一种非常简单的方法来导航解决方案空间,寻找局部和潜在的全局最小值。将 GA 与神经网络结合起来,你就有了一个很好的方法来找到可以为一般问题找到近似解的函数。因为我们使用的是神经网络,所以我们正在优化某些输入的函数,而不是像其他人使用 GA 那样优化函数的某些输入
以下是生存示例的演示代码: http://www.mempko.com/darcs/neural/demos/eaters/
构建说明:
darcs clone --lazy http://www.mempko.com/darcs/neural/
cd Neural
cmake .
make
cd demos/eaters
./eaters
I made a little critters that lived in this little world. They had a neural network brain which received some inputs from the world and the output was a vector for movement among other actions. Their brains were the "genes".
The program started with a random population of critters with random brains. The inputs and output neurons were static but what was in between was not.
The environment contained food and dangers. Food increased energy and when you have enough energy, you can mate. The dangers would reduce energy and if energy was 0, they died.
Eventually the creatures evolved to move around the world and find food and avoid the dangers.
I then decided to do a little experiment. I gave the creature brains an output neuron called "mouth" and an input neuron called "ear". Started over and was surprised to find that they evolved to maximize the space and each respective creature would stay in its respective part (food was placed randomly). They learned to cooperate with each other and not get in each others way. There were always the exceptions.
Then i tried something interesting. I dead creatures would become food. Try to guess what happened! Two types of creatures evolved, ones that attacked like in swarms, and ones that were high avoidance.
So what is the lesson here? Communication means cooperation. As soon as you introduce an element where hurting another means you gain something, then cooperation is destroyed.
I wonder how this reflects on the system of free markets and capitalism. I mean, if businesses can hurt their competition and get away with it, then its clear they will do everything in their power to hurt the competition.
Edit:
I wrote it in C++ using no frameworks. Wrote my own neural net and GA code. Eric, thank you for saying it is plausible. People usually don't believe in the powers of GA (although the limitations are obvious) until they played with it. GA is simple but not simplistic.
For the doubters, neural nets have been proven to be able to simulate any function if they have more than one layer. GA is a pretty simple way to navigate a solution space finding local and potentially global minimum. Combine GA with neural nets and you have a pretty good way to find functions that find approximate solutions for generic problems. Because we are using neural nets, then we are optimizing the function for some inputs, not some inputs to a function as others are using GA
Here is the demo code for the survival example: http://www.mempko.com/darcs/neural/demos/eaters/
Build instructions:
darcs clone --lazy http://www.mempko.com/darcs/neural/
cd neural
cmake .
make
cd demos/eaters
./eaters
2004 年 1 月,飞利浦新显示技术公司联系了我,他们正在为首款商用电子墨水 Sony Librie 开发电子产品,该电子墨水仅在日本发布,比亚马逊 Kindle 和其他产品进入美国市场早了几年。一个欧洲。
飞利浦工程师遇到了一个大问题。在产品上市前几个月,在更改页面时屏幕上仍然出现重影。问题在于 200 名司机产生了静电场。每个驱动器都有一定的电压,必须设置在 0 到 1000 mV 之间或类似的值。但如果你改变其中之一,就会改变一切。
因此单独优化每个驱动器的电压是不可能的。可能的值组合数量达数十亿,特殊相机需要大约 1 分钟才能评估单个组合。工程师们尝试了许多标准优化技术,但没有任何方法可以接近。
首席工程师联系了我,因为我之前向开源社区发布了一个遗传编程库。他问 GP/GA 是否会提供帮助以及我是否可以参与其中。我做到了,我们一起工作了大约一个月,我根据合成数据编写和调整 GA 库,他将其集成到他们的系统中。然后,一个周末他们让它与真实的东西一起运行。
接下来的周一,我收到了他和他们的硬件设计师发来的这些热情洋溢的电子邮件,内容是没有人相信 GA 发现的惊人结果。就是这样。同年晚些时候,该产品上市。
我没有得到一分钱,但我得到了“吹牛”的权利。他们从一开始就说他们已经超出了预算,所以在开始工作之前我就知道这笔交易是什么。对于 GA 的应用来说,这是一个很棒的故事。 :)
In January 2004, I was contacted by Philips New Display Technologies who were creating the electronics for the first ever commercial e-ink, the Sony Librie, who had only been released in Japan, years before Amazon Kindle and the others hit the market in US an Europe.
The Philips engineers had a major problem. A few months before the product was supposed to hit the market, they were still getting ghosting on the screen when changing pages. The problem was the 200 drivers that were creating the electrostatic field. Each of these drivers had a certain voltage that had to be set right between zero and 1000 mV or something like this. But if you changed one of them, it would change everything.
So optimizing each driver's voltage individually was out of the question. The number of possible combination of values was in billions,and it took about 1 minute for a special camera to evaluate a single combination. The engineers had tried many standard optimization techniques, but nothing would come close.
The head engineer contacted me because I had previously released a Genetic Programming library to the open-source community. He asked if GP/GA's would help and if I could get involved. I did, and for about a month we worked together, me writing and tuning the GA library, on synthetic data, and him integrating it into their system. Then, one weekend they let it run live with the real thing.
The following Monday I got these glowing emails from him and their hardware designer, about how nobody could believe the amazing results the GA found. This was it. Later that year the product hit the market.
I didn't get paid one cent for it, but I got 'bragging' rights. They said from the beginning they were already over budget, so I knew what the deal was before I started working on it. And it's a great story for applications of GAs. :)
我使用 GA 来优化婚宴上的座位分配。 10桌可容纳80位客人。评估功能的基础是让人们保持约会,将有共同点的人放在一起,并将持极端相反观点的人放在不同的桌子上。
我运行了好几次。每次,我都有九张好桌子,还有一张全是奇怪的球。最后,我的妻子完成了座位分配。
我的旅行推销员优化器使用了一种新颖的染色体到行程的映射,这使得染色体的繁殖和突变变得微不足道,而且没有产生无效旅行的风险。
更新:因为有几个人问如何......
从一系列客人(或城市)开始,以某种任意但一致的顺序(例如按字母顺序排列)。将此称为参考解决方案。将客人的索引视为他/她的座位号。
我们没有尝试直接在染色体中编码这种顺序,而是编码用于将参考解决方案转换为新解决方案的指令。具体来说,我们将染色体视为要交换的数组中的索引列表。为了解码染色体,我们从参考解决方案开始并应用染色体指示的所有交换。交换数组中的两个条目始终会产生有效的解决方案:每个客人(或城市)仍然只出现一次。
因此,染色体可以随机生成、突变并与其他染色体杂交,并且始终会产生有效的解决方案。
I used a GA to optimize seating assignments at my wedding reception. 80 guests over 10 tables. Evaluation function was based on keeping people with their dates, putting people with something in common together, and keeping people with extreme opposite views at separate tables.
I ran it several times. Each time, I got nine good tables, and one with all the odd balls. In the end, my wife did the seating assignments.
My traveling salesman optimizer used a novel mapping of chromosome to itinerary, which made it trivial to breed and mutate the chromosomes without any risk of generating invalid tours.
Update: Because a couple people have asked how ...
Start with an array of guests (or cities) in some arbitrary but consistent ordering, e.g., alphabetized. Call this the reference solution. Think of a guest's index as his/her seat number.
Instead of trying to encode this ordering directly in the chromosome, we encode instructions for transforming the reference solution into a new solution. Specifically, we treat the chromosomes as lists of indexes in the array to swap. To get decode a chromosome, we start with the reference solution and apply all the swaps indicated by the chromosome. Swapping two entries in the array always results in a valid solution: every guest (or city) still appears exactly once.
Thus chromosomes can be randomly generated, mutated, and crossed with others and will always produce a valid solution.
我使用遗传算法(以及一些相关技术)来确定风险管理系统的最佳设置,该系统试图阻止金农使用被盗的信用卡来支付 MMO 费用。该系统将接收数千笔具有“已知”值(欺诈与否)的交易,并找出最佳的设置组合,以正确识别欺诈交易,而不会出现太多误报。
我们有一笔交易的几十个(布尔)特征的数据,每个特征都被赋予一个值并总计。如果总数高于阈值,则该交易属于欺诈。遗传算法将创建大量随机值集,根据已知数据集对其进行评估,选择得分最高的值(在欺诈检测和限制误报数量方面),然后将其中最好的几个值进行杂交每一代人都会产生新一代的候选人。经过一定代数后,得分最高的一组值被视为获胜者。
创建已知数据的语料库来进行测试是该系统的致命弱点。如果您等待退款,那么您在尝试对欺诈者做出回应时会落后几个月,因此有人必须手动审查大量交易来建立数据集,而不必等待太久。
这最终识别出了绝大多数的欺诈行为,但对于最容易发生欺诈的项目,其识别率无法完全低于 1%(考虑到 90% 的传入交易可能是欺诈行为,这已经做得相当不错了)。
我使用 perl 完成了这一切。在一台相当旧的 Linux 机器上运行该软件一次需要 1-2 小时才能运行(20 分钟通过 WAN 链路加载数据,其余时间用于处理数据)。任何给定代的大小都受到可用 RAM 的限制。我会一遍又一遍地运行它,并对参数进行细微的更改,寻找特别好的结果集。
总而言之,它避免了手动尝试调整数十个欺诈指标的相对值时出现的一些失误,并且始终提出比我手动创建的更好的解决方案。 AFAIK,它仍在使用(大约在我写完它三年后)。
I used genetic algorithms (as well as some related techniques) to determine the best settings for a risk management system that tried to keep gold farmers from using stolen credit cards to pay for MMOs. The system would take in several thousand transactions with "known" values (fraud or not) and figure out what the best combination of settings was to properly identify the fraudulent transactions without having too many false positives.
We had data on several dozen (boolean) characteristics of a transaction, each of which was given a value and totalled up. If the total was higher than a threshold, the transaction was fraud. The GA would create a large number of random sets of values, evaluate them against a corpus of known data, select the ones that scored the best (on both fraud detection and limiting the number of false positives), then cross breed the best few from each generation to produce a new generation of candidates. After a certain number of generations the best scoring set of values was deemed the winner.
Creating the corpus of known data to test against was the Achilles' heel of the system. If you waited for chargebacks, you were several months behind when trying to respond to the fraudsters, so someone would have to manually review large numbers of transactions to build up that corpus of data without having to wait too long.
This ended up identifying the vast majority of the fraud that came in, but couldn't quite get it below 1% on the most fraud-prone items (given that 90% of incoming transactions could be fraud, that was doing pretty well).
I did all this using perl. One run of the software on a fairly old linux box would take 1-2 hours to run (20 minutes to load data over a WAN link, the rest of the time spent crunching). The size of any given generation was limited by available RAM. I'd run it over and over with slight changes to the parameters, looking for an especially good result set.
All in all it avoided some of the gaffes that came with manually trying to tweak the relative values of dozens of fraud indicators, and consistently came up with better solutions than I could create by hand. AFAIK, it's still in use (about 3 years after I wrote it).