当前位置：文江博客话题详情

通过神经网络和/或强化学习增强我的 GA

发布于 2024-08-25 21:51:54 字数 1380 浏览 5 评论 0 原文

正如我在前面的问题中提到的，我正在编写一个迷宫求解应用程序来帮助我了解更多理论 CS 主题，在经历了一些麻烦之后，我得到了一个遗传算法，它可以按顺序演化出一组规则（由布尔值处理）通过迷宫找到一个好的解决方案。

话虽这么说，单独的 GA 还可以，但我想用神经网络来加强它，即使我没有神经网络的真正工作知识（没有正式的 CS 理论教育）。在阅读了一些有关该主题的内容后，我发现神经网络可用于训练基因组以改善结果。假设我有一个基因组（基因组），例如

1 0 0 1 0 1 0 1 0 1 1 1 0 0...

我如何使用神经网络（我假设是 MLP？）来训练和改进我的基因组？

除此之外，因为我对神经网络一无所知，所以我一直在考虑使用我的迷宫矩阵（二维数组）来实现某种形式的强化学习，尽管我有点坚持以下算法对我的要求：

（来自 http://people.revoledu.com/kardi/tutorial /ReinforcementLearning/Q-Learning-Algorithm.htm）

1.  Set parameter , and environment reward matrix R
   2. Initialize matrix Q as zero matrix
   3. For each episode:
          * Select random initial state
          * Do while not reach goal state
                o Select one among all possible actions for the current state
                o Using this possible action, consider to go to the next state
                o Get maximum Q value of this next state based on all possible actions
                o Compute
                o Set the next state as the current state

  End Do

  End For

对我来说最大的问题是实现奖励矩阵 R 以及 Q 矩阵到底是什么，并获取 Q 值。我为迷宫使用多维数组，并为每次移动使用枚举状态。这将如何用在 Q-Learning 算法中？

如果有人可以帮助解释我需要做什么来实现以下内容，最好是用 Java，尽管 C# 也很好，可能还有一些源代码示例，我们将不胜感激。

原文

As I have mentioned in previous questions I am writing a maze solving application to help me learn about more theoretical CS subjects, after some trouble I've got a Genetic Algorithm working that can evolve a set of rules (handled by boolean values) in order to find a good solution through a maze.

That being said, the GA alone is okay, but I'd like to beef it up with a Neural Network, even though I have no real working knowledge of Neural Networks (no formal theoretical CS education). After doing a bit of reading on the subject I found that a Neural Network could be used to train a genome in order to improve results. Let's say I have a genome (group of genes), such as

1 0 0 1 0 1 0 1 0 1 1 1 0 0...

How could I use a Neural Network (I'm assuming MLP?) to train and improve my genome?

In addition to this as I know nothing about Neural Networks I've been looking into implementing some form of Reinforcement Learning, using my maze matrix (2 dimensional array), although I'm a bit stuck on what the following algorithm wants from me:

(from http://people.revoledu.com/kardi/tutorial/ReinforcementLearning/Q-Learning-Algorithm.htm)

1.  Set parameter , and environment reward matrix R
   2. Initialize matrix Q as zero matrix
   3. For each episode:
          * Select random initial state
          * Do while not reach goal state
                o Select one among all possible actions for the current state
                o Using this possible action, consider to go to the next state
                o Get maximum Q value of this next state based on all possible actions
                o Compute
                o Set the next state as the current state

  End Do

  End For

The big problem for me is implementing a reward matrix R and what a Q matrix exactly is, and getting the Q value. I use a multi-dimensional array for my maze and enum states for every move. How would this be used in a Q-Learning algorithm?

If someone could help out by explaining what I would need to do to implement the following, preferably in Java although C# would be nice too, possibly with some source code examples it'd be appreciated.

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

故事未完 2024-09-01 21:51:54

正如一些评论中指出的那样，您的问题确实涉及大量的背景知识和主题，而这些背景知识和主题在 stackoverflow 上几乎无法雄辩地涵盖。但是，我们在这里可以尝试建议解决您的问题的方法。

首先：你的 GA 是做什么的？我看到一组二进制值；这些是什么？我认为它们是：

坏：一系列“右转”和“左转”指令。为什么这样不好？因为你基本上是在随机地、强力地尝试解决你的问题。你不是在进化基因型：你是在完善随机猜测。
更好：每个基因（基因组中的位置）代表将在表型中表达的特征。基因组和表型之间不应该存在一对一的映射！

让我举个例子：我们的大脑中有 10^13 个左右的神经元。但我们只有大约 10^9 个基因（是的，这不是一个精确的值，请耐心等待）。这告诉我们什么？我们的基因型并不编码每个神经元。我们的基因组编码蛋白质，然后这些蛋白质将构成我们身体的组成部分。

因此，进化通过选择表型的特征直接作用于基因型。如果我每只手都有 6 个手指，如果这会让我成为一个更好的程序员，让我有更多的孩子，因为我在生活中更成功，那么我的基因型将被进化选择，因为它包含 >能力让我拥有更健康的身体（是的，这里有一个双关语，考虑到这里大多数人的平均极客与可重复性比率）。

现在，想想你的 GA：你想要实现什么目标？您确定不断发展的规则会有帮助吗？换句话说，你在迷宫中会表现如何？对你有帮助的最成功的事情是什么：拥有不同的身体，还是拥有正确的逃生之路的记忆？也许您可能想重新考虑您的基因型并让它编码记忆能力。也许可以在基因型中编码可以存储多少数据，以及代理访问数据的速度有多快，然后根据他们走出迷宫的速度来衡量适应性。
另一种（较弱的）方法可能是对代理用来决定去哪里的规则进行编码。最重要的信息是，对特征进行编码，一旦表达出来，就可以通过适应度来选择。

现在，谈谈神经网络问题。要记住的一件事是，神经网络是过滤器。他们收到输入。对其执行操作并返回输出。这个输出是什么？也许你只需要区分真/假条件；例如，一旦你将迷宫地图输入神经网络，它就可以告诉你是否可以走出迷宫。你会怎么做这样的事？您需要对数据进行正确的编码。

这是神经网络的关键点：输入数据必须正确编码。通常人们会对其进行标准化，也许对其进行缩放，也许您可以对其应用西格玛函数以避免值太大或太小；这些是处理错误测量和性能的细节。您现在需要了解什么是神经网络，以及不能用它做什么。

现在解决你的问题。您提到您也想使用神经网络：

使用神经网络来指导代理并
使用遗传算法来进化神经网络参数怎么样？

换句话来说：

假设你有一个机器人：你的神经网络正在控制左右轮，作为输入，它接收下一堵墙的距离以及它到目前为止已经行驶了多少距离（这只是一个例子），
你首先生成随机基因型
使基因型转变为表型：第一个基因是网络敏感性；第二个基因编码学习率；第三个基因..等等
现在你有了一个神经网络，运行模拟
看看它如何执行
生成第二个随机基因型，进化第二个神经网络
看看第二个个体如何执行
以获得最好的个体，然后要么改变它的基因型或者将其与失败者
重复

重新组合，这里有关于此事的优秀阅读：Inman Harvey 微生物 GA。

我希望我能为您提供有关此类问题的一些见解。神经网络和遗传算法并不是解决所有问题的灵丹妙药。在某些方面它们可以做很多事情，但在另一些方面它们只是错误的工具。（仍然！）取决于我们来获得最好的，为此我们必须很好地理解它们。

玩得开心！很高兴知道这些事情，让日常生活变得更有趣:)

As noted in some comments, your question indeed involves a large set of background knowledge and topics that hardly can be eloquently covered on stackoverflow. However, what we can try here is suggest approaches to get around your problem.

First of all: what does your GA do? I see a set of binary values; what are they? I see them as either:

bad: a sequence of 'turn right' and 'turn left' instructions. Why is this bad? Because you're basically doing a random, brute-force attempt at solving your problem. You're not evolving a genotype: you're refining random guesses.
better: every gene (location in the genome) represents a feature that will be expressed in the phenotype. There should not be a 1-to-1 mapping between genome and phenotype!

Let me give you an example: in our brain there are 10^13ish neurons. But we have only around 10^9 genes (yes, it's not an exact value, bear with me for a second). What does this tell us? That our genotype does not encode every neuron. Our genome encodes the proteins that will then go and make the components of our body.

Hence, evolution works on the genotype directly by selecting features of the phenotype. If I were to have 6 fingers on each hand and if that would made me a better programmer, making me have more kids because I'm more successful in life, well, my genotype would then be selected by evolution because it contains the capability to give me a more fit body (yes, there is a pun there, given the average geekiness-to-reproducibily ratio of most people around here).

Now, think about your GA: what is that you are trying to accomplish? Are you sure that evolving rules would help? In other words -- how would you perform in a maze? What is the most successful thing that can help you: having a different body, or having a memory of the right path to get out? Perhaps you might want to reconsider your genotype and have it encode memorization abilities. Maybe encode in the genotype how much data can be stored, and how fast can your agents access it -- then measure fitness in terms of how fast they get out of the maze.
Another (weaker) approach could be to encode the rules that your agent uses to decide where to go. The take-home message is, encode features that, once expressed, can be selected by fitness.

Now, to the neural network issue. One thing to remember is that NNs are filters. They receive an input. perform operations on it, and return an output. What is this output? Maybe you just need to discriminate a true/false condition; for example, once you feed a maze map to a NN, it can tell you if you can get out from the maze or not. How would you do such a thing? You will need to encode the data properly.

This is the key point about NNs: your input data must be encoded properly. Usually people normalize it, maybe scale it, perhaps you can apply a sigma function to it to avoid values that are too large or too small; those are details that deal with error measures and performance. What you need to understand now is what a NN is, and what you cannot use it for.

To your problem now. You mentioned you want to use NNs as well: what about,

using a neural network to guide the agent, and
using a genetic algorithm to evolve the neural network parameters?

Rephrased like so:

let's suppose you have a robot: your NN is controlling the left and right wheel, and as input it receives the distance of the next wall and how much it has traveled so far (it's just an example)
you start by generating a random genotype
make the genotype into a phenotype: the first gene is the network sensitivity; the second gene encodes the learning ratio; the third gene.. so on and so forth
now that you have a neural network, run the simulation
see how it performs
generate a second random genotype, evolve second NN
see how this second individual performs
get the best individual, then either mutate its genotype or recombinate it with the loser
repeat

there is an excellent reading on the matter here: Inman Harvey Microbial GA.

I hope I did you some insight on such issues. NNs and GA are no silver bullet to solve all problems. In some they can do very much, in others they are just the wrong tool. It's (still!) up to us to get the best one, and to do so we must understand them well.

Have fun in it! It's great to know such things, makes everyday life a bit more entertaining :)

回复收藏 0 原文