What are recurrent neural networks used for when used with Reinforcement Learning?
I do know that feedforward multi-layer neural networks with backprop are used with Reinforcement Learning to help it generalize over the actions our agent takes. That is, if we have a big state space, we can take some actions, and they will help generalize over the whole state space.
What do recurrent neural networks do, instead? What tasks are they generally used for?
2 Answers
Recurrent Neural Networks, RNN for short (although beware that RNN is often used in the literature to designate Random Neural Networks, which are effectively a special case of Recurrent NNs), come in very different "flavors", which causes them to exhibit various behaviors and characteristics. In general, however, these many shades of behavior and characteristics are rooted in the availability of [feedback] input to individual neurons. Such feedback comes from other parts of the network, be it local or distant, from the same layer (including, in some cases, "self"), or even from different layers (*). Feedback information is treated as "normal" input to the neuron and can then influence, at least in part, its output.
Unlike back-propagation, which is used during the learning phase of a feed-forward network for the purpose of fine-tuning the relative weights of the various [feedforward-only] connections, feedback in RNNs constitutes a true input to the neurons it connects to.
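To make that distinction concrete, here is a minimal NumPy sketch (purely illustrative; the sizes, weights, and names are invented for this example) of an Elman-style recurrent step, where the previous hidden state enters the neurons as a true input alongside the external input:

```python
import numpy as np

# Illustrative sizes and random weights; nothing here comes from the original answer.
n_in, n_hidden = 4, 8
rng = np.random.default_rng(0)
W_in = rng.normal(scale=0.1, size=(n_hidden, n_in))       # feed-forward weights
W_rec = rng.normal(scale=0.1, size=(n_hidden, n_hidden))  # feedback (recurrent) weights
b = np.zeros(n_hidden)

def rnn_step(x_t, h_prev):
    """One Elman-style update: the previous hidden state h_prev enters each
    neuron as a genuine input, alongside the external input x_t."""
    return np.tanh(W_in @ x_t + W_rec @ h_prev + b)

h = np.zeros(n_hidden)                      # initial state
for x_t in rng.normal(size=(5, n_in)):      # a short, made-up input sequence
    h = rnn_step(x_t, h)                    # h feeds back into the next step
```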
One of the uses of feedback is to make the network more resilient to noise and other imperfections in the input (i.e. the input to the network as a whole). The reason for this is that, in addition to inputs "directly" pertaining to the network input (the kinds of input that would have been present in a feedforward network), neurons have information about what other neurons are "thinking". This extra information then leads to Hebbian learning, i.e. the idea that neurons that [usually] fire together should "encourage" each other to fire. In practical terms, this extra input from "like-firing" neighbor neurons (or not-so-near neighbors) may prompt a neuron to fire even though its non-feedback inputs were such that it would not have fired (or would have fired less strongly, depending on the type of network).
An example of this resilience to input imperfections is associative memory, a common application of RNNs. The idea is to use the feedback information to "fill in the blanks".
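As a rough illustration of that "fill in the blanks" behavior, here is a small sketch of a binary Hopfield-style associative memory in NumPy, with Hebbian (outer-product) storage and iterative recall from a corrupted cue; the patterns, sizes, and noise level are all made up for the example:

```python
import numpy as np

rng = np.random.default_rng(1)
patterns = rng.choice([-1, 1], size=(3, 64))   # three made-up +/-1 patterns to store

# Hebbian (outer-product) storage: units that fire together strengthen each other.
W = sum(np.outer(p, p) for p in patterns) / patterns.shape[1]
np.fill_diagonal(W, 0)

def recall(cue, steps=10):
    """Iterated feedback lets the network 'fill in the blanks' of a noisy cue."""
    s = cue.copy()
    for _ in range(steps):
        s = np.sign(W @ s)
        s[s == 0] = 1          # break ties so states stay +/-1
    return s

# Corrupt one stored pattern, then try to recover it from the noisy cue.
noisy = patterns[0].copy()
noisy[rng.choice(64, size=10, replace=False)] *= -1
print(np.array_equal(recall(noisy), patterns[0]))   # usually True with this few patterns
```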
Another related but distinct use of feedback is with inhibitory signals, whereby a given neuron may learn that while all its other inputs would prompt it to fire, a particular feedback input from some other part of the network typically indicates that somehow the other inputs are not to be trusted (in this particular context).
Another extremely important use of feedback is that, in some architectures, it can introduce a temporal element into the system. A particular [feedback] input may not so much inform the neuron of what the network "thinks" [now], but instead "remind" the neuron that, say, two cycles ago (whatever a cycle may represent), the network's state (or one of its sub-states) was "X". Such an ability to "remember" the [typically] recent past is another factor in resilience to noise in the input, but its main interest may be in introducing "prediction" into the learning process. These time-delayed inputs may be seen as predictions from other parts of the network: "I've heard footsteps in the hallway, expect to hear the doorbell [or keys shuffling]".
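A tiny sketch of this temporal aspect (again with invented weights and a made-up sequence): the recurrent state keeps a trace of an input seen two steps earlier, which a downstream readout could then use for prediction:

```python
import numpy as np

rng = np.random.default_rng(2)
W_in = rng.normal(scale=0.5, size=(6, 1))    # made-up weights: 1-d input, 6 hidden units
W_rec = rng.normal(scale=0.5, size=(6, 6))

def run(seq):
    """Roll the same recurrent step over a sequence; the returned state is a
    compressed memory of everything seen so far."""
    h = np.zeros(6)
    for x in seq:
        h = np.tanh(W_in @ np.array([x]) + W_rec @ h)
    return h

# Two sequences that differ only in an element two steps in the past:
h_a = run([1.0, 0.0, 0.0])
h_b = run([-1.0, 0.0, 0.0])
print(np.linalg.norm(h_a - h_b))   # non-zero: the state still "remembers" the difference
```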
(*) BTW, such broad freedom in the "rules" that dictate the allowed connections, whether feedback or feed-forward, explains why there are so many different RNN architectures and variations thereof. Another reason for these many different architectures is that one of the characteristics of RNNs is that they are not as readily tractable, mathematically or otherwise, as the feed-forward model. As a result, driven by mathematical insight or a plain trial-and-error approach, many different possibilities are being tried.
This is not to say that feedback networks are total black boxes; in fact some RNNs, such as Hopfield Networks, are rather well understood. It's just that the math is typically more complicated (at least to me ;-) ).
I think the above, generally (too generally!), addresses devoured elysium's (the OP's) questions of "what do RNNs do instead" and "the general tasks they are used for". To complement this information, here is an incomplete and informal survey of applications of RNNs. The difficulties in gathering such a list are multiple:
Anyway, here's the list
Also, there are a lot of applications associated with the temporal dimension of RNNs (another area where FF networks would typically not be found).
There is an assumption in the basic Reinforcement Learning framework that your state/action/reward sequence is a Markov Decision Process. That basically means that you do not need to remember any information about previous states from this episode to make decisions.
But this is obviously not true for all problems. Sometimes you do need to remember some recent things to make informed decisions. Sometimes you can explicitly build the things that need to be remembered into the state signal, but in general we'd like our system to learn what it needs to remember. This is called a Partially Observable Markov Decision Process (POMDP), and there are a variety of methods used to deal with it. One possible solution is to use a recurrent neural network, since it incorporates details from previous time steps into the current decision.
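As a rough sketch of that idea (the sizes, weight initialization, and helper names are invented here, and the weights would in practice be learned by an RL algorithm rather than left random), the agent can keep a recurrent hidden state and choose actions from the current observation plus that state:

```python
import numpy as np

rng = np.random.default_rng(3)
n_obs, n_hidden, n_actions = 4, 16, 2        # illustrative sizes, not from the answer

W_in = rng.normal(scale=0.1, size=(n_hidden, n_obs))
W_rec = rng.normal(scale=0.1, size=(n_hidden, n_hidden))
W_out = rng.normal(scale=0.1, size=(n_actions, n_hidden))

def act(obs, h_prev):
    """The hidden state h summarizes the observation history, so the choice of
    action can depend on more than the current (partial) observation."""
    h = np.tanh(W_in @ obs + W_rec @ h_prev)
    q = W_out @ h                            # e.g. action values, recurrent-Q-network style
    return int(np.argmax(q)), h

h = np.zeros(n_hidden)
for t in range(5):                           # dummy episode with random observations
    obs = rng.normal(size=n_obs)
    action, h = act(obs, h)                  # h carries memory across time steps
```

Training the weights is a separate problem; one hedged example is a DRQN-style approach, where a recurrent Q-network is trained with standard deep Q-learning machinery over sequences of observations.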