Understanding multilayer perceptron networks
I'm trying to understand how to train a multilayer perceptron; however, I'm having some trouble figuring out how to determine a suitable network architecture--i.e., the number of nodes/neurons in each layer of the network.
For a specific task, I have four input sources that can each input one of three states. I guess that would mean four input neurons firing either 0, 1, or 2, but as far as I've been told, inputs should be kept binary?
Furthermore, I'm having some issues choosing the number of neurons in the hidden layer. Any comments would be great.
Thanks.
Determining an acceptable Network structure for a multi-layer perceptron is actually straightforward.
Input Layer: How many features/dimensions are in your data--i.e., how many columns in each data row. Add one to this (for the bias node) and that is the number of nodes for the first (input) layer.

Output Layer: Is your MLP running in 'machine' mode or 'regression' mode ('regression' used here in the machine learning rather than the statistical sense)--i.e., does my MLP return a class label or a predicted value? If the latter, then your output layer has a single node. If the former, then your output layer has the same number of nodes as class labels. For instance, if the result you want is to label each instance as either "fraud" or "not fraud", that's two class labels, therefore two nodes in your output layer.

Hidden Layer(s): In between these two (input and output) are obviously the hidden layers. Always start with a single hidden layer. So how many nodes? Here's a rule of thumb: set the (initial) size of the hidden layer to some number of nodes just slightly greater than the number of nodes in the input layer. Compared with having fewer nodes than the input layer, this excess capacity will help your numerical optimization routine (e.g., gradient descent) converge.
In sum, begin with three layers for your network architecture; the sizes of the first (input) and last (output) are fixed by your data and by your model design, respectively. A hidden layer just slightly larger than the input layer is nearly always a good design to begin.
So in your case, a suitable network structure to begin would be:
input layer: 5 nodes --> hidden layer: 7 nodes --> output layer: 3 nodes
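As a rough sketch of what that 5 --> 7 --> 3 structure looks like in code, here is a single forward pass in NumPy. The random weight initialization and the sigmoid activation are illustrative assumptions on my part, not part of the answer; a real network would of course learn the weights by training.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Layer sizes follow the rule of thumb above: 4 features + 1 bias
# = 5 input nodes, a slightly larger hidden layer (7), and one
# output node per class label (3).
W1 = rng.normal(scale=0.1, size=(5, 7))   # input -> hidden
W2 = rng.normal(scale=0.1, size=(7, 3))   # hidden -> output

def forward(x):
    """x is a length-5 input vector (4 features plus the bias node)."""
    hidden = sigmoid(x @ W1)
    return sigmoid(hidden @ W2)           # one score per class

x = np.array([1.0, 0.0, 2.0, 1.0, 1.0])  # last element is the bias
print(forward(x).shape)                   # (3,)
```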
I disagree with doug's answer above on a few points.
You have 4 discrete (3 way categorical) inputs. You should (unless you have a strong reason not to) represent that as 12 binary inputs using a 1-of-3 encoding for each of your four conceptual inputs.
So if your input is [2,0,1,1] then your network should be given:
0 0 1 1 0 0 0 1 0 0 1 0
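A small sketch of that 1-of-3 encoding: each of the four 3-state inputs becomes three binary inputs, twelve in total (the function name is mine, just for illustration).

```python
def one_of_three(states):
    """Encode a list of 3-state inputs (each 0, 1, or 2) as binary inputs."""
    encoded = []
    for s in states:
        bits = [0, 0, 0]
        bits[s] = 1          # turn on the bit for this state
        encoded.extend(bits)
    return encoded

print(one_of_three([2, 0, 1, 1]))
# [0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 0]
```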
If your network implementation requires a manual bias, then you should add another always-on bit for the bias, but most sensible neural net implementations don't require that.
Try a few different numbers of hidden units. You don't need to restrict yourself to a hidden layer size smaller than the input layer size, but if you make it larger you should be careful to regularize your weights, perhaps with L2 or L1 weight decay, and maybe even also do early stopping in training (stop training when your error on a held-out validation set stops improving).
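The early-stopping idea can be sketched as a generic training loop: keep going while validation error improves, and return the best weights once it hasn't improved for a few epochs. `train_one_epoch` and `validation_error` here are hypothetical hooks standing in for your own training code, and the `patience` threshold is a common convention, not something from the answer above.

```python
def train_with_early_stopping(model, train_one_epoch, validation_error,
                              patience=10, max_epochs=1000):
    """Stop training once validation error hasn't improved for `patience` epochs."""
    best_err = float("inf")
    best_weights = None
    epochs_since_improvement = 0
    for epoch in range(max_epochs):
        train_one_epoch(model)
        err = validation_error(model)
        if err < best_err:
            best_err = err
            best_weights = model.copy()   # snapshot the best weights so far
            epochs_since_improvement = 0
        else:
            epochs_since_improvement += 1
            if epochs_since_improvement >= patience:
                break                     # validation error has stalled
    return best_weights, best_err
```

A toy usage: with a model that is just a list holding one weight, a training step that increments it, and a validation error minimized at 3.0, the loop returns the weights from the best epoch rather than the last one.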