Neural network with softmax activation



Edit:

A more pointed question:
What is the derivative of softmax to be used in my gradient descent?


This is more or less a research project for a course, and my understanding of neural networks is fairly limited, so please be patient :)


I am currently building a neural network that examines an input dataset and outputs the probability/likelihood of each classification (there are 5 different classes). Naturally, all the output nodes should sum to 1.

Currently, I have two layers, and I set the hidden layer to contain 10 nodes.

I came up with two different implementations (a minimal sketch of the first one follows the list):

  1. Logistic sigmoid for hidden layer activation, softmax for output activation
  2. Softmax for both hidden layer and output activation
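
As referenced above, here is a minimal sketch of the first variant (logistic sigmoid hidden layer, softmax output). It assumes 13 input features (as in the processed Cleveland data), 10 hidden nodes and 5 classes; the function and variable names are illustrative, not taken from any particular toolkit.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def softmax(z):
        # Subtract the max for numerical stability before exponentiating.
        e = np.exp(z - np.max(z))
        return e / e.sum()

    def forward(x, W1, b1, W2, b2):
        """Variant 1: logistic sigmoid hidden layer, softmax output layer."""
        h = sigmoid(W1 @ x + b1)   # hidden activations, shape (10,)
        y = softmax(W2 @ h + b2)   # class probabilities, shape (5,), sums to 1
        return h, y

    # Toy dimensions: 13 inputs, 10 hidden nodes, 5 classes.
    rng = np.random.default_rng(0)
    W1, b1 = rng.normal(scale=0.1, size=(10, 13)), np.zeros(10)
    W2, b2 = rng.normal(scale=0.1, size=(5, 10)), np.zeros(5)
    h, y = forward(rng.normal(size=13), W1, b1, W2, b2)
    print(y, y.sum())   # the five outputs sum to 1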

I am using gradient descent to find local maxima in order to adjust the hidden nodes' weights and the output nodes' weights. I am certain that I have this correct for sigmoid. I am less certain about softmax (or whether I can use gradient descent at all); after a bit of research I couldn't find the answer, so I decided to compute the derivative myself and obtained softmax'(x) = softmax(x) - softmax(x)^2 (this returns a column vector of size n). I have also looked into the MATLAB NN toolkit; the derivative of softmax provided by the toolkit returns a square matrix of size n x n, whose diagonal coincides with the softmax'(x) that I calculated by hand, and I am not sure how to interpret that output matrix.
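
For reference, here is a minimal numpy sketch of the two quantities being compared: an n x n matrix of the form diag(s) - s s^T (the full softmax Jacobian, d softmax_i / d z_j = s_i * (delta_ij - s_j)), whose diagonal is exactly the hand-computed s - s^2. The names and test values below are illustrative, not from the MATLAB toolkit.

    import numpy as np

    def softmax(z):
        e = np.exp(z - np.max(z))
        return e / e.sum()

    def softmax_jacobian(z):
        # Full n x n Jacobian: d softmax_i / d z_j = s_i * (delta_ij - s_j).
        s = softmax(z)
        return np.diag(s) - np.outer(s, s)

    z = np.array([0.3, -1.2, 2.0, 0.5, 0.0])
    s = softmax(z)
    J = softmax_jacobian(z)
    print(np.allclose(np.diag(J), s - s**2))   # True: the diagonal matches s - s^2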

I ran each implementation with a learning rate of 0.001 and 1000 iterations of backpropagation. However, my NN returns 0.2 (a uniform distribution) for all five output nodes, for any subset of the input dataset.

My conclusions:

  • I am fairly certain that my gradient descent is implemented incorrectly, but I have no idea how to fix this.
  • Perhaps I am not using enough hidden nodes
  • Perhaps I should increase the number of layers

Any help would be greatly appreciated!

The dataset I am working with can be found here (processed Cleveland):
http://archive.ics.uci.edu/ml/datasets/Heart+Disease


Comments (3)

岁月打碎记忆 2024-09-05 19:44:23


The gradient you use is actually the same as with squared error: output - target. This might seem surprising at first, but the trick is that a different error function is minimized:

\( -\sum_{n=1}^{N}\sum_{k=1}^{K} t_{kn} \log(y_{kn}) \)

where log is the natural logarithm, N denotes the number of training examples and K the number of classes (and thus the number of units in the output layer). t_kn denotes the binary coding (0 or 1) of the k-th class in the n-th training example, and y_kn is the corresponding network output.
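
To make the notation concrete for a single training example (one fixed n), here is a minimal numpy sketch of this error function and of the output - target gradient it leads to; the variable names and values are illustrative only.

    import numpy as np

    def softmax(z):
        e = np.exp(z - np.max(z))
        return e / e.sum()

    z = np.array([0.3, -1.2, 2.0, 0.5, 0.0])   # pre-softmax activations
    t = np.array([0.0, 0.0, 1.0, 0.0, 0.0])    # one-hot (binary-coded) target
    y = softmax(z)

    E = -np.sum(t * np.log(y))                 # error for this single example

    # Chain rule: dE/dz = J^T (dE/dy), with J the softmax Jacobian.
    J = np.diag(y) - np.outer(y, y)
    dE_dz = J.T @ (-t / y)
    print(np.allclose(dE_dz, y - t))           # True: reduces to output - target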

Showing that the gradient is correct might be a good exercise; I haven't done it myself, though.

Regarding your problem: you can check whether your gradient is correct by numerical differentiation. Say you have a function f and implementations of f and f'. Then the following should hold:

\( f'(x) = \frac{f(x + \epsilon) - f(x - \epsilon)}{2\epsilon} + O(\epsilon^2) \)
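
A minimal sketch of such a check in numpy, applied here to the cross-entropy-of-softmax error from above; the function names and the test point are illustrative.

    import numpy as np

    def softmax(z):
        e = np.exp(z - np.max(z))
        return e / e.sum()

    def numerical_gradient(f, x, eps=1e-5):
        # Central differences: perturb one coordinate of x at a time.
        grad = np.zeros_like(x)
        for i in range(x.size):
            d = np.zeros_like(x)
            d[i] = eps
            grad[i] = (f(x + d) - f(x - d)) / (2 * eps)
        return grad

    t = np.array([0.0, 0.0, 1.0, 0.0, 0.0])          # one-hot target
    f = lambda z: -np.sum(t * np.log(softmax(z)))    # cross-entropy of softmax
    z = np.array([0.3, -1.2, 2.0, 0.5, 0.0])

    # The analytic gradient output - target should agree with the numerical one.
    print(np.allclose(numerical_gradient(f, z), softmax(z) - t, atol=1e-6))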
柒七 2024-09-05 19:44:23


Please look at sites.google.com/site/gatmkorn for the open-source Desire simulation program.
For the Windows version, the /mydesire/neural folder has several softmax classifiers, some with a softmax-specific gradient-descent algorithm.

In the examples, this works nicely for a simple character-recognition task.

See also

Korn, G.A.: Advanced Dynamic-System Simulation, Wiley, 2007

GAK

月隐月明月朦胧 2024-09-05 19:44:23


Look at this link:
http://www.youtube.com/watch?v=UOt3M5IuD5s
The softmax derivative is: dy_i/dz_i = y_i * (1.0 - y_i)
