I have written my own custom performance function, which is a cross-entropy function with some modifications, called the augmented cross-entropy function.
My performance function itself is the sum of two functions: a cross-entropy function F and a penalty function P; the formula is given below:
where B and the vectors e1 and e2 are just some constants, and w is a weight matrix (i for hidden-layer neurons, j for input-layer neurons).
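Judging from the derivative I use in the edit below, the penalty part should look roughly like this (a reconstruction; the exact summation ranges are assumed, and e1, e2 are written as scalars for readability even though they may be vectors):

\[
E = F + P, \qquad
P = e_1 \sum_{i,j} \frac{B\, w_{ij}^2}{1 + B\, w_{ij}^2} + e_2 \sum_{i,j} w_{ij}^2
\]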
I've implemented the dy and dx derivatives, though I'm not very sure about the dx derivative (where x is the result of the getx function; it holds all the weight and bias information). I assumed that the dx derivative of my performance function with respect to a weight w_ij would be equal to the derivative of the penalty function:
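That is, per weight the gradient contribution I use is the following (this is just the res formula from the edit below written out, with E1, E2, b standing for e1, e2, B):

\[
\frac{\partial P}{\partial w_{ij}} = \frac{2\, e_1\, B\, w_{ij}}{\left(1 + B\, w_{ij}^2\right)^2} + 2\, e_2\, w_{ij}
\]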
Then I started training my neural network with the trainbfg function and found out that it does not learn anything. The message was "Line search did not find new minimum". From the trainbfg description:
Each variable is adjusted according to the following: X = X + a*dX;
where dX is the search direction. The parameter a is selected to
minimize the performance along the search direction.
It turned out that the parameter a is always calculated as 0 by the default search function, srchbac (backtracking line search). I assume it has something to do with my performance function being wrongly implemented, because when I set mse as the performance function, a is calculated properly.
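For completeness, this is roughly the setup I am comparing; a minimal sketch, where the performance function name augcrossentropy and the variables inputs/targets are placeholders for my own:

net.trainFcn = 'trainbfg';                 % BFGS quasi-Newton training
net.trainParam.searchFcn = 'srchbac';      % backtracking line search (the default for trainbfg)
net.performFcn = 'augcrossentropy';        % my custom performance function -> a stays 0
% net.performFcn = 'mse';                  % with mse, srchbac finds a proper step size a
net = train(net, inputs, targets);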
What could be the reason for the problems the srchbac function has with locating a new minimum? I just want to know where I should look, because after a second day of searching I have found nothing.
Edit:
The x vector consists of the input-hidden connections' weight values first, followed by the remaining biases and weights. I calculate the dx derivative for the weights with the following formula:
res = 2 .* E1 .* b .* W ./( 1 + b .* W.^2).^2 + 2 .* E2 .* W ;
and I set the rest of the values to 0 (so that res has the same length as the x vector).
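In other words, the dx vector gets assembled roughly like this; a sketch which assumes that the column-major ordering of W matches the order in which getx lists the input-hidden weights:

x   = getx(net);                                               % all weights and biases in one column vector
res = 2 .* E1 .* b .* W ./ (1 + b .* W.^2).^2 + 2 .* E2 .* W;  % penalty derivative for each input-hidden weight
dx  = zeros(size(x));                                          % the remaining entries stay 0
dx(1:numel(res)) = res(:);                                     % input-hidden weights come first in x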