Why does my convolutional neural network return NaN after a few iterations?



I'm currently coding my own convolutional neural network in Java. First I implemented the fully-connected layers, which worked fine (they trained correctly on the MNIST dataset). Now I have also implemented the convolutional layer and tried it with a really simple example:

    Network nn = new Network(new SGD(0.01), new CrossEntropy(),
            new Convolution(6, 3, 3, 2), new Convolution(4, 2, 2, 1), new Flatten());
    ConvTrainPair[] trainPairs = new ConvTrainPair[] {
        new ConvTrainPair(Cube.ones(6, 6, 3), Vector.from(0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.8))
    };

    //System.out.println(nn.run(new Cube[] {Cube.ones(6, 6, 3)}));
    for (int e = 0; e < 100000000; ++e) {
        nn.train(trainPairs, 2);
        Matrix out = (Matrix) nn.run(new Cube[] {Cube.ones(6, 6, 3)});
        System.out.println("out: " + out);
    }

But after a few iterations the network only returns NaN, and I have no idea why. The network is just a simple one: a convolutional layer (input size 6, input depth 3, kernel size 3, numKernels 2), a second one (input size 4, input depth 2, kernel size 2, numKernels 1), and a final Flatten layer that turns the output into a matrix. I'm using stochastic gradient descent and the cross-entropy loss.
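To rule out a shape mismatch, here is a minimal sketch (standalone Java, independent of my own classes) of how the sizes should flow through the network, assuming each layer performs a valid cross-correlation with outputSize = inputSize - filterSize + 1:

    // Sketch: expected shape flow through the network above,
    // assuming valid cross-correlation (no padding, stride 1).
    public class ShapeCheck {
        public static void main(String[] args) {
            int size = 6, depth = 3;             // input cube: 6 x 6 x 3
            size = size - 3 + 1; depth = 2;      // Convolution(6, 3, 3, 2) -> 4 x 4 x 2
            size = size - 2 + 1; depth = 1;      // Convolution(4, 2, 2, 1) -> 3 x 3 x 1
            System.out.println(size * size * depth); // prints 9: Flatten yields 9 values,
                                                     // matching the 9-element target vector
        }
    }

So the 9-element target vector at least matches the flattened output size.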

I'm quite sure the error must be in the code of my convolution layer:

    public class Convolution extends Layer {
        private Matrix[][] filters, filterGradients;   // [numFilters][inputDepth]
        private Matrix[] bias, biasGradients;          // one bias matrix per filter
        private Cube[] lastInputBatch;
        private Trainer trainer;
        private int trainerWW, trainerWH, trainerBL;

        public Convolution(int inputSize, int inputDepth, int filterSize, int numFilters) {
            // valid cross-correlation: the output shrinks by filterSize - 1
            int outputSize = inputSize - filterSize + 1;
            this.trainerWW = numFilters * filterSize;
            this.trainerWH = inputDepth * filterSize;
            this.trainerBL = (numFilters + outputSize) * outputSize;

            this.bias = new Matrix[numFilters];
            this.filters = new Matrix[numFilters][inputDepth];
            for (int i = 0; i < numFilters; ++i) {
                bias[i] = Matrix.random(outputSize, outputSize);

                for (int j = 0; j < inputDepth; ++j) {
                    filters[i][j] = Matrix.random(filterSize, filterSize);
                }
            }
        }

        @Override
        public void init(Optimizer optimizer) {
            this.trainer = new Trainer(trainerWW, trainerWH, trainerBL, optimizer);
        }

        @Override
        public Object feedforward(Cube[] batch) {
            this.lastInputBatch = batch;
            Cube[] out = new Cube[batch.length];

            for (int b = 0; b < batch.length; ++b) {
                Cube current = batch[b];
                Matrix[] fMaps = new Matrix[filters.length];
                for (int i = 0; i < filters.length; ++i) {
                    // accumulate each channel's cross-correlation on top of the bias
                    fMaps[i] = bias[i];
                    for (int j = 0; j < filters[i].length; ++j) {
                        fMaps[i].addE(crossCorrelate(current.data[j], filters[i][j]));
                    }
                }

                out[b] = new Cube(fMaps);
            }

            return out;
        }

        @Override
        public Object backward(Cube[] deltaBatch, boolean needsActivationGradient) {
            Cube[] inputDeltaBatch = new Cube[deltaBatch.length];
            for (int b = 0; b < deltaBatch.length; ++b) {
                Cube delta = deltaBatch[b];
                Cube lastInput = lastInputBatch[b];

                filterGradients = new Matrix[filters.length][filters[0].length];
                biasGradients = new Matrix[filterGradients.length];
                Matrix[] inputGradients = new Matrix[filters[0].length];
                for (int i = 0; i < filterGradients.length; ++i) {
                    for (int j = 0; j < filterGradients[i].length; ++j) {
                        Matrix filter = filters[i][j];
                        // dL/dW: cross-correlate the stored input with the incoming delta
                        filterGradients[i][j] = crossCorrelate(lastInput.data[j], delta.data[i]);
                        if (i == 0) inputGradients[j] = new Matrix(lastInput.width(), lastInput.height());
                        // dL/dX: full convolution of the delta with the filter,
                        // done as padding by filter.w - 1 plus a 180-degree rotation
                        inputGradients[j].addE(crossCorrelate(delta.data[i].padding(filter.w - 1), filter.rotate180()));
                    }
                    biasGradients[i] = delta.data[i];
                }

                inputDeltaBatch[b] = new Cube(inputGradients);
            }

            return inputDeltaBatch;
        }

        // plain valid cross-correlation (no padding, stride 1)
        public static Matrix crossCorrelate(Matrix input, Matrix kernel) {
            int nmW = input.w - kernel.w + 1;
            int nmH = input.h - kernel.h + 1;
            Matrix nm = new Matrix(nmW, nmH);

            for (int i = 0; i < nmW; ++i) {
                for (int j = 0; j < nmH; ++j) {
                    for (int a = 0; a < kernel.w; ++a) {
                        for (int b = 0; b < kernel.h; ++b) {
                            nm.e[i][j] += input.e[i + a][j + b] * kernel.e[a][b];
                        }
                    }
                }
            }

            return nm;
        }

        @Override
        public void optimize(int episode) {
            // plain SGD step with a hard-coded learning rate
            double lr = 0.001;
            for (int i = 0; i < filters.length; ++i) {
                for (int j = 0; j < filters[i].length; ++j) {
                    filters[i][j].subE(filterGradients[i][j].mul(lr));
                }
            }

            for (int i = 0; i < bias.length; ++i) {
                bias[i].subE(biasGradients[i].mul(lr));
            }
        }
    }
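For what it's worth, crossCorrelate itself checks out on a tiny by-hand example. This is just a sketch and relies only on the Matrix members already visible above (the (width, height) constructor, which the code assumes zero-initialises e, plus the e, w and h fields):

    // By-hand check: 3x3 all-ones input, 2x2 kernel [[1,2],[3,4]].
    // Every entry of the 2x2 output should be 1 + 2 + 3 + 4 = 10.
    Matrix input = new Matrix(3, 3);
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            input.e[i][j] = 1.0;
    Matrix kernel = new Matrix(2, 2);
    kernel.e[0][0] = 1.0; kernel.e[0][1] = 2.0;
    kernel.e[1][0] = 3.0; kernel.e[1][1] = 4.0;
    Matrix out = Convolution.crossCorrelate(input, kernel);
    System.out.println(out.e[0][0] + " " + out.e[1][1]); // 10.0 10.0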

I already tried using only one convolutional layer, which avoids the problem, but I don't understand why. I also tried changing the learning rate and the loss function, without success. I also inserted activation functions between the two convolutional layers and one after the Flatten layer: with sigmoid it no longer returned NaN, but it also didn't seem to learn anything during training; with tanh and softmax it still returned NaN.
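Since NaN usually means some value explodes a few steps earlier, an element-wise clamp on the gradients in optimize() should at least delay or expose that. A rough sketch; clip() is a hypothetical helper, not part of my code above:

    // Hypothetical element-wise gradient clamp (not in my Layer/Trainer API).
    public static Matrix clip(Matrix m, double limit) {
        Matrix out = new Matrix(m.w, m.h);
        for (int i = 0; i < m.w; ++i) {
            for (int j = 0; j < m.h; ++j) {
                out.e[i][j] = Math.max(-limit, Math.min(limit, m.e[i][j]));
            }
        }
        return out;
    }

    // usage inside optimize(), e.g. with limit 1.0:
    // filters[i][j].subE(clip(filterGradients[i][j].mul(lr), 1.0));

But since even very small learning rates did not stop the NaNs, I suspect the real cause is elsewhere.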

If you have any idea what the problem might be, I would be very grateful.

Thank you in advance!
