Neural network always produces same/similar outputs for any input

Posted on 2024-10-08 16:55:08

Comments (13)

画尸师 2024-10-15 16:55:08

I've had similar problems, but was able to solve them by changing the following:

  • Scale the problem down to a manageable size. I first tried too many inputs, with too many hidden layer units. Once I scaled down the problem, I could see whether the solution to the smaller problem was working. This also helps because the time needed to compute the weights drops significantly, so I can try many different things without waiting.
  • Make sure you have enough hidden units. This was a major problem for me. I had about 900 inputs connected to ~10 units in the hidden layer, which was far too few to converge quickly. But training also became very slow when I added more units. Scaling down the number of inputs helped a lot.
  • Change the activation function and its parameters. I was using tanh at first. I tried other functions: sigmoid, normalized sigmoid, Gaussian, etc. I also found that changing the function parameters to make the functions steeper or shallower affected how quickly the network converged.
  • Change the learning algorithm's parameters. Try different learning rates (0.01 to 0.9). Also try different momentum values (0.1 to 0.9), if your algorithm supports momentum.
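
A minimal sketch of the last two points on a deliberately tiny problem: plain gradient descent with momentum, swept over a grid of learning rates and momentum values. The toy data, the step count and the grids are assumptions chosen for illustration, not values from the answer above.

```python
# Sketch only: the toy data, step count and parameter grids are assumptions.

def train(lr, momentum, steps=500):
    """Fit y = w*x + b by gradient descent with momentum; return the final MSE."""
    xs = [0.0, 0.25, 0.5, 0.75, 1.0]
    ys = [2.0 * x + 1.0 for x in xs]      # toy target: w = 2, b = 1
    n = len(xs)
    w = b = 0.0
    vw = vb = 0.0                         # momentum "velocity" terms
    for _ in range(steps):
        errs = [w * x + b - y for x, y in zip(xs, ys)]
        gw = 2.0 * sum(e * x for e, x in zip(errs, xs)) / n   # dMSE/dw
        gb = 2.0 * sum(errs) / n                              # dMSE/db
        vw = momentum * vw - lr * gw
        vb = momentum * vb - lr * gb
        w += vw
        b += vb
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / n

learning_rates = (0.01, 0.1, 0.5, 0.9)
momenta = (0.0, 0.5, 0.9)
results = {(lr, m): train(lr, m) for lr in learning_rates for m in momenta}

# Print the settings from best to worst final loss.
for (lr, m), final_loss in sorted(results.items(), key=lambda kv: kv[1]):
    print(f"lr={lr:<5} momentum={m:<4} final MSE={final_loss:.3e}")
```

Because the problem is so small, the whole sweep runs instantly, which is the point of scaling down first: settings that converge, crawl, or blow up are visible at a glance.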

Hope this helps those who find this thread on Google!

掀纱窥君容 2024-10-15 16:55:08

So I realise this is extremely late for the original post, but I came across this because I was having a similar problem and none of the reasons posted here cover what was wrong in my case.

I was working on a simple regression problem, but every time I trained the network it would converge to a point where it gave me the same output (or sometimes a few different outputs) for every input. I played with the learning rate, the number of hidden layers/nodes, the optimization algorithm, etc., but it made no difference. Even when I looked at a ridiculously simple example, trying to predict the output (1d) of two different inputs (1d):

import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F

class net(nn.Module):
    def __init__(self, obs_size, hidden_size):
        super().__init__()
        self.fc = nn.Linear(obs_size, hidden_size)    # input -> hidden
        self.out = nn.Linear(hidden_size, 1)          # hidden -> single output

    def forward(self, obs):
        h = F.relu(self.fc(obs))
        return self.out(h)                            # shape [batch, 1]

inputs = np.array([[0.5],[0.9]])                          # shape [2, 1]
targets = torch.tensor([3.0, 2.0], dtype=torch.float32)   # shape [2]

network = net(1,5)
optimizer = torch.optim.Adam(network.parameters(), lr=0.001)

for i in range(10000):
    out = network(torch.tensor(inputs, dtype=torch.float32))
    loss = F.mse_loss(out, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    print("Loss: %f outputs: %f, %f" % (loss.item(), out[0].item(), out[1].item()))

but it STILL always output the average of the two target values for both inputs. It turned out that the shapes of my outputs and targets did not match: the targets were Size[2] and the outputs were Size[2,1], so PyTorch broadcast both to Size[2,2] inside the MSE loss, which completely messes everything up. Once I changed:

targets = torch.tensor([3.0, 2.0], dtype=torch.float32)

to

targets = torch.tensor([[3.0], [2.0]], dtype=torch.float32)

it worked as it should. This was obviously done with PyTorch, but I suspect other libraries may broadcast variables in the same way.
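
The effect of the shape mismatch can be reproduced without training anything. The sketch below uses NumPy as a stand-in, on the assumption that the loss broadcasts its arguments the way NumPy does; the variable names and the `mse` helper are illustrative, not part of the code above.

```python
import numpy as np

def mse(diff):
    """Mean of squared differences over every element of the array."""
    return float(np.mean(diff ** 2))

perfect = np.array([[3.0], [2.0]])        # shape (2, 1): the predictions we want
mean_pred = np.full((2, 1), 2.5)          # shape (2, 1): both outputs at the target mean

targets_flat = np.array([3.0, 2.0])       # shape (2,):   the mismatched targets
targets_col = np.array([[3.0], [2.0]])    # shape (2, 1): the corrected targets

# (2, 1) minus (2,) broadcasts to (2, 2): every output is compared with every target.
bad_shape = (perfect - targets_flat).shape
loss_perfect_bad = mse(perfect - targets_flat)
loss_mean_bad = mse(mean_pred - targets_flat)

# With matching shapes each output is compared only with its own target.
loss_perfect_ok = mse(perfect - targets_col)
loss_mean_ok = mse(mean_pred - targets_col)

print(bad_shape, loss_perfect_bad, loss_mean_bad, loss_perfect_ok, loss_mean_ok)
```

With the mismatched shapes the "perfect" predictions score worse than predicting the mean for both inputs, so an optimizer is pulled toward the mean. Asserting that the output and target shapes are equal before computing the loss catches this class of bug early.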

剩一世无双 2024-10-15 16:55:08

For me it happened exactly as in your case: the output of the neural network was always the same, no matter how much I trained it or how many layers it had.

It turned out that my back-propagation algorithm had a bug. In one place I was multiplying by -1 where it wasn't required.

You could have a similar bug. The question is how to debug it.

Steps to debug:

Step 1: Write the algorithm so that it can take a variable number of hidden layers and a variable number of input and output nodes.
Step 2: Reduce the hidden layers to 0. Reduce the input to 2 nodes and the output to 1 node.
Step 3: Train it on the binary OR operation.
Step 4: If it converges correctly, go to Step 9.
Step 5: If it doesn't converge, train it on only 1 training sample.
Step 6: Print all the forward and back-propagation variables (weights, node outputs, deltas, etc.).
Step 7: Take pen and paper and calculate all the variables manually.
Step 8: Cross-verify the hand-calculated values against the algorithm's values.
Step 9: If you find no problem with 0 hidden layers, increase the number of hidden layers to 1 and repeat Steps 5 to 8.

It sounds like a lot of work, but it works very well IMHO.
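
A rough sketch of Steps 2 to 8 for the zero-hidden-layer case: a single sigmoid neuron with 2 inputs and a bias, trained on OR. As an automated stand-in for the pen-and-paper check, the analytic gradient is compared against a finite-difference gradient on one training sample. The function names, the initial weights, the learning rate and the half-mean-squared-error loss are all assumptions made for this illustration.

```python
import math

# Binary OR: ((x1, x2), target)
OR_DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(w, x):
    """w = [w1, w2, bias]; no hidden layer (Step 2)."""
    return sigmoid(w[0] * x[0] + w[1] * x[1] + w[2])

def loss(w, data):
    """Half the mean squared error over the given samples."""
    return sum((forward(w, x) - t) ** 2 for x, t in data) / (2 * len(data))

def grad(w, data):
    """Analytic gradient of loss() from the back-propagation formulas."""
    g = [0.0, 0.0, 0.0]
    for x, t in data:
        y = forward(w, x)
        delta = (y - t) * y * (1.0 - y)   # error term of the output neuron
        g[0] += delta * x[0]
        g[1] += delta * x[1]
        g[2] += delta                     # the bias input is a constant 1
    return [gi / len(data) for gi in g]

def numeric_grad(w, data, eps=1e-6):
    """Central finite differences: an independent check on grad()."""
    g = []
    for i in range(len(w)):
        up, down = list(w), list(w)
        up[i] += eps
        down[i] -= eps
        g.append((loss(up, data) - loss(down, data)) / (2 * eps))
    return g

w = [0.1, -0.2, 0.05]                     # fixed start so runs are repeatable

# Steps 5-8: one training sample, compare the two gradients.
one_sample = OR_DATA[1:2]
gap = max(abs(a - n) for a, n in zip(grad(w, one_sample), numeric_grad(w, one_sample)))
print("largest gradient mismatch:", gap)

# Steps 3-4: train on all four OR samples and check the result.
for _ in range(5000):
    w = [wi - 1.0 * gi for wi, gi in zip(w, grad(w, OR_DATA))]
predictions = [round(forward(w, x)) for x, _ in OR_DATA]
print("predictions:", predictions)
```

A stray sign flip such as the -1 described above would show up as a large mismatch between the two gradients long before it showed up as a network that never learns.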

玩物 2024-10-15 16:55:08

I was running into the same problem with my model when the number of layers was large. I was using a learning rate of 0.0001. When I lowered the learning rate to 0.0000001, the problem seemed to be solved. I think the algorithm gets stuck in a local minimum when the learning rate is too high.

鹿港巷口少年归 2024-10-15 16:55:08

I know this is far too late for the original post, but maybe I can help someone else, as I faced the same problem.

For me the problem was that my input data had missing values in important columns, whereas the training/test data had none. I replaced these values with zeros and voilà, suddenly the results were plausible. So check your data; maybe it is misrepresented.
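
A small sketch of that check, assuming the rows are plain dictionaries. The column names, the sample values and the helper functions are hypothetical, and filling with zeros simply mirrors what is described above.

```python
import math

def is_missing(value):
    """Treat None and NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))

def missing_counts(rows):
    """Count missing values per column."""
    counts = {}
    for row in rows:
        for name, value in row.items():
            counts[name] = counts.get(name, 0) + (1 if is_missing(value) else 0)
    return counts

def fill_missing(rows, fill_value=0.0):
    """Return copies of the rows with missing values replaced by fill_value."""
    return [{k: (fill_value if is_missing(v) else v) for k, v in row.items()}
            for row in rows]

# Hypothetical data: complete at training time, incomplete at prediction time.
train_rows = [{"age": 31.0, "income": 52.0}, {"age": 45.0, "income": 61.0}]
new_rows = [{"age": 38.0, "income": float("nan")}, {"age": None, "income": 48.0}]

print("train:", missing_counts(train_rows))
print("new:  ", missing_counts(new_rows))
print("fixed:", missing_counts(fill_missing(new_rows)))
```

Comparing the counts for the training data and the data being predicted makes this kind of mismatch visible before the network is blamed.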

拔了角的鹿 2024-10-15 16:55:08

It's hard to tell without seeing a code sample, but this can happen to a network because of its number of hidden neurons. As the number of neurons and hidden layers increases, it becomes impossible to train the network with a small set of training data. As long as a network with fewer layers and neurons can do the job, it is a mistake to use a larger one. So perhaps your problem can be solved by paying attention to these points.

瀞厅☆埖开 2024-10-15 16:55:08

I haven't tested it with the XOR problem in the question, but for my original dataset based on Tic-Tac-Toe, I believe that I have gotten the network to train somewhat (I only ran 1000 epochs, which wasn't enough): the quickpropagation network can win/tie over half of its games; backpropagation can get about 41%. The problems came down to implementation errors (small ones) and not understanding the difference between the error derivative (which is per-weight) and the error for each neuron, which I did not pick up on in my research. @darkcanuck's answer about training the bias similarly to a weight would probably have helped, though I didn't implement it. I also rewrote my code in Python so that I could more easily hack with it. Therefore, although I haven't gotten the network to match the minimax algorithm's efficiency, I believe that I have managed to solve the problem.
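
The distinction between the error of each neuron and the per-weight error derivative can be sketched as follows. This is a generic illustration, not the poster's code: `weight_gradients` and the sample numbers are hypothetical. Each neuron j has a single error term (its delta), while each weight from input i to neuron j gets its own derivative, the product of that delta and the activation feeding the weight.

```python
def weight_gradients(activations_in, deltas_out):
    """dE/dw[i][j] = activation of source neuron i * delta of target neuron j."""
    return [[a * d for d in deltas_out] for a in activations_in]

activations_in = [1.0, 0.5]          # outputs of the 2 neurons feeding the layer
deltas_out = [0.2, -0.1, 0.4]        # one error term per neuron in the layer

grads = weight_gradients(activations_in, deltas_out)

# 3 per-neuron errors, but 2 x 3 = 6 per-weight derivatives.
for i, row in enumerate(grads):
    print(f"weights leaving input {i}:", row)
```

Using the per-neuron delta directly as the weight update, without multiplying by the incoming activation, is one way the two quantities get confused.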

眼泪也成诗 2024-10-15 16:55:08

I faced a similar issue earlier when my data was not properly normalized. Once I normalized the data everything ran correctly.

Recently, I faced this issue again and after debugging, I found that there can be another reason for neural networks giving the same output. If you have a neural network that has a weight decay term such as that in the RSNNS package, make sure that your decay term is not so large that all weights go to essentially 0.

I was using the caret package in R. Initially, I was using a decay hyperparameter of 0.01. When I looked at the diagnostics, I saw that the RMSE was being calculated for each fold (of cross validation), but the Rsquared was always NA. In this case all predictions were coming out to the same value.

Once I reduced the decay to a much lower value (1E-5 and lower), I got the expected results.

I hope this helps.
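
The mechanism can be illustrated with a one-weight toy model rather than a real network. The sketch assumes the objective mean((w*x - y)^2) + decay * w^2, which has a closed-form minimizer; this is not the objective RSNNS, nnet or caret actually use, so the decay values are not comparable with the ones quoted above.

```python
def fit_slope(xs, ys, decay):
    """Minimizer of mean((w*x - y)**2) + decay * w**2 for a single weight w."""
    n = len(xs)
    sxy = sum(x * y for x, y in zip(xs, ys)) / n
    sxx = sum(x * x for x in xs) / n
    return sxy / (sxx + decay)

def prediction_spread(w, xs):
    """Difference between the largest and smallest prediction."""
    preds = [w * x for x in xs]
    return max(preds) - min(preds)

xs = [-1.0, -0.5, 0.0, 0.5, 1.0]
ys = [2.0 * x for x in xs]                 # toy data with true slope 2

slopes = {decay: fit_slope(xs, ys, decay) for decay in (1e-5, 0.01, 100.0)}
spreads = {decay: prediction_spread(w, xs) for decay, w in slopes.items()}

for decay in slopes:
    print(f"decay={decay:<8} w={slopes[decay]:.5f} spread={spreads[decay]:.5f}")
```

As the decay term grows, the fitted weight shrinks toward 0 and the predictions for different inputs collapse toward a single value, which is the symptom described above.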

九八野马 2024-10-15 16:55:08

It's hard to tell without seeing a code sample, but a bias bug can have that effect (e.g. forgetting to add the bias to the input), so I would take a closer look at that part of the code.

北斗星光 2024-10-15 16:55:08

Based on your comments, I'd agree with @finnw that you have a bias problem. You should treat the bias as a constant "1" (or -1 if you prefer) input to each neuron. Each neuron will also have its own weight for the bias, so a neuron's output should be the sum of the weighted inputs, plus the bias times its weight, passed through the activation function. Bias weights are updated during training just like the other weights.

Fausett's "Fundamentals of Neural Networks" (p.300) has an XOR example using binary inputs and a network with 2 inputs, 1 hidden layer of 4 neurons and one output neuron. Weights are randomly initialized between +0.5 and -0.5. With a learning rate of 0.02 the example network converges after about 3000 epochs. You should be able to get a result in the same ballpark if you get the bias problems (and any other bugs) ironed out.

Also note that you cannot solve the XOR problem without a hidden layer in your network.
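
A minimal sketch of the bias handling described above, assuming a sigmoid activation. The function names are hypothetical, and `delta` stands for the neuron's error term as computed elsewhere by back-propagation.

```python
import math

def neuron_output(inputs, weights, bias_weight, bias_input=1.0):
    """Weighted inputs plus bias input times its weight, through a sigmoid."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias_weight * bias_input
    return 1.0 / (1.0 + math.exp(-z))

def update(inputs, weights, bias_weight, delta, lr=0.02, bias_input=1.0):
    """The bias weight is updated like any other weight; its input is the constant."""
    new_weights = [w - lr * delta * x for w, x in zip(weights, inputs)]
    new_bias_weight = bias_weight - lr * delta * bias_input
    return new_weights, new_bias_weight

# With no bias, an all-zero input gives the same output whatever the weights are.
zero_input = [0.0, 0.0]
no_bias = [neuron_output(zero_input, ws, bias_weight=0.0)
           for ws in ([5.0, -3.0], [0.1, 0.2])]
with_bias = neuron_output(zero_input, [5.0, -3.0], bias_weight=-2.0)

print(no_bias, with_bias)
```

The all-zero input is exactly the (0, 0) row of the XOR table, so a network without a working bias cannot move its response to that row away from the activation's value at 0.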

凯凯我们等你回来 2024-10-15 16:55:08

I encountered a similar issue and found out that it was a problem with how my weights were being generated.
I was using:

w = numpy.random.rand(layers[i], layers[i+1])

This generated random weights uniformly distributed between 0 and 1, so none of them were negative.
The problem was solved when I used randn() instead:

w = numpy.random.randn(layers[i], layers[i+1])

randn() samples from a standard normal distribution, so roughly half of the weights are negative, which helped my outputs become more varied.
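
A short sketch of why the sign of the initial weights matters. The layer sizes, the seed and the non-negative sample inputs are assumptions for illustration.

```python
import numpy as np

np.random.seed(0)                       # fixed seed so the comparison is repeatable
layers = [20, 10]                       # hypothetical sizes: 20 inputs, 10 hidden units

w_rand = np.random.rand(layers[0], layers[1])     # uniform on [0, 1): never negative
w_randn = np.random.randn(layers[0], layers[1])   # standard normal: mixed signs

x = np.random.rand(5, layers[0])        # 5 hypothetical samples with non-negative features

z_rand = x @ w_rand                     # pre-activations with all-positive weights
z_randn = x @ w_randn                   # pre-activations with mixed-sign weights

print("negative weights (rand): ", int((w_rand < 0).sum()))
print("negative weights (randn):", int((w_randn < 0).sum()))
print("pre-activation range (rand): ", z_rand.min(), z_rand.max())
print("pre-activation range (randn):", z_randn.min(), z_randn.max())
```

With non-negative inputs and all-positive weights every pre-activation is positive, so a sigmoid or tanh unit starts out pushed toward the same side of its range for every sample. Mixed-sign weights spread the pre-activations around 0.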

那请放手 2024-10-15 16:55:08

I ran into this exact issue. I was predicting 6 rows of data with 1200+ columns using nnet.

Each column would return a different prediction but all of the rows in that column would be the same value.

I got around this by increasing the size parameter significantly. I increased it from 1-5 to 11+.

I have also heard that decreasing your decay rate can help.

漫漫岁月 2024-10-15 16:55:08

I've had similar problems with machine learning algorithms, and when I looked at the code I found random generators that were not really random. If you do not use a new random seed each run (such as Unix time, for example; see http://en.wikipedia.org/wiki/Unix_time), then it is possible to get the exact same results over and over again.
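
A minimal sketch of the difference, using Python's standard `random` module. The `init_weights` helper and the weight range are hypothetical.

```python
import random
import time

def init_weights(n, seed):
    """Draw n initial weights from a generator seeded with the given value."""
    rng = random.Random(seed)
    return [rng.uniform(-0.5, 0.5) for _ in range(n)]

# The same seed always gives the same "random" weights.
fixed_a = init_weights(4, seed=42)
fixed_b = init_weights(4, seed=42)

# Seeding from the clock gives a different starting point on each run.
fresh = init_weights(4, seed=time.time_ns())

print(fixed_a == fixed_b)
print(fresh)
```

A fixed seed is still handy while debugging, because it makes a failing run reproducible. The problem described above arises when every supposedly independent run silently shares one seed.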
