Why must we normalize the inputs to an artificial neural network?

Published 2024-10-11 21:18:56 · 1455 characters · 8 views · 0 comments


Comments (10)

雨落星ぅ辰 2024-10-18 21:18:56

It's explained well here.

If the input variables are combined linearly, as in an MLP [multilayer perceptron], then it is rarely strictly necessary to standardize the inputs, at least in theory. The reason is that any rescaling of an input vector can be effectively undone by changing the corresponding weights and biases, leaving you with the exact same outputs as you had before. However, there are a variety of practical reasons why standardizing the inputs can make training faster and reduce the chances of getting stuck in local optima. Also, weight decay and Bayesian estimation can be done more conveniently with standardized inputs.
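The first point of the quote can be checked numerically. A minimal sketch with NumPy and made-up values: rescaling an input vector is undone exactly by rescaling the corresponding weights, so a linear unit produces the same output.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=5)   # an input vector (made-up values)
w = rng.normal(size=5)   # weights of a single linear unit
b = 0.3                  # bias

out = w @ x + b          # original pre-activation output

# Rescale the input by a factor of 1000; dividing the weights by the
# same factor undoes the rescaling and reproduces the exact same output.
x_scaled = 1000.0 * x
w_adjusted = w / 1000.0
out_rescaled = w_adjusted @ x_scaled + b

print(np.isclose(out, out_rescaled))  # True
```

The practical reasons for standardizing anyway (conditioning of the loss surface) are what the other answers below illustrate.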

亣腦蒛氧 2024-10-18 21:18:56

In neural networks, it is a good idea not just to normalize the data but also to scale it. This is intended to approach the global minimum of the error surface faster. See the following pictures:
[figure: error surface before and after normalization]

[figure: error surface before and after scaling]

The pictures are taken from the Coursera course about neural networks. The author of the course is Geoffrey Hinton.
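For concreteness, a minimal sketch of z-score standardization (zero mean, unit variance per feature) on a hypothetical feature matrix:

```python
import numpy as np

# Hypothetical feature matrix: column 0 lives in [0, 1],
# column 1 lives in the millions.
X = np.array([[0.2, 1.1e6],
              [0.5, 0.9e6],
              [0.8, 1.3e6]])

# Z-score standardization: subtract each column's mean and divide by its
# standard deviation, so every feature has zero mean and unit variance.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

print(X_std.mean(axis=0))  # ~[0, 0]
print(X_std.std(axis=0))   # [1, 1]
```

After this transformation, both coordinates contribute on the same scale, which is what rounds out the elongated error surface in the pictures.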

哥,最终变帅啦 2024-10-18 21:18:56

Some inputs to a NN might not have a 'naturally defined' range of values. For example, the average value might be slowly but continuously increasing over time (for example, the number of records in a database).

In such a case, feeding this raw value into your network will not work very well. You will train your network on values from the lower part of the range, while the actual inputs will come from the higher part of this range (and quite possibly above the range that the network has learned to work with).

You should normalize this value. You could, for example, tell the network by how much the value has changed since the previous input. This increment can usually be confined, with high probability, to a specific range, which makes it a good input for the network.

深海不蓝 2024-10-18 21:18:56

There are two reasons why we have to normalize input features before feeding them to a neural network:

Reason 1: If a feature in the dataset is big in scale compared to the others, then this big-scaled feature becomes dominant, and as a result the predictions of the neural network will not be accurate.

Example: In the case of employee data, if we consider age and salary, age will be a two-digit number while salary can be 7 or 8 digits (1 million, etc.). In that case, salary will dominate the prediction of the neural network. But if we normalize those features, the values of both features will lie in the range from 0 to 1.
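A minimal sketch of the min-max scaling described above, on hypothetical age/salary values:

```python
import numpy as np

# Hypothetical employee data: age is two digits, salary is seven digits.
age    = np.array([25.0, 32.0, 47.0, 58.0])
salary = np.array([1_200_000.0, 2_500_000.0, 4_800_000.0, 9_000_000.0])

def min_max(v):
    """Linearly rescale a feature into the range [0, 1]."""
    return (v - v.min()) / (v.max() - v.min())

age_n, salary_n = min_max(age), min_max(salary)

# After scaling, both features span exactly [0, 1], so neither dominates.
print(age_n)
print(salary_n)
```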

Reason 2: Forward propagation in a neural network involves the dot product of the weights with the input features. So, if the values are very high (for image and non-image data), calculating the output takes a lot of computation time as well as memory. The same is the case during backpropagation. Consequently, the model converges slowly if the inputs are not normalized.

Example: If we perform image classification, the input values will be large, as the value of each pixel ranges from 0 to 255. Normalization in this case is very important.

Mentioned below are instances where normalization is very important:

  1. K-means
  2. K-nearest neighbours
  3. Principal component analysis (PCA)
  4. Gradient descent

吹泡泡o 2024-10-18 21:18:56

When you use unnormalized input features, the loss function is likely to have very elongated valleys. When optimizing with gradient descent, this becomes an issue because the gradient will be steep with respect to some of the parameters. That leads to large oscillations in the search space, as you bounce between steep slopes. To compensate, you have to stabilize optimization with small learning rates.

Consider features x1 and x2, which range from 0 to 1 and from 0 to 1 million, respectively. It turns out the ratio of the corresponding parameters (say, w1 and w2) will also be large.

Normalizing tends to make the loss function more symmetrical/spherical. Such loss functions are easier to optimize because the gradients tend to point towards the global minimum and you can take larger steps.
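A toy illustration of the elongated-valley problem, using a quadratic loss whose two curvatures stand in for unnormalized vs. normalized features (all numbers here are illustrative):

```python
import numpy as np

def gradient_descent(curvatures, lr, steps=100):
    """Minimize f(w) = 0.5 * sum(c_i * w_i^2) starting from w = (1, 1)."""
    w = np.array([1.0, 1.0])
    for _ in range(steps):
        w = w - lr * curvatures * w   # gradient of f is (c_1*w_1, c_2*w_2)
    return w

# Elongated valley (unnormalized features give very different curvatures):
# a learning rate small enough to be stable in the steep direction
# barely moves the flat one.
w_elongated = gradient_descent(np.array([1.0, 1e4]), lr=1e-4)

# Roughly spherical bowl (normalized features, similar curvatures):
# the same budget of steps reaches the minimum in both directions.
w_spherical = gradient_descent(np.array([1.0, 1.0]), lr=0.5)

print(w_elongated)   # flat direction still near its starting value of 1
print(w_spherical)   # both coordinates essentially 0
```

With mismatched curvatures, the stable learning rate is dictated by the steepest direction, so progress along the flat direction is painfully slow; equalizing the curvatures (what normalization does, roughly) lets one step size work for every direction.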

上课铃就是安魂曲 2024-10-18 21:18:56

Looking at the neural network from the outside, it is just a function that takes some arguments and produces a result. As with all functions, it has a domain (i.e. a set of legal arguments). You have to normalize the values that you want to pass to the neural net in order to make sure they are in the domain. As with all functions, if the arguments are not in the domain, the result is not guaranteed to be appropriate.

The exact behavior of the neural net on arguments outside of the domain depends on the implementation of the neural net. But overall, the result is useless if the arguments are not within the domain.

等待我真够勒 2024-10-18 21:18:56

I believe the answer is dependent on the scenario.

Consider the NN (neural network) as an operator F, so that F(input) = output. In the case where this relation is linear, so that F(A * input) = A * output, you might choose either to leave the input/output unnormalised in their raw forms, or to normalise both to eliminate A. Obviously this linearity assumption is violated in classification tasks, or nearly any task that outputs a probability, where F(A * input) = 1 * output.

In practice, normalisation allows non-fittable networks to be fittable, which is crucial to experimenters/programmers. Nevertheless, the precise impact of normalisation will depend not only on the network architecture/algorithm, but also on the statistical prior for the input and output.

What's more, NN is often implemented to solve very difficult problems in a black-box fashion, which means the underlying problem may have a very poor statistical formulation, making it hard to evaluate the impact of normalisation, causing the technical advantage (becoming fittable) to dominate over its impact on the statistics.

In statistical sense, normalisation removes variation that is believed to be non-causal in predicting the output, so as to prevent NN from learning this variation as a predictor (NN does not see this variation, hence cannot use it).

等待圉鍢 2024-10-18 21:18:56

The reason normalization is needed is that if you look at how an adaptive step proceeds at one place in the domain of the function, and you simply transport the problem to the equivalent of the same step translated by some large value in some direction in the domain, then you get different results. It boils down to the question of adapting a linear piece to a data point: how much should the piece move without turning, and how much should it turn, in response to that one training point? It makes no sense to have a different adaptation procedure in different parts of the domain! So normalization is required to reduce the difference in the training result. I haven't got this written up, but you can look at the math for a simple linear function and how it is trained by one training point in two different places. This problem may have been corrected in some places, but I am not familiar with them. In ALNs the problem has been corrected, and I can send you a paper if you write to wwarmstrong AT shaw.ca

み青杉依旧 2024-10-18 21:18:56

On a high level, if you observe where normalization/standardization is mostly used, you will notice that whenever a magnitude difference is used in the model-building process, it becomes necessary to standardize the inputs, so as to ensure that important inputs with small magnitude don't lose their significance midway through the model-building process.

Example:

√((3-1)^2 + (1000-900)^2) ≈ √((1000-900)^2)

Here, (3-1) contributes hardly anything to the result, and hence the input corresponding to these values is considered futile by the model.

Consider the following:

  1. Clustering uses Euclidean or other distance measures.
  2. NNs use an optimization algorithm to minimize a cost function (e.g. MSE).

Both the distance measure (clustering) and the cost function (NNs) use magnitude differences in some way, so standardization ensures that magnitude differences don't dominate important input parameters and the algorithm works as expected.
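The dominance effect in the √((3-1)^2 + (1000-900)^2) example can be checked numerically; the three-row sample used for standardization here is hypothetical:

```python
import numpy as np

a = np.array([3.0, 1000.0])
b = np.array([1.0, 900.0])

# Raw Euclidean distance: the large-scale feature dominates entirely.
raw = np.linalg.norm(a - b)
print(raw)  # ~100.02, so the (3-1) term is all but invisible

# Standardize both features over a small hypothetical sample, then re-measure.
X = np.array([[3.0, 1000.0],
              [1.0, 900.0],
              [2.0, 950.0]])
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
std_dist = np.linalg.norm(X_std[0] - X_std[1])
print(std_dist)  # now both features contribute equally
```

On this sample, standardization happens to make both coordinates contribute the same amount to the distance, which is exactly the "no single feature commands the metric" behavior the answer describes.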

遗弃M 2024-10-18 21:18:56

Hidden layers are used in accordance with the complexity of our data. If we have input data that are linearly separable, then we need not use a hidden layer (e.g. the OR gate), but if we have non-linearly separable data, then we need to use a hidden layer (for example, the XOR logical gate).
The number of nodes in any layer depends upon the degree of cross-validation of our output.
