Data normalization for deep learning
I'm working with different types of financial data as inputs to my models, and I would like to know more about how to normalize them.
In particular, I've normalized some technical indicators to the range [0, 1].
Others were normalized to the range [-1, 1].
What is your experience with training on data that mixes normalization ranges?
Is it acceptable to have these two ranges, or is it always better for the training dataset to use a single range, e.g. [0, 1]?
Comments (1)
Note that when we discuss data normalization, we usually mean the normalization of continuous data. Categorical data (usually) doesn't require it.
Furthermore, not all ML methods need normalized data to work well. Tree-based methods such as Random Forests and Gradient Boosting Machines do not. Others, such as Support Vector Machines and Neural Networks, do.
The reason for normalizing input data depends on the method. For SVMs, normalization ensures that input features are given equal importance in influencing the model's decisions. For neural networks, we normalize data so that the gradient descent process converges smoothly.
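To make the SVM point concrete, here is a minimal sketch of how mixing a [0, 1] feature with a [-1, 1] feature affects a Euclidean distance, the kind of quantity a distance-based kernel relies on. The sample points and helper names are made up for illustration; they are not from the question.

```python
import math

def euclidean(a, b):
    """Plain Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical samples: (feature A scaled to [0, 1], feature B scaled to [-1, 1])
p = (0.0, -1.0)
q = (1.0, -1.0)  # differs from p by the full range of feature A
r = (0.0, 1.0)   # differs from p by the full range of feature B

dist_full_a = euclidean(p, q)  # full swing in A
dist_full_b = euclidean(p, r)  # full swing in B, twice as large as for A

def b_to_unit(point):
    """Map feature B from [-1, 1] onto [0, 1] so both features share one range."""
    return (point[0], (point[1] + 1.0) / 2.0)

dist_full_b_rescaled = euclidean(b_to_unit(p), b_to_unit(r))

print(dist_full_a, dist_full_b, dist_full_b_rescaled)
```

With mixed ranges, a full swing in feature B moves a point twice as far as a full swing in feature A. Once both features share one range, the two swings count equally.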
Finally, to answer your question: if you are working with continuous data and modeling it with a neural network, just make sure that the normalized values are on similar scales (even if they do not share exactly the same range), because that is what determines how easily gradient descent converges. If you are working with an SVM, it would be better to normalize your data to a single range, so that the similarity/distance function your SVM uses gives all features equal importance. In other cases, such as the tree-based methods above, normalization may not be needed at all, whatever the ranges. Ultimately, it depends on the modeling technique you are using!
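If you do decide to bring everything to one range, a minimal min-max scaling sketch could look like the following. The function names, the indicator names, and the sample values are assumptions for illustration only. The min and max are learned from the training data and then reused, so new data is scaled the same way.

```python
def fit_min_max(values):
    """Learn the min and max of a feature from training values."""
    return min(values), max(values)

def transform(values, lo, hi, feature_range=(0.0, 1.0)):
    """Linearly map values from [lo, hi] onto feature_range."""
    a, b = feature_range
    if hi == lo:
        # Constant feature: map everything to the middle of the target range.
        return [(a + b) / 2.0 for _ in values]
    return [a + (v - lo) * (b - a) / (hi - lo) for v in values]

# Hypothetical training columns for two technical indicators.
train_rsi = [30.0, 50.0, 70.0]
train_momentum = [-4.0, 0.0, 4.0]

rsi_lo, rsi_hi = fit_min_max(train_rsi)
mom_lo, mom_hi = fit_min_max(train_momentum)

# Mixed ranges, as in the question: one feature in [0, 1], one in [-1, 1].
rsi_scaled = transform(train_rsi, rsi_lo, rsi_hi, (0.0, 1.0))
mom_scaled = transform(train_momentum, mom_lo, mom_hi, (-1.0, 1.0))

# Single range: the same momentum feature mapped onto [0, 1] instead.
mom_unit = transform(train_momentum, mom_lo, mom_hi, (0.0, 1.0))

print(rsi_scaled, mom_scaled, mom_unit)
```

For a neural network either variant keeps the values on similar scales. For an SVM the single-range variant (`rsi_scaled` together with `mom_unit`) matches the advice above.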
Credit to @user3666197 for the helpful feedback in the comments.