Missing values in data used to train a neural network prediction model

Posted 2024-11-09 05:15:28

I currently have a lot of data that will be used to train a prediction neural network (gigabytes of weather data for major airports around the US). I have data for almost every day, but some airports have missing values. For example, an airport might not have existed before 1995, so I have no data before then for that location. Others are missing whole years (a record might span 1990 to 2011 but lack 2003).

How can I train with these missing values without misleading my neural network? I thought about filling the empty data with 0s or -1s, but I feel like this would cause the network to predict these values for some outputs.


Comments (3)

梦初启 2024-11-16 05:15:29

I use neural networks a lot for forecasting, and I can tell you that you can simply leave those "holes" in your data. Neural networks learn relationships within the observed data, so it doesn't matter if a specific period is absent. If you set the empty data to a constant value, you will give your training algorithm misleading information. Neural networks don't need "continuous" data; in fact, it's good practice to shuffle the data set before training so that backpropagation runs on non-contiguous samples.
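A minimal sketch of that advice, assuming the data sits in NumPy arrays with gaps stored as `NaN` and that each row is an independent sample (for example, one day at one airport). The function name `drop_missing_and_shuffle` and the toy values are hypothetical, chosen only for illustration:

```python
import numpy as np

def drop_missing_and_shuffle(features, targets, seed=0):
    """Keep only fully observed samples, then shuffle them."""
    features = np.asarray(features, dtype=float)
    targets = np.asarray(targets, dtype=float)
    # A sample is usable only if no feature and no target is NaN.
    observed = ~np.isnan(features).any(axis=1) & ~np.isnan(targets)
    kept_x, kept_y = features[observed], targets[observed]
    # Shuffle so that training does not see samples in chronological order.
    order = np.random.default_rng(seed).permutation(len(kept_x))
    return kept_x[order], kept_y[order]

# Toy data: rows are days, columns are two made-up weather features.
x = np.array([[10.0, 0.3], [np.nan, 0.5], [12.0, np.nan], [15.0, 0.1], [9.0, 0.8]])
y = np.array([11.0, 12.0, 13.0, np.nan, 10.0])
x_train, y_train = drop_missing_and_shuffle(x, y)
```

Only the two fully observed rows survive here. If the model takes a window of consecutive days as input, the same idea applies per window rather than per row: skip any window that overlaps a gap.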

风筝在阴天搁浅。 2024-11-16 05:15:29

A type of neural network called an autoencoder is suitable for this. Autoencoders can be used to reconstruct the input: an autoencoder is trained to learn the underlying data manifold/distribution. They are mostly used for signal reconstruction tasks such as images and sound, but you could also use them to fill in the missing features.

Another technique, matrix factorization, is used in many recommendation systems to fill huge matrices that have a lot of missing values. For instance, suppose there are 1 million movies on IMDb. Almost no one has watched even 1/10 of those movies in their lifetime, but each user has voted on some of them. The ratings matrix is N by M, where N is the number of users and M is the number of movies. Matrix factorization is one of the techniques used to fill in the missing values and suggest movies to users based on their previous votes on other movies.
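The matrix factorization idea can be sketched roughly as follows, assuming missing entries are stored as `NaN` and using plain gradient descent on the observed entries only. The function name `factorize_and_fill`, the hyperparameters, and the toy ratings are all illustrative assumptions, not a tuned recommender:

```python
import numpy as np

def factorize_and_fill(matrix, rank=2, steps=3000, lr=0.01, reg=0.01, seed=0):
    """Fill NaN entries with a low-rank approximation fitted to observed entries."""
    matrix = np.asarray(matrix, dtype=float)
    mask = ~np.isnan(matrix)                # True where a value was observed
    observed = np.where(mask, matrix, 0.0)  # placeholder 0 is masked out of the error
    rng = np.random.default_rng(seed)
    rows = rng.normal(scale=0.1, size=(matrix.shape[0], rank))
    cols = rng.normal(scale=0.1, size=(matrix.shape[1], rank))
    for _ in range(steps):
        # Reconstruction error, counted on observed entries only.
        error = mask * (observed - rows @ cols.T)
        rows_step = error @ cols - reg * rows
        cols_step = error.T @ rows - reg * cols
        rows += lr * rows_step
        cols += lr * cols_step
    filled = matrix.copy()
    filled[~mask] = (rows @ cols.T)[~mask]  # keep observed values untouched
    return filled

# Toy N x M ratings matrix (users x movies) with missing votes.
ratings = np.array([
    [5.0, 4.0, np.nan, 1.0],
    [4.0, np.nan, 1.0, 1.0],
    [1.0, 1.0, np.nan, 5.0],
    [np.nan, 1.0, 5.0, 4.0],
])
completed = factorize_and_fill(ratings)
```

For the weather data in the question, the rows and columns could instead be days and airports for one measured quantity, though whether a low-rank model fits that data well would need to be checked.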

靑春怀旧 2024-11-16 05:15:28

I'm not an expert, but surely this would depend on the type of neural network you have?

The whole point of neural networks is that they can deal with missing information and so forth.

I agree, though, that filling empty data with 1s and 0s can't be a good thing.

Perhaps you could give some info on your neural network?
