Python data science linear regression: indices with two or more NaN values
I want to use a linear regression model on my data. However, some columns have NaN values, and I don't know how to handle them.
Index | F_1 | F_2 | F_3 |
---|---|---|---|
0 | 0.5 | 1.5 | 1 |
1 | 0.8 | 2.3 | 2 |
2 | NaN | NaN | 3 |
3 | 1.2 | 3.0 | NaN |
4 | NaN | 1.9 | 1.4 |
5 | 0.7 | NaN | 1.6 |
6 | 1 | 2.6 | 2.2 |
To fit the data, I could delete the rows with NaN values:
Index | F_1 | F_2 | F_3 |
---|---|---|---|
0 | 0.5 | 1.5 | 1 |
1 | 0.8 | 2.3 | 2 |
6 | 1 | 2.6 | 2.2 |
But I want to keep all my data and still be able to process it. How do I handle the columns with NaN values?
Assuming that you are using pandas, one solution is to use only the rows with valid values. That means you have to omit every row that contains a NaN value.
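As a rough sketch of that first option, the snippet below rebuilds the table from the question and drops the incomplete rows with `DataFrame.dropna()`. The variable names `df` and `complete_rows` are just illustrative, and the F_2 value in row 0 is assumed to be 1.5, since it is not fully legible in the question.

```python
import numpy as np
import pandas as pd

# Data from the question; F_2 in row 0 is assumed to be 1.5.
df = pd.DataFrame({
    "F_1": [0.5, 0.8, np.nan, 1.2, np.nan, 0.7, 1.0],
    "F_2": [1.5, 2.3, np.nan, 3.0, 1.9, np.nan, 2.6],
    "F_3": [1.0, 2.0, 3.0, np.nan, 1.4, 1.6, 2.2],
})

# Keep only the rows in which every column has a valid value.
complete_rows = df.dropna()
print(complete_rows)
```

With this data only rows 0, 1 and 6 remain, which matches the reduced table in the question, so this approach throws away more than half of the samples.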
An alternative would be to replace NaN with some reasonable value, e.g. the average of the column. This can be helpful if you have many columns and only some have missing values.
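A minimal sketch of the mean-replacement idea, again using the table from the question (with the F_2 value in row 0 assumed to be 1.5), could look like this. The names `df` and `filled` are only for illustration.

```python
import numpy as np
import pandas as pd

# Data from the question; F_2 in row 0 is assumed to be 1.5.
df = pd.DataFrame({
    "F_1": [0.5, 0.8, np.nan, 1.2, np.nan, 0.7, 1.0],
    "F_2": [1.5, 2.3, np.nan, 3.0, 1.9, np.nan, 2.6],
    "F_3": [1.0, 2.0, 3.0, np.nan, 1.4, 1.6, 2.2],
})

# df.mean() skips NaN by default and returns one mean per column;
# fillna() then replaces each NaN with the mean of its own column.
filled = df.fillna(df.mean())
print(filled)
```

All seven rows are kept this way. Note that mean imputation shrinks the variance of the filled columns, so it is worth checking whether the fitted model is sensitive to it. If one of the columns is the regression target, it is usually safer to drop the rows where the target itself is missing rather than impute it.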