Display only one row for each NA value
At some point in my script I want to see the number of missing values in my data.frame and display them.
In my case I have:
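out <- read.csv(file="...../OUT.csv", na.strings="NULL")  # read the literal string NULL as NA
sum(is.na(out$codeHelper))                                  # count the missing codes
out[is.na(out$codeHelper),c(1,length(colnames(out)))]       # first and last column of those rows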
It works perfectly fine.
However, the last command naturally gives me every row of the data.frame where is.na() is TRUE, e.g.:
5561 Yemen (PDR) <NA>
5562 Yemen (PDR) <NA>
5563 Yemen (PDR) <NA>
5564 Yemen (PDR) <NA>
5565 Yemen (PDR) <NA>
5566 Yemen (PDR) <NA>
5567 Yemen (PDR) <NA>
5568 Yemen (PDR) <NA>
5601 Zaire (Democ Republic Congo) <NA>
5602 Zaire (Democ Republic Congo) <NA>
5603 Zaire (Democ Republic Congo) <NA>
5604 Zaire (Democ Republic Congo) <NA>
5605 Zaire (Democ Republic Congo) <NA>
With a big frame and a lot of NAs, that looks pretty messy.
All that matters to me is where the NAs occur, i.e. which country (in the second column) has a missing value in the third column.
So how can I display only a single row for each country?
It should look something like this:
1 Yemen (PDR) <NA>
2 Zaire (Democ Republic Congo) <NA>
3 USA <NA>
4 W. Samoa <NA>
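Because the real OUT.csv path is elided, the setup can be reproduced with a small hypothetical CSV passed to `read.csv(text = ...)`. This is only a sketch: the `country` and `year` columns and all values are invented for illustration; only `codeHelper` and `na.strings = "NULL"` come from the question.

```r
# Hypothetical stand-in for OUT.csv (the real path is elided in the question).
# The literal string NULL marks a missing code.
csv_text <- "country,year,codeHelper
Yemen (PDR),1980,NULL
Yemen (PDR),1981,NULL
Zaire (Democ Republic Congo),1980,NULL
Zaire (Democ Republic Congo),1981,NULL
Germany,1980,DEU"

# na.strings = "NULL" makes read.csv turn every NULL cell into NA
out <- read.csv(text = csv_text, na.strings = "NULL")

# Number of missing codes
n_missing <- sum(is.na(out$codeHelper))
print(n_missing)

# First and last column of every row with a missing code
missing_rows <- out[is.na(out$codeHelper), c(1, ncol(out))]
print(missing_rows)
```

Here `ncol(out)` is used in place of `length(colnames(out))`; both give the index of the last column.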
Comments (2)
unique(c(1,2,3,4,4))
will give you
[1] 1 2 3 4
so
unique(out[is.na(out$codeHelper),c(1,length(colnames(out)))])
should be what you're looking for?
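A minimal self-contained sketch of this `unique()` approach, using a hypothetical stand-in for `out` (the `country` and `year` columns and all values are invented for illustration):

```r
# Hypothetical data shaped like the question's `out`: several years per
# country, with codeHelper missing for some countries.
out <- data.frame(
  country    = c(rep("Yemen (PDR)", 3), rep("Zaire (Democ Republic Congo)", 2), "Germany"),
  year       = c(1980:1982, 1980:1981, 1980),
  codeHelper = c(NA, NA, NA, NA, NA, "DEU"),
  stringsAsFactors = FALSE
)

# Rows with a missing code, keeping only the first and last column
missing_rows <- out[is.na(out$codeHelper), c(1, ncol(out))]

# unique() on a data.frame drops rows whose values repeat an earlier row;
# row names are not compared, so one row per country remains.
one_per_country <- unique(missing_rows)
print(one_per_country)
```

This only collapses to one row per country because the two selected columns (country and codeHelper) are identical across each country's rows. If a varying column such as `year` were kept in the selection, the rows would no longer be duplicates and `unique()` would return them all.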
Try something like this:
See also this related question: "How to remove partial duplicates from a data frame?"
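For the partial-duplicate idea from the linked question, one possible sketch keeps the first row per country by calling `duplicated()` on the country column alone. It reuses invented data of the same shape as above and is not necessarily the code this answer originally had in mind.

```r
# Hypothetical stand-in for `out` (column names and values are illustrative)
out <- data.frame(
  country    = c(rep("Yemen (PDR)", 3), rep("Zaire (Democ Republic Congo)", 2), "Germany"),
  year       = c(1980:1982, 1980:1981, 1980),
  codeHelper = c(NA, NA, NA, NA, NA, "DEU"),
  stringsAsFactors = FALSE
)

# All rows with a missing code, keeping every column this time
missing_rows <- out[is.na(out$codeHelper), ]

# duplicated() flags repeats of the country value only, so other columns
# (such as year) may differ and the first row per country is still kept.
first_per_country <- missing_rows[!duplicated(missing_rows$country), ]
print(first_per_country)
```

Unlike `unique()` on the whole subset, this still returns one row per country when other columns differ between a country's rows.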