Removing NA values from a vector
I have a huge vector which has a couple of NA values, and I'm trying to find the max value in that vector (the vector is all numbers), but I can't do this because of the NA values.
How can I remove the NA values so that I can compute the max?
Try ?max and you'll see that it has an na.rm = argument, set to FALSE by default. (That's the common default for many other R functions, including sum(), mean(), etc.)

Setting na.rm = TRUE does just what you're asking for: max() then ignores the NA values and returns the largest of the remaining elements.

If you do want to remove all of the NAs from the vector itself, subset it with the negated is.na() mask instead, i.e. x[!is.na(x)] for a vector x.

A final note: other functions (e.g. table(), lm(), and sort()) have NA-related arguments that use different names (and offer different options). So if NAs cause you problems in a function call, it's worth checking for a built-in solution among the function's arguments. I've found there's usually one already there.
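As a minimal sketch of the two approaches in this answer (the vector x and its values are made up for illustration and are not from the original post):

```r
# Hypothetical example data, chosen only for illustration.
x <- c(1, 100, NA, 10)

max(x)                # NA, because na.rm defaults to FALSE
max(x, na.rm = TRUE)  # 100, the NA is ignored for this one call

# Drop the NAs from the vector itself.
x_clean <- x[!is.na(x)]
max(x_clean)          # 100

# Other functions name their NA handling differently.
sort(x, na.last = TRUE)    # 1 10 100 NA (sort() drops NAs by default)
table(x, useNA = "ifany")  # adds a count for the NA values
```

na.rm = TRUE leaves x untouched, whereas the subsetting form creates a new, shorter vector, which is handy when several later steps all need NA-free data.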
The na.omit function is what a lot of the regression routines use internally. Applied to a vector, it returns the vector with the NA values dropped, so max(na.omit(x)) works.
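A short sketch of na.omit on a plain vector, assuming a small made-up vector vec (not the data from the original answer):

```r
# Hypothetical example data.
vec <- c(4, NA, 9, 2, NA)

max(vec)                  # NA

cleaned <- na.omit(vec)   # drops the NA elements
max(cleaned)              # 9

# na.omit records which positions it removed in an attribute.
attr(cleaned, "na.action")

# as.vector() strips that attribute if a bare vector is needed.
bare <- as.vector(cleaned)
```

The extra na.action attribute is harmless for max(), but it is worth knowing about when comparing the result with a vector produced by plain subsetting.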
Use discard from purrr (works with lists and vectors), as in discard(v, is.na).

The benefit is that it is easy to use with pipes; alternatively, use the built-in subsetting function [ with the !is.na(v) mask.

Note that na.omit does not work on lists: it returns the list unchanged, NA elements included.
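A rough sketch of the purrr approach next to the base alternatives, assuming the purrr package is installed; v and x are made-up objects used only for illustration:

```r
library(purrr)  # assumption: purrr is installed

v <- c(1, NA, 3)                 # hypothetical atomic vector
x <- list(a = 1, b = 2, c = NA)  # hypothetical list

discard(v, is.na)     # 1 3
v |> discard(is.na)   # same result with the native pipe (R >= 4.1)
v[!is.na(v)]          # base R subsetting, same result

discard(x, is.na)     # list with elements a and b only
na.omit(x)            # list comes back unchanged, c = NA is still there
```

is.na works as the predicate here because every list element has length one; discard expects the predicate to return a single TRUE or FALSE per element.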
?max shows you that there is an extra parameter na.rm that you can set to TRUE.

Apart from that, if you really want to remove the NAs, just subset the vector with !is.na(), as in x[!is.na(x)].
Just in case someone new to R wants a simplified answer to the original question, here it is.

Assume you have a vector foo with 22 elements, one of which is NA. Running length(foo) gives 22. After nona_foo <- foo[!is.na(foo)], length(nona_foo) is 21, because the NA value has been removed.

Remember that is.na(foo) returns a logical vector, so indexing foo with the negation of that vector gives you all the elements which are not NA.
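The walkthrough above can be sketched as follows. The contents of foo are an assumption chosen to match the quoted lengths (22 elements, one NA), not necessarily the data from the original answer:

```r
# Illustrative vector: 22 elements, exactly one of which is NA.
foo <- c(1:10, NA, 20:30)
length(foo)        # 22

# is.na() returns a logical vector with one entry per element of foo.
mask <- is.na(foo)
which(mask)        # 11, the position of the NA

# Negate the mask to keep only the non-NA elements.
nona_foo <- foo[!mask]
length(nona_foo)   # 21
max(nona_foo)      # 30
```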
You can call max(vector, na.rm = TRUE). More generally, you can use the na.omit() function.
I ran a quick benchmark comparing the two base approaches, and it turns out that x[!is.na(x)] is faster than na.omit. User qwr suggested I also try purrr::discard; this turned out to be massively slower (though I'll happily take comments on my implementation and test!).

For reference, the original test compared x[!is.na(x)] against na.omit.
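For anyone who wants to try a comparison like this themselves, here is one possible way to time the two base approaches using only base R. This is a sketch, not the benchmark from the answer: the data, vector size, and repetition count are arbitrary assumptions, and timings will vary by machine.

```r
set.seed(42)

# Hypothetical test data: one million integers with some NAs mixed in.
x <- sample(c(1:100, NA), size = 1e6, replace = TRUE)

n_reps <- 20  # arbitrary number of repetitions

# Time logical subsetting.
t_subset <- system.time(for (i in seq_len(n_reps)) r1 <- x[!is.na(x)])

# Time na.omit().
t_naomit <- system.time(for (i in seq_len(n_reps)) r2 <- na.omit(x))

# Elapsed seconds for each approach.
timings <- c(subset = t_subset[["elapsed"]], na.omit = t_naomit[["elapsed"]])
print(timings)

# Both approaches keep the same values; na.omit only adds an attribute.
same_values <- identical(r1, as.vector(r2))
```

A dedicated benchmarking package would give more reliable numbers than system.time; the loop here is only meant to keep the example dependency-free.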
Another option is complete.cases, which returns TRUE for every element that is not NA, so x[complete.cases(x)] keeps only the non-missing values.
Created on 2022-08-26 with reprex v2.0.2
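A small sketch of the complete.cases approach on a vector. The vector vec is made up for illustration, and this is not the reprex output from the original answer:

```r
# Hypothetical example data.
vec <- c(2, 4, 6, NA, 7)

complete.cases(vec)          # TRUE TRUE TRUE FALSE TRUE

kept <- vec[complete.cases(vec)]
kept                         # 2 4 6 7
max(kept)                    # 7
```

On a plain vector this is equivalent to vec[!is.na(vec)]; complete.cases is more useful on data frames, where it flags rows with no missing values in any column.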