Remove NA values from a vector

Posted on 2024-12-08 17:07:28

I have a huge vector which has a couple of NA values, and I'm trying to find the max value in that vector (the vector is all numbers), but I can't do this because of the NA values.

How can I remove the NA values so that I can compute the max?

Comments (8)

悲欢浪云 2024-12-15 17:07:28

Try ?max and you'll see that it has an na.rm argument, which defaults to FALSE. (That is the default for many other R functions too, including sum() and mean().)

Setting na.rm=TRUE does just what you're asking for:

d <- c(1, 100, NA, 10)
max(d, na.rm=TRUE)

If you do want to remove all of the NAs, use this idiom instead:

d <- d[!is.na(d)]

A final note: other functions (e.g. table(), lm(), and sort()) have NA-related arguments that use different names (and offer different options). So if NAs cause you problems in a function call, it's worth checking for a built-in solution among the function's arguments. I've found there's usually one already there.
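As a rough illustration of that final note, here is a small sketch of the NA-related arguments on those three functions: na.last for sort(), useNA for table(), and na.action for lm(). The data and variable names are made up for the example.

```r
d <- c(3, NA, 1, 2, NA)

# sort() drops NAs by default; na.last = TRUE keeps them at the end
sorted_default <- sort(d)
sorted_keep <- sort(d, na.last = TRUE)

# table() leaves NAs out of the counts unless useNA is set
counts <- table(d, useNA = "ifany")

# lm() takes na.action; na.omit drops incomplete rows before fitting
df <- data.frame(x = c(1, 2, 3, 4, NA), y = c(2.1, 3.9, 6.2, 7.8, 10))
fit <- lm(y ~ x, data = df, na.action = na.omit)
n_used <- nobs(fit)  # number of rows actually used in the fit
```

The help page of each function (?sort, ?table, ?lm) lists the other accepted values.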

萌辣 2024-12-15 17:07:28

The na.omit function is what a lot of the regression routines use internally:

vec <- 1:1000
# runif() returns non-integer values; R truncates them when they are used as indices
vec[runif(200, 1, 1000)] <- NA
max(vec)
#[1] NA
max( na.omit(vec) )
#[1] 1000
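
One thing worth knowing about na.omit() on a vector is that the result is not a bare vector: it records which positions were dropped. A minimal sketch, with made-up data and variable names:

```r
vec <- c(5, NA, 12, NA, 7)
cleaned <- na.omit(vec)

# Positions of the removed elements are stored in the "na.action" attribute
removed_at <- as.integer(attr(cleaned, "na.action"))

# as.vector() strips the attributes, leaving a plain numeric vector
plain <- as.vector(cleaned)

# max() works on the na.omit() result directly
largest <- max(cleaned)
```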
凤舞天涯 2024-12-15 17:07:28

Use discard from purrr (works with lists and vectors).

discard(v, is.na) 

The benefit is that it works well in pipes; alternatively, use the built-in subsetting function [:

v %>% discard(is.na)
v %>% .[!is.na(.)]

Note that na.omit does not work on lists:

> x <- list(a=1, b=2, c=NA)
> na.omit(x)
$a
[1] 1

$b
[1] 2

$c
[1] NA
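
If purrr is not available, the same filtering can be sketched in base R. This assumes every list element has length one, as in the example above; the variable names are illustrative.

```r
x <- list(a = 1, b = 2, c = NA)

# is.na() on a list returns one logical per element, so [ subsetting works
kept_subset <- x[!is.na(x)]

# Filter() with a predicate plays a role similar to purrr::discard()
kept_filter <- Filter(function(el) !is.na(el), x)
```

Both results are lists that hold only the elements a and b.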
谁许谁一生繁华 2024-12-15 17:07:28

?max shows you that there is an extra parameter na.rm that you can set to TRUE.

Apart from that, if you really want to remove the NAs, just use something like:

myvec[!is.na(myvec)]
霊感 2024-12-15 17:07:28

Just in case someone new to R wants a simplified answer to the original question

How can I remove NA values from a vector?

Here it is:

Assume you have a vector foo as follows:

foo = c(1:10, NA, 20:30)

Running length(foo) gives 22.

nona_foo = foo[!is.na(foo)]

length(nona_foo) is 21, because the NA values have been removed.

Remember that is.na(foo) returns a logical vector of the same length as foo, so indexing foo with its negation gives you all the elements that are not NA.
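
To make the indexing step concrete, the sketch below inspects the logical vector itself before using it. The names na_mask, na_count, and na_positions are just illustrative.

```r
foo <- c(1:10, NA, 20:30)

na_mask <- is.na(foo)            # logical vector, same length as foo
na_count <- sum(na_mask)         # TRUE counts as 1, so this counts the NAs
na_positions <- which(na_mask)   # indices of the NA elements

nona_foo <- foo[!na_mask]        # keep only the elements where the mask is FALSE
biggest <- max(nona_foo)
```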

冷了相思 2024-12-15 17:07:28

You can call max(vector, na.rm = TRUE). More generally, you can use the na.omit() function.
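
One edge case to keep in mind: if every element is NA, max(x, na.rm = TRUE) has nothing left to compare and returns -Inf with a warning. The helper below, safe_max, is a hypothetical wrapper (not part of base R) sketching one way to return NA instead.

```r
# Hypothetical helper: returns NA when no non-missing values remain
safe_max <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0) {
    return(NA_real_)
  }
  max(x)
}

typical <- safe_max(c(1, 100, NA, 10))
all_missing <- safe_max(c(NA_real_, NA_real_))

# For comparison, the built-in behaviour on an all-NA vector
builtin <- suppressWarnings(max(c(NA_real_, NA_real_), na.rm = TRUE))
```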

Smile简单爱 2024-12-15 17:07:28

I ran a quick benchmark comparing the two base approaches, and it turns out that x[!is.na(x)] is faster than na.omit. User qwr suggested I also try purrr::discard; this turned out to be massively slower (though I'll happily take comments on my implementation and test!)

microbenchmark::microbenchmark(
  purrr::map(airquality,function(x) {x[!is.na(x)]}), 
  purrr::map(airquality,na.omit),
  purrr::map(airquality, ~purrr::discard(.x, .p = is.na)),
  times = 1e6)

Unit: microseconds
                                                     expr    min     lq      mean median      uq       max neval cld
 purrr::map(airquality, function(x) {     x[!is.na(x)] })   66.8   75.9  130.5643   86.2  131.80  541125.5 1e+06 a  
                          purrr::map(airquality, na.omit)   95.7  107.4  185.5108  129.3  190.50  534795.5 1e+06  b 
  purrr::map(airquality, ~purrr::discard(.x, .p = is.na)) 3391.7 3648.6 5615.8965 4079.7 6486.45 1121975.4 1e+06   c

For reference, here's the original test of x[!is.na(x)] vs na.omit:

microbenchmark::microbenchmark(
    purrr::map(airquality,function(x) {x[!is.na(x)]}), 
    purrr::map(airquality,na.omit), 
    times = 1000000)


Unit: microseconds
                                              expr  min   lq      mean median    uq      max neval cld
 map(airquality, function(x) {     x[!is.na(x)] }) 53.0 56.6  86.48231   58.1  64.8 414195.2 1e+06  a 
                          map(airquality, na.omit) 85.3 90.4 134.49964   92.5 104.9 348352.8 1e+06   b
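
Before trusting timings like these, it can help to check that the two base approaches return the same values. The sketch below does that with lapply() in place of purrr::map(), then takes a rough timing with system.time(); this is an assumed simplification of the benchmark above, not a replacement for microbenchmark, and the numbers will vary by machine.

```r
by_subset <- lapply(airquality, function(x) x[!is.na(x)])
by_omit <- lapply(airquality, na.omit)

# Compare values only: na.omit() results carry an extra "na.action" attribute
same_values <- all(mapply(
  function(a, b) identical(a, as.vector(b)),
  by_subset, by_omit
))

# Rough wall-clock timing over 1000 repetitions of each approach
t_subset <- system.time(
  for (i in 1:1000) lapply(airquality, function(x) x[!is.na(x)])
)
t_omit <- system.time(
  for (i in 1:1000) lapply(airquality, na.omit)
)
```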
生生漫 2024-12-15 17:07:28

Another option is to use complete.cases, like this:

d <- c(1, 100, NA, 10)
result <- complete.cases(d)
output <- d[result]
output
#> [1]   1 100  10
max(output)
#> [1] 100

Created on 2022-08-26 with reprex v2.0.2
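
For a plain vector, complete.cases() gives the same mask as !is.na(); it is more commonly used on data frames, where it flags rows with no missing values. A small sketch with a made-up data frame:

```r
d <- c(1, 100, NA, 10)

# On a vector the two masks agree element by element
same_mask <- all(complete.cases(d) == !is.na(d))

# On a data frame, complete.cases() marks rows with no NA in any column
df <- data.frame(a = c(1, NA, 3), b = c("x", "y", NA))
complete_rows <- df[complete.cases(df), ]
```

Here only the first row of df has no missing values, so complete_rows has a single row.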
