Remove NA values from a vector

Posted on 2024-12-08 17:07:28

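I have a huge vector that contains a few NA values. I'm trying to find the maximum value in it (the vector is entirely numeric), but I can't because of the NA values.

How can I remove the NA values so that I can compute the max?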

悲欢浪云 2024-12-15 17:07:28

Try ?max and you'll see that it has an na.rm argument, set to FALSE by default. (That is the default for many other R functions too, including sum() and mean().)

Setting na.rm=TRUE does just what you're asking for:

d <- c(1, 100, NA, 10)
max(d, na.rm=TRUE)

If you do want to remove all of the NAs, use this idiom instead:

d <- d[!is.na(d)]

A final note: other functions (e.g. table(), lm(), and sort()) have NA-related arguments that use different names (and offer different options). So if NAs cause you problems in a function call, it's worth checking for a built-in solution among the function's arguments. I've found there's usually one already there.
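As a rough illustration of the point above, the sketch below applies na.rm to a few summary functions and then uses the differently named NA arguments of table() and sort(). The variable names are made up for the example; lm() is left out because its na.action argument needs a model to demonstrate.

```r
d <- c(1, 100, NA, 10)

# na.rm = TRUE behaves the same way across the common summary functions
total <- sum(d, na.rm = TRUE)     # 111
avg   <- mean(d, na.rm = TRUE)    # 37
rng   <- range(d, na.rm = TRUE)   # 1 100

# table() ignores NA by default; useNA = "ifany" counts it as its own level
counts <- table(d, useNA = "ifany")

# sort() drops NA by default; na.last = TRUE keeps it at the end instead
sorted <- sort(d, na.last = TRUE) # 1 10 100 NA
```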

萌辣 2024-12-15 17:07:28

The na.omit function is what a lot of the regression routines use internally:

vec <- 1:1000
vec[runif(200, 1, 1000)] <- NA
max(vec)
#[1] NA
max( na.omit(vec) )
#[1] 1000
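One detail worth knowing when using na.omit() on a plain vector: the result carries an "na.action" attribute recording which positions were dropped. The small sketch below (variable names are illustrative) shows that attribute and one way to strip it with as.vector() if a bare vector is wanted.

```r
x <- c(1, NA, 3, NA, 5)

y <- na.omit(x)

# Positions of the removed elements are stored as an attribute
dropped <- as.integer(attr(y, "na.action"))  # 2 4

# as.vector() drops the attributes and leaves a plain numeric vector
plain <- as.vector(y)

max(plain)  # 5
```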

凤舞天涯 2024-12-15 17:07:28

Use discard from purrr (works with lists and vectors).

discard(v, is.na)

The benefit is that it is easy to use pipes; alternatively use the built-in subsetting function [:

v %>% discard(is.na)
v %>% .[!is.na(.)]

Note that na.omit does not work on lists:

> x <- list(a=1, b=2, c=NA)
> na.omit(x)
$a
[1] 1

$b
[1] 2

$c
[1] NA
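If purrr is not available, a similar result for lists can be sketched in base R. This is only one possible approach, and it assumes that a "missing" element means a length-one NA; the names is_missing and x_clean are made up for the example.

```r
x <- list(a = 1, b = 2, c = NA)

# Flag list elements that are a single NA (assumption: longer elements are kept as-is)
is_missing <- vapply(x, function(e) length(e) == 1 && is.na(e), logical(1))

# Keep only the elements that were not flagged
x_clean <- x[!is_missing]

names(x_clean)  # "a" "b"
```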

谁许谁一生繁华 2024-12-15 17:07:28

?max shows you that there is an extra parameter na.rm that you can set to TRUE.

Apart from that, if you really want to remove the NAs, just use something like:

myvec[!is.na(myvec)]

霊感 2024-12-15 17:07:28

Just in case someone new to R wants a simplified answer to the original question

How can I remove NA values from a vector?

Here it is:

Assume you have a vector foo as follows:

foo = c(1:10, NA, 20:30)

Running length(foo) gives 22.

nona_foo = foo[!is.na(foo)]

length(nona_foo) is 21, because the NA values have been removed.

Remember that is.na(foo) returns a logical vector of the same length as foo, so indexing foo with its negation gives you all the elements that are not NA.
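To make the indexing step concrete, here is a short sketch that inspects what is.na() returns for the same foo before using it as an index. The name mask is just for illustration.

```r
foo <- c(1:10, NA, 20:30)

# is.na() returns one TRUE/FALSE per element
mask <- is.na(foo)

class(mask)   # "logical"
length(mask)  # 22
sum(mask)     # 1, the number of NA values
which(mask)   # 11, the position of the NA

# Negating the mask selects the non-NA elements
nona_foo <- foo[!mask]
length(nona_foo)  # 21
```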

冷了相思 2024-12-15 17:07:28

You can call max(vector, na.rm = TRUE). More generally, you can use the na.omit() function.
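One edge case to keep in mind with either approach: if every element is NA, nothing is left to take the maximum of, and max() returns -Inf with a warning. The sketch below shows that behaviour and a hypothetical helper, safe_max(), that returns NA instead; the helper is not part of base R.

```r
all_missing <- c(NA_real_, NA_real_)

# With nothing left after dropping NAs, max() falls back to -Inf (and warns)
m <- suppressWarnings(max(all_missing, na.rm = TRUE))

# Hypothetical helper: return NA when no non-missing values remain
safe_max <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0) NA_real_ else max(x)
}

safe_max(all_missing)         # NA
safe_max(c(1, 100, NA, 10))   # 100
```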

Smile简单爱 2024-12-15 17:07:28

I ran a quick benchmark comparing the two base approaches, and it turns out that x[!is.na(x)] is faster than na.omit. User qwr suggested I also try purrr::discard; this turned out to be massively slower (though I'll happily take comments on my implementation and test!)

microbenchmark::microbenchmark(
  purrr::map(airquality,function(x) {x[!is.na(x)]}), 
  purrr::map(airquality,na.omit),
  purrr::map(airquality, ~purrr::discard(.x, .p = is.na)),
  times = 1e6)

Unit: microseconds
                                                     expr    min     lq      mean median      uq       max neval cld
 purrr::map(airquality, function(x) {     x[!is.na(x)] })   66.8   75.9  130.5643   86.2  131.80  541125.5 1e+06 a  
                          purrr::map(airquality, na.omit)   95.7  107.4  185.5108  129.3  190.50  534795.5 1e+06  b 
  purrr::map(airquality, ~purrr::discard(.x, .p = is.na)) 3391.7 3648.6 5615.8965 4079.7 6486.45 1121975.4 1e+06   c

For reference, here's the original test of x[!is.na(x)] vs na.omit:

microbenchmark::microbenchmark(
    purrr::map(airquality,function(x) {x[!is.na(x)]}), 
    purrr::map(airquality,na.omit), 
    times = 1000000)


Unit: microseconds
                                              expr  min   lq      mean median    uq      max neval cld
 map(airquality, function(x) {     x[!is.na(x)] }) 53.0 56.6  86.48231   58.1  64.8 414195.2 1e+06  a 
                          map(airquality, na.omit) 85.3 90.4 134.49964   92.5 104.9 348352.8 1e+06   b
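For anyone who wants to repeat the comparison without installing microbenchmark or purrr, a rough base R sketch follows. It is not the benchmark above: it times a single synthetic vector with system.time(), the sizes and repetition count are arbitrary assumptions, and the timings will vary by machine.

```r
set.seed(42)

# Synthetic data: 100,000 values with 10,000 set to NA (arbitrary sizes)
x <- rnorm(1e5)
x[sample(length(x), 1e4)] <- NA

# Both approaches should keep exactly the same values
res_subset <- x[!is.na(x)]
res_omit   <- as.vector(na.omit(x))
same <- identical(res_subset, res_omit)

# Crude timing: repeat each approach 100 times
t_subset <- system.time(for (i in 1:100) x[!is.na(x)])
t_omit   <- system.time(for (i in 1:100) na.omit(x))

print(t_subset["elapsed"])
print(t_omit["elapsed"])
```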

生生漫 2024-12-15 17:07:28

Another option using complete.cases like this:

d <- c(1, 100, NA, 10)
result <- complete.cases(d)
output <- d[result]
output
#> [1]   1 100  10
max(output)
#> [1] 100

Created on 2022-08-26 with reprex v2.0.2
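complete.cases() is more commonly seen with data frames, where it flags rows that have no missing value in any column. A minimal sketch with a made-up data frame:

```r
# Illustrative data frame with NAs in different columns
df <- data.frame(
  a = c(1, NA, 3),
  b = c("x", "y", NA)
)

# One TRUE/FALSE per row: TRUE only when every column is non-missing
keep <- complete.cases(df)

# Keep only the fully observed rows
df_complete <- df[keep, ]

nrow(df_complete)  # 1
```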
