将 R 中选定列中的所有 NA 替换为 FALSE

发布于 2024-12-02 08:37:37 字数 528 浏览 3 评论 0原文

我有一个类似于这个的问题,但我的数据集是更大一些:50 列,其中 1 列作为 UID,其他列带有 TRUENA,我想将所有 NA 更改为 ,但是我不想使用显式循环。

plyr 能做到这一点吗?谢谢。

更新 #1

感谢您的快速回复,但是如果我的数据集如下所示:

df <- data.frame(
  id = c(rep(1:19),NA),
  x1 = sample(c(NA,TRUE), 20, replace = TRUE),
  x2 = sample(c(NA,TRUE), 20, replace = TRUE)
)

我只想处理 X1X2 ,该怎么办?

I have a question similar to this one, but my dataset is a bit bigger: 50 columns with 1 column as UID and other columns carrying either TRUE or NA, I want to change all the NA to FALSE, but I don't want to use explicit loop.

Can plyr do the trick? Thanks.

UPDATE #1

Thanks for quick reply, but what if my dataset is like below:

df <- data.frame(
  id = c(rep(1:19),NA),
  x1 = sample(c(NA,TRUE), 20, replace = TRUE),
  x2 = sample(c(NA,TRUE), 20, replace = TRUE)
)

I only want X1 and X2 to be processed, how can this be done?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(6

与之呼应 2024-12-09 08:37:37

如果您想替换变量的子集,您仍然可以使用 is.na(*) <- 技巧,如下所示:

df[c("x1", "x2")][is.na(df[c("x1", "x2")])] <- FALSE

IMO 使用临时变量使逻辑更容易理解:

vars.to.replace <- c("x1", "x2")
df2 <- df[vars.to.replace]
df2[is.na(df2)] <- FALSE
df[vars.to.replace] <- df2

If you want to do the replacement for a subset of variables, you can still use the is.na(*) <- trick, as follows:

df[c("x1", "x2")][is.na(df[c("x1", "x2")])] <- FALSE

IMO using temporary variables makes the logic easier to follow:

vars.to.replace <- c("x1", "x2")
df2 <- df[vars.to.replace]
df2[is.na(df2)] <- FALSE
df[vars.to.replace] <- df2
半步萧音过轻尘 2024-12-09 08:37:37

tidyr::replace_na 出色的功能。

df %>%
  replace_na(list(x1 = FALSE, x2 = FALSE))

这是一个非常棒的快速修复方法。唯一的技巧是列出要更改的列。

tidyr::replace_na excellent function.

df %>%
  replace_na(list(x1 = FALSE, x2 = FALSE))

This is such a great quick fix. the only trick is you make a list of the columns you want to change.

单挑你×的.吻 2024-12-09 08:37:37

尝试使用此代码:

df <- data.frame(
  id = c(rep(1:19), NA),
  x1 = sample(c(NA, TRUE), 20, replace = TRUE),
  x2 = sample(c(NA, TRUE), 20, replace = TRUE)
)
replace(df, is.na(df), FALSE)

更新以获得另一个解决方案。

df2 <- df <- data.frame(
  id = c(rep(1:19), NA),
  x1 = sample(c(NA, TRUE), 20, replace = TRUE),
  x2 = sample(c(NA, TRUE), 20, replace = TRUE)
)
df2[names(df) == "id"] <- FALSE
df2[names(df) != "id"] <- TRUE
replace(df, is.na(df) & df2, FALSE)

Try this code:

df <- data.frame(
  id = c(rep(1:19), NA),
  x1 = sample(c(NA, TRUE), 20, replace = TRUE),
  x2 = sample(c(NA, TRUE), 20, replace = TRUE)
)
replace(df, is.na(df), FALSE)

UPDATED for an another solution.

df2 <- df <- data.frame(
  id = c(rep(1:19), NA),
  x1 = sample(c(NA, TRUE), 20, replace = TRUE),
  x2 = sample(c(NA, TRUE), 20, replace = TRUE)
)
df2[names(df) == "id"] <- FALSE
df2[names(df) != "id"] <- TRUE
replace(df, is.na(df) & df2, FALSE)
尐籹人 2024-12-09 08:37:37

您可以使用gdata包中的NAToUnknown函数

df[,c('x1', 'x2')] = gdata::NAToUnknown(df[,c('x1', 'x2')], unknown = 'FALSE')

You can use the NAToUnknown function in the gdata package

df[,c('x1', 'x2')] = gdata::NAToUnknown(df[,c('x1', 'x2')], unknown = 'FALSE')
拥有 2024-12-09 08:37:37

使用dplyr,您还可以这样做,

df %>% mutate_each(funs(replace(., is.na(.), F)), x1, x2)

使用replace()相比,它的可读性稍差,但更通用,因为它允许选择列被改造。如果您想在某些列中保留 NA,但想删除其他列中的 NA,则此解决方案尤其适用。

With dplyr you could also do

df %>% mutate_each(funs(replace(., is.na(.), F)), x1, x2)

It is a bit less readable compared to just using replace() but more generic as it allows to select the columns to be transformed. This solution especially applies if you want to keep NAs in some columns but want to get rid of NAs in others.

書生途 2024-12-09 08:37:37

一种选择是使用 for 循环。

for(i in c("x1", "x2")) df[[i]][is.na(df[[i]])] <- FALSE

基准

set.seed(42)
df <- data.frame(
  id = c(rep(1:19),NA),
  x1 = sample(c(NA,TRUE), 20, replace = TRUE),
  x2 = sample(c(NA,TRUE), 20, replace = TRUE)
)

bench::mark(check=FALSE,
"Holger Brandl" = local(dplyr::mutate_each(df, dplyr::funs(replace(., is.na(.), F)), x1, x2)),
"mtelesha" = local(df <- tidyr::replace_na(df, list(x1 = FALSE, x2 = FALSE))),
Ramnath = local(df[,c('x1', 'x2')] <- gdata::NAToUnknown(df[,c('x1', 'x2')], unknown = 'FALSE')),
"Hong Ooi" = local(df[c("x1", "x2")][is.na(df[c("x1", "x2")])] <- FALSE),
GKi = local(for(i in c("x1", "x2")) df[[i]][is.na(df[[i]])] <- FALSE) )
#  expression         min   median `itr/sec` mem_al…¹ gc/se…² n_itr  n_gc total…³
#  <bch:expr>    <bch:tm> <bch:tm>     <dbl> <bch:by>   <dbl> <int> <dbl> <bch:t>
#1 Holger Brandl  16.93ms  17.33ms      57.6  34.43KB    19.2    21     7   365ms
#2 mtelesha        3.94ms   4.39ms     226.    8.15KB    13.1   103     6   456ms
#3 Ramnath       400.28µs 415.44µs    2381.    1.55KB    16.7  1142     8   480ms
#4 Hong Ooi      196.87µs 206.72µs    4755.      488B    18.8  2276     9   479ms
#5 GKi             61.8µs  66.16µs   14808.      280B    20.9  7076    10   478ms

for 循环比第二个 Hong Ooi 快大约 3 倍,并且使用最少的内存。

An option would be to use a for loop.

for(i in c("x1", "x2")) df[[i]][is.na(df[[i]])] <- FALSE

Benchmark

set.seed(42)
df <- data.frame(
  id = c(rep(1:19),NA),
  x1 = sample(c(NA,TRUE), 20, replace = TRUE),
  x2 = sample(c(NA,TRUE), 20, replace = TRUE)
)

bench::mark(check=FALSE,
"Holger Brandl" = local(dplyr::mutate_each(df, dplyr::funs(replace(., is.na(.), F)), x1, x2)),
"mtelesha" = local(df <- tidyr::replace_na(df, list(x1 = FALSE, x2 = FALSE))),
Ramnath = local(df[,c('x1', 'x2')] <- gdata::NAToUnknown(df[,c('x1', 'x2')], unknown = 'FALSE')),
"Hong Ooi" = local(df[c("x1", "x2")][is.na(df[c("x1", "x2")])] <- FALSE),
GKi = local(for(i in c("x1", "x2")) df[[i]][is.na(df[[i]])] <- FALSE) )
#  expression         min   median `itr/sec` mem_al…¹ gc/se…² n_itr  n_gc total…³
#  <bch:expr>    <bch:tm> <bch:tm>     <dbl> <bch:by>   <dbl> <int> <dbl> <bch:t>
#1 Holger Brandl  16.93ms  17.33ms      57.6  34.43KB    19.2    21     7   365ms
#2 mtelesha        3.94ms   4.39ms     226.    8.15KB    13.1   103     6   456ms
#3 Ramnath       400.28µs 415.44µs    2381.    1.55KB    16.7  1142     8   480ms
#4 Hong Ooi      196.87µs 206.72µs    4755.      488B    18.8  2276     9   479ms
#5 GKi             61.8µs  66.16µs   14808.      280B    20.9  7076    10   478ms

The for-loop is about 3 times faster than Hong Ooi the second and uses the lowest amount of memory.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文