Replace NA with the nearest neighbor based on another variable, keeping NA for observations with no non-missing neighbor

Posted on 2025-01-21 19:23:45

I have data that looks like this:

year <- c(2000,2001,2002,2003,2005,2006,2007,2008,2009,2010)
x <- c(1,2,3,NA,5,NA,NA,NA,9,10)
dat <- data.frame(year, x)
  1. I want to replace each NA with its nearest neighbor according to the year variable.

For example, the fourth value (the first NA) takes its value from the left neighbor rather than the right one, because its year, 2003, is closer to 2002 than to 2005.

  2. I want to leave the NA in place when it has no non-NA neighbor.

For example, the seventh value (the third NA) stays NA because neither of its neighbors is non-NA.

After imputing, the resulting x should be 1, 2, 3, 3, 5, 5, NA, 9, 9, 10
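
For reference, the two rules above can be written out as a plain base R loop. This is only a sketch: the helper name fill_nearest_adjacent is hypothetical, it looks only at the rows immediately before and after each NA, and it assumes that a tie in year distance goes to the previous row, which the question does not specify.

```r
# Hypothetical helper (not from the original post): fill each NA from the
# adjacent row (previous or next) that is not NA and is closest in `year`.
fill_nearest_adjacent <- function(year, x) {
  n <- length(x)
  out <- x
  for (i in which(is.na(x))) {
    # candidate neighbors: the rows immediately before and after row i
    nb <- c(i - 1L, i + 1L)
    nb <- nb[nb >= 1L & nb <= n]
    # keep only neighbors that hold a value in the original x
    nb <- nb[!is.na(x[nb])]
    if (length(nb) == 0L) next  # no non-NA neighbor: leave the NA in place
    # choose the neighbor closest in year; which.min() returns the first
    # minimum, so a tie goes to the previous row (an assumption)
    out[i] <- x[nb[which.min(abs(year[nb] - year[i]))]]
  }
  out
}

year <- c(2000, 2001, 2002, 2003, 2005, 2006, 2007, 2008, 2009, 2010)
x <- c(1, 2, 3, NA, 5, NA, NA, NA, 9, 10)
result <- fill_nearest_adjacent(year, x)
print(result)
```

On the sample data this gives the result requested in the question.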


Comments (3)

溺孤伤于心 2025-01-28 19:23:45

One option is to use case_when from the tidyverse. The conditions are checked in order for each NA. If the previous row is not NA and its year is strictly closer, return x from the previous row. If the next row is not NA and its year is strictly closer, return x from the next row. If the previous row is NA, return x from the next row. If the next row is NA, return x from the previous row. If a row is not NA, just return x.

library(tidyverse)

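dat %>%
  mutate(x = case_when(
    # previous row is non-NA and strictly closer in year: take it
    is.na(x) & !is.na(lag(x)) & year - lag(year) < lead(year) - year ~ lag(x),
    # next row is non-NA and strictly closer in year: take it
    is.na(x) & !is.na(lead(x)) & year - lag(year) > lead(year) - year ~ lead(x),
    # previous row is NA: fall back to the next row (which may itself be NA)
    is.na(x) & is.na(lag(x)) ~ lead(x),
    # next row is NA: fall back to the previous row
    is.na(x) & is.na(lead(x)) ~ lag(x),
    # anything not matched above, including non-NA values, is left unchanged
    TRUE ~ x))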

Output

   year  x
1  2000  1
2  2001  2
3  2002  3
4  2003  3
5  2005  5
6  2006  5
7  2007 NA
8  2008  9
9  2009  9
10 2010 10
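
One edge case may be worth checking before relying on this. Because the first two conditions use strict comparisons, an NA whose two neighbors are both non-NA and equally distant in year matches none of the four NA conditions and stays NA. The question does not say what should happen on a tie, so this may or may not be what you want. A small sketch with a made-up three-row data frame (the name tie is hypothetical):

```r
library(dplyr)

# Made-up data: the NA in 2001 is equally distant from 2000 and 2002
tie <- data.frame(year = c(2000, 2001, 2002), x = c(1, NA, 3))

res <- tie %>%
  mutate(x = case_when(is.na(x) & !is.na(lag(x)) & year - lag(year) < lead(year) - year ~ lag(x),
                       is.na(x) & !is.na(lead(x)) & year - lag(year) > lead(year) - year ~ lead(x),
                       is.na(x) & is.na(lag(x)) ~ lead(x),
                       is.na(x) & is.na(lead(x)) ~ lag(x),
                       TRUE ~ x))

# The middle value is still NA
print(res)
```

If a tie should instead take the previous row, changing < to <= in the first condition is one way to get that behavior.
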
野侃 2025-01-28 19:23:45

A method using imap():

library(tidyverse)

dat %>%
  mutate(new = imap_dbl(x, ~ {
    if(is.na(.x)) {
      dist <- abs(year[-.y] - year[.y])
      res <- x[-.y][dist == min(dist, na.rm = TRUE)]
      if(all(is.na(res))) NA else na.omit(res)
    } else .x
  }))

#    year  x new
# 1  2000  1   1
# 2  2001  2   2
# 3  2002  3   3
# 4  2003 NA   3
# 5  2005  5   5
# 6  2006 NA   5
# 7  2007 NA  NA
# 8  2008 NA   9
# 9  2009  9   9
# 10 2010 10  10
不弃不离 2025-01-28 19:23:45

A data.table approach:

library(data.table)
setDT(dat)
# flag NAs whose previous and next values are both NA (no non-NA neighbor);
# these are set back to NA after the join
dat[is.na(x) & is.na(shift(x, type = "lag")) & is.na(shift(x, type = "lead")), excl := "1"]
# rolling self-join on year: each NA row takes x from the nearest non-NA year
dat[is.na(x), x := dat[!is.na(x), ][.SD, x, on = .(year), roll = "nearest"]]
# set the flagged rows back to NA, then remove the excl column
dat[excl == "1", x := NA][, excl := NULL][]
#    year  x
# 1: 2000  1
# 2: 2001  2
# 3: 2002  3
# 4: 2003  3
# 5: 2005  5
# 6: 2006  5
# 7: 2007 NA
# 8: 2008  9
# 9: 2009  9
#10: 2010 10