Fill multiple NAs with lagged values in R

Posted 2025-01-09 06:35:37

I am trying to fill the NA values in this data frame with the most recent non-NA value in the cost column. I want to group by city - so all NAs for Omaha should be 44.50, and the NAs for Lincoln should be 62.50. Here is the code I have been using - it replaces the first NA (April) for each group with the correct value, but does not fill past that.

df <- df %>% 
  group_by(city) %>%
  mutate(cost = ifelse(is.na(cost), lag(cost, na.rm=TRUE), cost))

Data before running code:

year   month      city     cost
2021   January    Omaha     45.50  
2021   February   Omaha     46.75
2021   March      Omaha     44.50
2021   April      Omaha     NA
2021   May        Omaha     NA
2021   June       Omaha     NA
2021   January    Lincoln   55.25
2021   February   Lincoln   53.80
2021   March      Lincoln   62.50
2021   April      Lincoln   NA
2021   May        Lincoln   NA
2021   June       Lincoln   NA

Comments (2)

千笙结 2025-01-16 06:35:37

Use:

library(tidyverse)

df %>% 
  group_by(city) %>%
  fill(cost)

# A tibble: 12 x 4
# Groups:   city [2]
    year month    city     cost
   <int> <chr>    <chr>   <dbl>
 1  2021 January  Omaha    45.5
 2  2021 February Omaha    46.8
 3  2021 March    Omaha    44.5
 4  2021 April    Omaha    44.5
 5  2021 May      Omaha    44.5
 6  2021 June     Omaha    44.5
 7  2021 January  Lincoln  55.2
 8  2021 February Lincoln  53.8
 9  2021 March    Lincoln  62.5
10  2021 April    Lincoln  62.5
11  2021 May      Lincoln  62.5
12  2021 June     Lincoln  62.5
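
As a side note on why the original attempt stops after April: lag() shifts the column by exactly one position, so May only sees April's value, which is still NA at that point. The sketch below reproduces this on the Omaha values alone and compares it with fill(). The variable names (cost, lagged, one_step, filled) are only for illustration, and .direction = "down" is written out even though it is the default.

```r
library(dplyr)
library(tidyr)

# Omaha's cost values from the question
cost <- c(45.5, 46.75, 44.5, NA, NA, NA)

# lag() moves every value down by one row; nothing is carried further
lagged <- dplyr::lag(cost)

# Only the first NA has a non-NA value directly above it,
# so only that one gets replaced
one_step <- ifelse(is.na(cost), lagged, cost)

# fill() carries the last observed value forward through the whole run of NAs
filled <- tidyr::fill(data.frame(cost = cost), cost, .direction = "down")$cost

print(one_step)
print(filled)
```

If the missing values were at the start of each group instead, .direction = "up" (or "downup") would be the setting to try.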

撕心裂肺的伤痛 2025-01-16 06:35:37

With your code, you would want to use last rather than lag (though fill is the much better option here). We also need to wrap cost in na.omit.

library(tidyverse)

df %>%
  group_by(city) %>%
  mutate(cost = ifelse(is.na(cost), last(na.omit(cost)), cost))

Output

    year month    city     cost
   <int> <chr>    <chr>   <dbl>
 1  2021 January  Omaha    45.5
 2  2021 February Omaha    46.8
 3  2021 March    Omaha    44.5
 4  2021 April    Omaha    44.5
 5  2021 May      Omaha    44.5
 6  2021 June     Omaha    44.5
 7  2021 January  Lincoln  55.2
 8  2021 February Lincoln  53.8
 9  2021 March    Lincoln  62.5
10  2021 April    Lincoln  62.5
11  2021 May      Lincoln  62.5
12  2021 June     Lincoln  62.5

Data

df <- structure(list(year = c(2021L, 2021L, 2021L, 2021L, 2021L, 2021L, 
2021L, 2021L, 2021L, 2021L, 2021L, 2021L), month = c("January", 
"February", "March", "April", "May", "June", "January", "February", 
"March", "April", "May", "June"), city = c("Omaha", "Omaha", 
"Omaha", "Omaha", "Omaha", "Omaha", "Lincoln", "Lincoln", "Lincoln", 
"Lincoln", "Lincoln", "Lincoln"), cost = c(45.5, 46.75, 44.5, 
NA, NA, NA, 55.25, 53.8, 62.5, NA, NA, NA)), class = "data.frame", row.names = c(NA, 
-12L))
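
One caveat that may be worth checking against your real data: last(na.omit(cost)) uses the final non-NA value of the whole group for every NA, which matches fill() here only because all the NAs sit at the end of each group. The sketch below uses a made-up data frame (gap, not from the question) with an NA between observed values to show where the two approaches diverge. The column names by_last and by_fill are also just for illustration.

```r
library(dplyr)
library(tidyr)

# Hypothetical data: NAs sit between observed values, not only at the end
gap <- data.frame(
  city = rep("Omaha", 5),
  cost = c(45.5, NA, 44.5, NA, 50)
)

res <- gap %>%
  group_by(city) %>%
  mutate(
    # every NA gets the group's final non-NA value
    by_last = ifelse(is.na(cost), last(na.omit(cost)), cost),
    # copy of cost that fill() will fill downward
    by_fill = cost
  ) %>%
  fill(by_fill) %>%   # each NA gets the closest earlier non-NA value
  ungroup()

print(res)
```

If "most recent non-NA value" is meant row by row, fill() is the one that follows that definition. The last() version is only equivalent for trailing NAs.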