Why do dplyr coalesce(.) and fill(.) not work and still leave missing values?

Posted on 2025-02-04 05:20:32

I have a simple test dataset that has many repeating rows for participants. I want one row per participant that doesn't have NAs, unless the participant has NAs for the entire column. I tried grouping by participant name and then using coalesce(.) and fill(.), but it still leaves missing values. Here's my test dataset:

library(dplyr)
library(tibble)

test_dataset <- tibble(name = rep(c("Justin", "Corey", "Sibley"), 4),
                       var1 = c(rep(c(NA), 10), 2, 3),
                       var2 = c(rep(c(NA), 9), 2, 4, 6),
                       var3 = c(10, 15, 7, rep(c(NA), 9)),
                       outcome = c(3, 9, 23, rep(c(NA), 9)),
                       tenure = rep(c(10, 15, 20), 4))

And here's what I get when I use coalesce(.) or fill(., .direction = "downup"), which both produce the same result.

library(dplyr)
library(tidyr)   # fill() comes from tidyr
library(tibble)

test_dataset_coalesced <- test_dataset %>% 
  group_by(name) %>%
  coalesce(.) %>%
  slice_head(n=1) %>%
  ungroup()

test_dataset_filled <- test_dataset %>% 
  group_by(name) %>%
  fill(., .direction="downup") %>%
  slice_head(n=1) %>%
  ungroup()

And here's what I want. Note that there is one NA, because that participant has only NA values in that column:

library(tibble)

correct <- tibble(name = c("Justin", "Corey", "Sibley"),
                  var1 = c(NA, 2, 3),
                  var2 = c(2, 4, 6),
                  var3 = c(10, 15, 7),
                  outcome = c(3, 9, 23),
                  tenure = c(10, 15, 20))

Comments (2)

巴黎夜雨 2025-02-11 05:20:32

You can group_by the name column, then fill the NAs with the non-NA values within each group (you need to select every column with everything(), otherwise fill() has no columns to work on), and then keep only the distinct rows.

library(tidyverse)

test_dataset %>% 
  group_by(name) %>% 
  fill(everything(), .direction = "downup") %>% 
  distinct()

# A tibble: 3 × 6
# Groups:   name [3]
  name    var1  var2  var3 outcome tenure
  <chr>  <dbl> <dbl> <dbl>   <dbl>  <dbl>
1 Justin    NA     2    10       3     10
2 Corey      2     4    15       9     15
3 Sibley     3     6     7      23     20
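
To see why the original attempt left the NAs in place, here is a minimal sketch that compares fill() with no columns selected against fill() with everything(). It reuses the question's test_dataset; the names grouped, no_columns and all_columns are only illustrative. The coalesce(.) attempt appears to fail for a related reason: coalesce() expects the vectors to combine as separate arguments, so a single data frame argument gives it nothing to fill from.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Test data copied from the question
test_dataset <- tibble(name = rep(c("Justin", "Corey", "Sibley"), 4),
                       var1 = c(rep(NA, 10), 2, 3),
                       var2 = c(rep(NA, 9), 2, 4, 6),
                       var3 = c(10, 15, 7, rep(NA, 9)),
                       outcome = c(3, 9, 23, rep(NA, 9)),
                       tenure = rep(c(10, 15, 20), 4))

grouped <- group_by(test_dataset, name)

# No columns selected: fill() has nothing to fill, so the data comes back as is
no_columns <- fill(grouped, .direction = "downup")

# Every non-grouping column selected: values are filled within each name group
all_columns <- fill(grouped, everything(), .direction = "downup")

sum(is.na(no_columns))   # 37, same as the input
sum(is.na(all_columns))  # 4, only Justin's var1 is left
```

The four remaining NAs are Justin's var1 values, which have no non-NA value in the group to copy from.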

Smile简单爱 2025-02-11 05:20:32

Try this

cleaned<- test_dataset |> 
  dplyr::group_by(name) |> 
  tidyr::fill(everything(),.direction = "downup") |> 
  unique()

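# To filter out rows where every column except name is NA
# (rowSums() counts NAs per row; sum() would return one total for the whole table)
cleaned[rowSums(is.na(cleaned[, -1])) < ncol(cleaned[, -1]), ]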

  name    var1  var2  var3 outcome tenure
  <chr>  <dbl> <dbl> <dbl>   <dbl>  <dbl>
1 Justin    NA     2    10       3     10
2 Corey      2     4    15       9     15
3 Sibley     3     6     7      23     20
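
Both answers above fill within groups and then de-duplicate. A different technique, not taken from either answer, is to collapse each group directly with summarise() and across(), taking the first non-missing value per column. The sketch below assumes that "first non-NA value" is the right rule when a group holds several different values; first_non_na is a hypothetical helper defined here, not a dplyr function.

```r
library(dplyr)
library(tibble)

# Test data copied from the question
test_dataset <- tibble(name = rep(c("Justin", "Corey", "Sibley"), 4),
                       var1 = c(rep(NA, 10), 2, 3),
                       var2 = c(rep(NA, 9), 2, 4, 6),
                       var3 = c(10, 15, 7, rep(NA, 9)),
                       outcome = c(3, 9, 23, rep(NA, 9)),
                       tenure = rep(c(10, 15, 20), 4))

# Hypothetical helper: first non-missing value of x, or NA if x has none
first_non_na <- function(x) x[!is.na(x)][1]

# One row per name; across(everything()) skips the grouping column
result <- test_dataset %>%
  group_by(name) %>%
  summarise(across(everything(), first_non_na), .groups = "drop")

result
```

Unlike the fill() approach, summarise() returns the groups sorted by name (Corey, Justin, Sibley) rather than in order of first appearance, and it always yields exactly one row per group even when the filled rows would not be identical.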
