为什么dplyr colesce(。)和填充(。)不起作用而仍然留下缺失的值?
我有一个简单的测试数据集,该数据集为参与者有许多重复行。我希望每个参与者没有NAS一排,除非参与者拥有整列的NAS。我尝试按参与者名称进行分组,然后使用cocece(。)
和填充(。)
,但仍然留下丢失的值。这是我的测试数据集:
library(dplyr)
library(tibble)
test_dataset <- tibble(name = rep(c("Justin", "Corey", "Sibley"), 4),
var1 = c(rep(c(NA), 10), 2, 3),
var2 = c(rep(c(NA), 9), 2, 4, 6),
var3 = c(10, 15, 7, rep(c(NA), 9)),
outcome = c(3, 9, 23, rep(c(NA), 9)),
tenure = rep(c(10, 15, 20), 4))
这是我使用cocece(。)
或填充(。,direction =“ downup”)
时得到的,这两者都会产生相同的结果。
library(dplyr)
library(tibble)
test_dataset_coalesced <- test_dataset %>%
group_by(name) %>%
coalesce(.) %>%
slice_head(n=1) %>%
ungroup()
test_dataset_filled <- test_dataset %>%
group_by(name) %>%
fill(., .direction="downup") %>%
slice_head(n=1) %>%
ungroup()
这就是我想要的 - 宣称,有一个NA,因为该列只有NA:
library(tibble)
correct <- tibble(name = c("Justin", "Corey", "Sibley"),
var1 = c(NA, 2, 3),
var2 = c(2, 4, 6),
var3 = c(10, 15, 7),
outcome = c(3, 9, 23),
tenure = c(10, 15, 20))
I have a simple test dataset that has many repeating rows for participants. I want one row per participant that doesn't have NAs, unless the participant has NAs for the entire column. I tried grouping by participant name and then using coalesce(.)
and fill(.)
, but it still leaves missing values. Here's my test dataset:
library(dplyr)
library(tibble)
test_dataset <- tibble(name = rep(c("Justin", "Corey", "Sibley"), 4),
var1 = c(rep(c(NA), 10), 2, 3),
var2 = c(rep(c(NA), 9), 2, 4, 6),
var3 = c(10, 15, 7, rep(c(NA), 9)),
outcome = c(3, 9, 23, rep(c(NA), 9)),
tenure = rep(c(10, 15, 20), 4))
And here's what I get when I use coalesce(.)
or fill(., direction = "downup")
, which both produce the same result.
library(dplyr)
library(tibble)
test_dataset_coalesced <- test_dataset %>%
group_by(name) %>%
coalesce(.) %>%
slice_head(n=1) %>%
ungroup()
test_dataset_filled <- test_dataset %>%
group_by(name) %>%
fill(., .direction="downup") %>%
slice_head(n=1) %>%
ungroup()
And here's what I want--note, there is one NA because that participant only has NA for that column:
library(tibble)
correct <- tibble(name = c("Justin", "Corey", "Sibley"),
var1 = c(NA, 2, 3),
var2 = c(2, 4, 6),
var3 = c(10, 15, 7),
outcome = c(3, 9, 23),
tenure = c(10, 15, 20))
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
您可以
group_by
name
列,然后填充
na
(您需要填充
使用所有()
)的每个列都在组中的非NA值,然后仅保留Diftsntion
行。You can
group_by
thename
column, thenfill
theNA
(you need tofill
every column usingeverything()
) with the non-NA values within the group, then only keep thedistinct
rows.尝试一下
Try this