logging r drop tidyverse

How do I print the number of observations dropped by tidyverse functions such as filter or drop_na?

Posted on 2025-01-18 13:53:09 · 463 characters · 1 view · 0 comments

For different analyses, I use different samples, but I need to make it clear how the samples came about.

Stata shows me "XX observations dropped" after each drop command. Is there a way to get R to print the number of dropped observations by a "tidyverse styled" sample selection (see below)?

In this example I would like to see in the console how many observations were dropped by the filter and drop_na functions.
I tried:
summarise_all(~sum(is.na(.)))
but it was unsuccessful.

capmkt_df <- stata_df %>%
  filter(change != 1 & reg_mkt == 1) %>% 
  select(any_of(capmkt_vars)) %>%
  mutate_at(vars(country, year), factor) %>%
  drop_na()

自我难过 2025-01-25 13:53:09

Since you're using tidyverse packages, a good resource is tidylog, a package that prints additional information for many tidyverse functions (including those from dplyr and tidyr).

For example, with drop_na you get a message of the form "drop_na: removed X rows". An illustration with the base R airquality dataset:

library(tidyverse)
library(tidylog, warn.conflicts = FALSE)

airquality %>% 
  drop_na()

# drop_na: removed 42 rows (27%), 111 rows remaining
#     Ozone Solar.R Wind Temp Month Day
# 1      41     190  7.4   67     5   1
# 2      36     118  8.0   72     5   2
# 3      12     149 12.6   74     5   3
# 4      18     313 11.5   62     5   4
# 5      23     299  8.6   65     5   7
# 6      19      99 13.8   59     5   8
# 7       8      19 20.1   61     5   9
# 8      16     256  9.7   69     5  12
# 9      11     290  9.2   66     5  13
# 10     14     274 10.9   68     5  14
# ...
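
As a rough sketch of how this could carry over to a multi-step pipeline like the one in the question: when tidylog is attached after dplyr and tidyr, its versions of filter, select, and drop_na each print a message as the pipeline runs. The filter condition and the selected columns below are made up for illustration on airquality; they are assumptions, not the asker's variables.

```r
library(dplyr)
library(tidyr)
# Attach tidylog last so that its logging wrappers mask the dplyr/tidyr verbs.
library(tidylog, warn.conflicts = FALSE)

cleaned <- airquality %>%
  filter(Month != 5 & Temp > 60) %>%      # illustrative condition (assumption)
  select(Ozone, Solar.R, Temp, Month) %>% # illustrative columns (assumption)
  drop_na()

# Each verb above reports what it did via message(), so the number of rows
# removed by filter() and by drop_na() appears in the console step by step.
```

Because the messages come from the masking functions, the pipeline code itself does not need to change; only the order in which the packages are attached matters.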


独夜无伴 2025-01-25 13:53:09

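One option is to print the number of incomplete rows, sum(!complete.cases(.)), just before dropping the NA values. The tee pipe (%T>%) from magrittr runs the print step for its side effect and passes the unchanged data on to the next step. Note that magrittr must be attached explicitly, because library(tidyverse) only makes %>% available.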

library(tidyverse)
library(magrittr)  # provides %T>%; not attached by library(tidyverse)

df %>%
  filter(x %in% c(1, 2, NA)) %T>%
  {print(sum(!complete.cases(.)))} %>%
  drop_na()

Output

The printed 2 is the number of rows that contain at least one NA, which are exactly the rows that drop_na() then removes.

[1] 2
# A tibble: 1 × 2
      x y    
  <dbl> <chr>
1     1 a

So, for your code, you could write the following (it reports the rows removed by drop_na(), not those removed by filter()):

capmkt_df <- stata_df %>%
  filter(change != 1 & reg_mkt == 1) %>% 
  select(any_of(capmkt_vars)) %>%
  mutate_at(vars(country, year), factor) %T>%
  {print(sum(!complete.cases(.)))} %>%
  drop_na()
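
The tee-pipe approach reports only the drop_na() step. If a Stata-like message is wanted for every row-dropping step, one possible sketch is a small wrapper that compares row counts before and after a verb. The name report_dropped and the toy tibble are hypothetical and introduced here only for illustration; they are not part of dplyr, tidyr, or magrittr.

```r
library(dplyr)
library(tidyr)

# Hypothetical helper: apply one verb, then report how many rows it removed.
# Extra arguments in ... are forwarded untouched to the verb.
report_dropped <- function(.data, .verb, ...) {
  label <- deparse(substitute(.verb))   # e.g. "filter" or "drop_na"
  n_before <- nrow(.data)
  out <- .verb(.data, ...)
  message(sprintf("%s: %d row(s) dropped", label, n_before - nrow(out)))
  out
}

# Toy data with the same values as the example data in this answer.
toy <- tibble(x = c(1, 2, NA), y = c("a", NA, "b"))

result <- toy %>%
  report_dropped(filter, !is.na(x)) %>%
  report_dropped(drop_na)
# filter: 1 row(s) dropped
# drop_na: 1 row(s) dropped
```

The counts are sent with message() rather than print(), so they go to the console without becoming part of the returned data frame.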

Data

df <- structure(list(x = c(1, 2, NA), y = c("a", NA, "b")), class = c("tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -3L))

About the author

猫九
