Adding the frequency and percentage of missing values in gtsummary

Posted on 2025-01-16 08:46:40

library(dplyr)
library(gtsummary)

# df_nhpi is the asker's own data frame (not shown in the post)
df_nhpi %>%
  select(AGE, SEX, MAR_STAT, HEIGHT, WEIGHT, BMI, HTN, HTNMED, MI, Smoking, COPD, CANCER, DIABETES) %>%
  tbl_summary(
    by = SEX,
    label = list(
      MAR_STAT ~ "Marital Status",
      HTN ~ "Hypertension",
      HTNMED ~ "Hypertension Medication",
      MI ~ "Heart Attack",
      Smoking ~ "Smoking Status",
      COPD ~ "Chronic Obstructive Pulmonary Disease"
    ),
    type = list(c("HTN", "HTNMED", "MI", "COPD", "CANCER") ~ "categorical"),
    # show an "Unknown" row only for variables that have missing values
    missing = "ifany",
    missing_text = "Unknown",
    statistic = list(
      all_continuous() ~ "{mean} ({sd})",
      all_categorical() ~ "{n} ({p}%)"
    ),
    digits = all_continuous() ~ 2,
    percent = "column"
  ) %>%
  add_stat_label() %>%
  add_p(
    test = all_continuous() ~ "t.test",
    pvalue_fun = function(x) style_pvalue(x, digits = 3)
  ) %>%
  bold_p() %>%
  modify_caption("**Table 1. Baseline Characteristics**") %>%
  bold_labels()

I'm trying to generate a Table 1. The issue is that I want the percentage of missing values shown in each column (specifically for categorical variables), but at the same time I don't want missing values to be included when the p-values are calculated. I'd like to do this in a single chunk of code. Is there any way to do this, or should I go for the conventional method?

I've been searching the internet for the past three days, but I haven't found anything that works in my case.

PS: Using mutate and forcats doesn't work, as it skews my p-values. The code above is what I used to generate my table.


Comments (1)

喵星人汪星人 2025-01-23 08:46:40

I prepared two solutions that both report the proportion of missing data. Hopefully one of them works for you!

library(gtsummary)
packageVersion("gtsummary")
#> [1] '1.5.2'

# add % missing in new column
tbl1 <-
  trial %>%
  tbl_summary(
    by = trt, 
    include = response, 
    type = all_dichotomous() ~ "categorical",
    missing = "no"
  ) %>%
  add_p() %>%
  add_n(statistic = "{n_miss} ({p_miss}%)") %>%
  modify_header(n = "**Missing**")


# prepare tbl_summary with rows for missing, then merge in p-values
tbl2 <-
  trial %>%
  dplyr::mutate(response = forcats::fct_explicit_na(factor(response))) %>%
  tbl_summary(
    by = trt, 
    include = response, 
    label = list(response = "Tumor Response")
  ) %>%
  list(tbl1 %>% modify_column_hide(c(n, all_stat_cols()))) %>%
  tbl_merge(tab_spanner = FALSE)

Created on 2022-03-22 by the reprex package (v2.0.1)
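
The reason the second solution merges in the p-value from tbl1, rather than computing it on the table that has an explicit row for missing values, can be illustrated with a small base R sketch. This is not part of the original answer: the toy data, the variable names, and the choice of Fisher's exact test are all assumptions made for illustration. It reports the missing count and percentage per group, then compares a test on observed values only with a test that treats missing as its own category.

```r
# Hypothetical toy data: a binary outcome with missing values in two groups
group   <- rep(c("A", "B"), each = 10)
outcome <- c(1, 1, 1, 0, 0, 0, 0, NA, NA, NA,
             1, 1, 1, 1, 1, 1, 0, 0, 0, NA)

# Count and percentage of missing values within each group
n_miss <- tapply(is.na(outcome), group, sum)
p_miss <- 100 * n_miss / tapply(outcome, group, length)

# Observed values only: table() drops NA by default
tab_obs <- table(group, outcome)
p_obs <- fisher.test(tab_obs)$p.value

# Missing treated as a third category: addNA() adds NA as a factor level
tab_na <- table(group, addNA(outcome))
p_na <- fisher.test(tab_na)$p.value

print(rbind(n_miss, p_miss))
print(c(observed_only = p_obs, missing_as_level = p_na))
```

The two tests are run on different contingency tables (2 x 2 versus 2 x 3), so they answer different questions. Keeping the missing-data percentages in the display while taking the p-value from the complete-case table is what both solutions above aim for.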
