Multiple chi-square tests in R

Posted on 2025-01-17 15:30:21

Imagine I have the following data:

ID   Drug1   Drug2   Drug3   Drug4
1    1       0       0       0
2    0       0       0       1
3    0       1       0       0
4    0       0       1       0
5    1       0       0       0

Where ID is the number given to each patient and each Drug variable is a binary variable where 1 indicates that patient had a certain condition on that drug and 0 indicates he/she didn't.

To compare the proportion of patients with the condition between drugs, I want to run pairwise chi-square tests: Drug1 vs Drug2, Drug1 vs Drug3, Drug1 vs Drug4, Drug2 vs Drug3, Drug2 vs Drug4, etc.

How can I do this in R in one line of code?
Btw, is it necessary to implement correction for multiple comparisons (e.g., Bonferroni)?

绮烟 2025-01-24 15:30:21

Below is a tidyverse approach using {dplyr}.
I first generate some data so that the tests run on a realistic sample size and give meaningful results.
Then we take the column names of mydat and pass them to combn to get all pairs of drugs. With rowwise and mutate we apply chisq.test() to each pair, using the strings in V1 and V2 to pull the matching columns out of mydat. Because the result of chisq.test() is not an atomic vector, it has to be wrapped in list to be stored in a data.frame column. Subsetting chisq_test with $p.value then gives the p-values.

library(dplyr)
set.seed(123)

# Simulated data: rounded normal draws, then dichotomised to 0/1
mydat <- tibble(ID = 1:1000,
                Drug1 = round(rnorm(1000, 0.8, sd = 0.5)),
                Drug2 = round(rnorm(1000, 0.6, sd = 0.5)),
                Drug3 = round(rnorm(1000, 0.5, sd = 0.5)),
                Drug4 = round(rnorm(1000, 0.3, sd = 0.3))
                ) %>%
  # 1 if the rounded draw is positive, 0 otherwise
  mutate(across(starts_with("Drug"), ~ as.numeric(.x > 0)))

mydat %>%
  select(-ID) %>%
  colnames() %>%
  combn(2) %>%    # 2 x 6 matrix: one drug pair per column
  t() %>%         # one drug pair per row
  as_tibble() %>% # columns are auto-named V1 and V2
  rowwise() %>%
  mutate(chisq_test = list(
    # cross-tabulate the two drug columns named in this row
    table(mydat[[V1]], mydat[[V2]]) %>% chisq.test()
    ),
    chisq_pval = chisq_test$p.value
    )

#> Using compatibility `.name_repair`.
#> # A tibble: 6 x 4
#> # Rowwise: 
#>   V1    V2    chisq_test chisq_pval
#>   <chr> <chr> <list>          <dbl>
#> 1 Drug1 Drug2 <htest>       0.00694
#> 2 Drug1 Drug3 <htest>       0.298  
#> 3 Drug1 Drug4 <htest>       0.926  
#> 4 Drug2 Drug3 <htest>       0.998  
#> 5 Drug2 Drug4 <htest>       0.574  
#> 6 Drug3 Drug4 <htest>       0.895

Created on 2022-04-04 by the reprex package (v2.0.1)
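
The question also asks about correcting for multiple comparisons, which the code above leaves open. Whether a correction is needed depends on the goal of the analysis. If one is wanted, the raw p-values can be passed to `p.adjust()` from base R. The following is only a minimal base-R sketch: the data frame `dat` and its success probabilities are made up for illustration and are not the OP's data.

```r
# Hypothetical example data: four binary outcome columns (not the OP's data).
set.seed(123)
dat <- data.frame(
  Drug1 = rbinom(200, 1, 0.7),
  Drug2 = rbinom(200, 1, 0.6),
  Drug3 = rbinom(200, 1, 0.5),
  Drug4 = rbinom(200, 1, 0.3)
)

# All unordered pairs of column names, one pair per column of the matrix.
pairs <- combn(names(dat), 2)

# Raw p-value of a chi-squared test on each 2x2 cross-tabulation.
p_raw <- apply(pairs, 2, function(p) {
  chisq.test(table(dat[[p[1]]], dat[[p[2]]]))$p.value
})

# Collect raw and adjusted p-values side by side.
res <- data.frame(
  V1 = pairs[1, ],
  V2 = pairs[2, ],
  p_raw = p_raw,
  p_bonferroni = p.adjust(p_raw, method = "bonferroni"),
  p_holm = p.adjust(p_raw, method = "holm")
)
res
```

Bonferroni multiplies each p-value by the number of tests (capped at 1). Holm controls the same family-wise error rate and is never more conservative, so it is often preferred when a correction is applied.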

Below is my old answer. It compares the distribution of 0 and 1 within each drug, which is not what the OP asked for, as @KU99 correctly pointed out in the comments.

library(tibble) # for reading in your data

mydat <-
  tribble(~ID, ~Drug1,  ~Drug2, ~Drug3,  ~Drug4,
           1, 1,      0,      0,      0,  
           2, 0,      0,      0,      1,  
           3, 0,      1,      0,      0,  
           4, 0,      0,      1,      0,  
           5, 1,      0,      0,      0
  )

lapply(mydat[, -1], function(x) chisq.test(table(x)))

#> $Drug1
#> 
#>  Chi-squared test for given probabilities
#> 
#> data:  table(x)
#> X-squared = 0.2, df = 1, p-value = 0.6547
#> 
#> 
#> $Drug2
#> 
#>  Chi-squared test for given probabilities
#> 
#> data:  table(x)
#> X-squared = 1.8, df = 1, p-value = 0.1797
#> 
#> 
#> $Drug3
#> 
#>  Chi-squared test for given probabilities
#> 
#> data:  table(x)
#> X-squared = 1.8, df = 1, p-value = 0.1797
#> 
#> 
#> $Drug4
#> 
#>  Chi-squared test for given probabilities
#> 
#> data:  table(x)
#> X-squared = 1.8, df = 1, p-value = 0.1797

Created on 2022-03-29 by the reprex package (v0.3.0)
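
To see what the old answer actually tests: when given a one-way table, `chisq.test()` runs a goodness-of-fit test against equal probabilities by default. As a small sketch, the Drug1 result above can be reproduced by hand from the five values in the question (the variable names here are only illustrative):

```r
# Drug1 column from the question's five patients.
x <- c(1, 0, 0, 0, 1)

obs <- table(x)                                       # observed: 3 zeros, 2 ones
expected <- rep(sum(obs) / length(obs), length(obs))  # 2.5 per category

# Pearson statistic: sum over categories of (O - E)^2 / E.
stat <- sum((obs - expected)^2 / expected)

# Upper-tail probability of a chi-squared distribution with k - 1 df.
p_val <- pchisq(stat, df = length(obs) - 1, lower.tail = FALSE)

stat   # 0.2
p_val  # about 0.6547
```

This asks whether 0 and 1 are equally frequent within a single drug, so it says nothing about differences between drugs.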
