How to obtain a correlation coefficient for a comparison between a continuous variable and a categorical variable

Posted on 2025-01-16 21:52:50

I have this dataset. I am trying to find the correlation between a continuous variable, expr, and a categorical variable, WHO_Grade:

> dput(tmp)

structure(list(expr = c(3.72491159808923, 7.8316405937301, 4.1302793124001, 
6.81536170645658, 6.68352582647051, 6.0974581720256, 6.81642917136002, 
6.52282686468863, 6.95033151442703, 7.40122305409127, 6.734502473652, 
4.52338197246748, 5.66198159225926, 6.35210096732929, 5.98394091367302, 
6.17792680351041, 6.99774731062209, 6.47837700390364, 8.46842852300251, 
8.8053866571277, 7.69349747186817, 9.92409345097255, 8.32535569092761, 
11.0752169414371, 6.46020070978671, 6.49791316573007, 4.67879084729252, 
6.27362589525792, 5.57597697034067, 4.81081903029741, 6.49576031725988, 
5.03389765403437, 5.07427129999886), WHO_Grade = c("4", "3", 
"3", "3", "3", "3", "3", "3", "2", "2", "2", "4", "4", "3", "4", 
"3", "3", "4", "1", "1", "1", "1", "1", "1", "1", "4", "4", "4", 
"4", "4", "4", "4", "4")), class = "data.frame", row.names = c(NA, 
-33L))

> kruskal.test(expr ~ WHO_Grade, data = tmp)

    Kruskal-Wallis rank sum test

data:  expr by WHO_Grade
Kruskal-Wallis chi-squared = 19.659, df = 3, p-value = 0.0001998

And here is the boxplot for the same data. As is evident from the boxplot, there is a negative relationship between expression and WHO_Grade (grades 1-4 are in increasing order of disease severity). Is there a way to obtain a single value (something like a correlation coefficient) that tells me whether the relationship is negative or positive, without having to look at the plot?

定格我的天空 2025-01-23 21:52:50

The Kruskal-Wallis test evaluates whether there is a significant difference among any of the categories in the sample (a single overall result). I would run pairwise Wilcoxon tests to evaluate the difference in the continuous variable between each pair of groups.

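library(dplyr)  # provides %>% and select_if()

# Run a pairwise Wilcoxon rank-sum test, grouped by WHO_Grade, on every numeric column
results <- tmp %>%
  select_if(is.numeric) %>%
  purrr::map(~ pairwise.wilcox.test(.x, tmp$WHO_Grade, p.adjust.method = "fdr"))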

While you technically can run Kruskal-Wallis on pairs of groups, it is designed for
three or more categories. The Wilcoxon rank-sum test is the non-parametric test for comparing two groups.
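
Neither test above reports a direction. If WHO_Grade can be treated as ordinal (1 < 2 < 3 < 4, which the question suggests but which is an assumption here), one possible sketch is a rank correlation such as Spearman's rho or Kendall's tau, whose sign indicates whether expr tends to fall or rise with grade. The names grade_num, rho, tau and rho_test are illustrative only and are not part of the original answer.

```r
# Data copied from the question's dput() output.
tmp <- data.frame(
  expr = c(3.72491159808923, 7.8316405937301, 4.1302793124001,
           6.81536170645658, 6.68352582647051, 6.0974581720256, 6.81642917136002,
           6.52282686468863, 6.95033151442703, 7.40122305409127, 6.734502473652,
           4.52338197246748, 5.66198159225926, 6.35210096732929, 5.98394091367302,
           6.17792680351041, 6.99774731062209, 6.47837700390364, 8.46842852300251,
           8.8053866571277, 7.69349747186817, 9.92409345097255, 8.32535569092761,
           11.0752169414371, 6.46020070978671, 6.49791316573007, 4.67879084729252,
           6.27362589525792, 5.57597697034067, 4.81081903029741, 6.49576031725988,
           5.03389765403437, 5.07427129999886),
  WHO_Grade = c("4", "3", "3", "3", "3", "3", "3", "3", "2", "2", "2", "4", "4",
                "3", "4", "3", "3", "4", "1", "1", "1", "1", "1", "1", "1", "4",
                "4", "4", "4", "4", "4", "4", "4")
)

# Assumption: the grade labels are ordinal, so they can be mapped to integers 1..4.
# as.character() keeps this correct even if the column was read in as a factor.
grade_num <- as.integer(as.character(tmp$WHO_Grade))

# Rank correlations: the sign gives the direction of the monotonic trend.
rho <- cor(tmp$expr, grade_num, method = "spearman")
tau <- cor(tmp$expr, grade_num, method = "kendall")

# The grades are heavily tied, so ask for the asymptotic p-value
# instead of the exact one (which R cannot compute with ties).
rho_test <- cor.test(tmp$expr, grade_num, method = "spearman", exact = FALSE)

print(c(spearman_rho = rho, kendall_tau = tau))
print(rho_test$p.value)

# Per-grade medians, as a numeric stand-in for reading the boxplot.
print(tapply(tmp$expr, tmp$WHO_Grade, median))
```

A negative rho or tau corresponds to expression decreasing as grade increases. Because there are only four distinct grades, Kendall's tau may be the more cautious choice, and the coefficient describes a monotonic trend only, not which specific pairs of grades differ; that is what the pairwise Wilcoxon tests are for.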
