How to get a correlation coefficient for a comparison between a continuous variable and a categorical variable
I have this dataset. I am trying to find the correlation between a continuous variable, expr, and a categorical variable, WHO_Grade:
> dput(tmp)
structure(list(expr = c(3.72491159808923, 7.8316405937301, 4.1302793124001,
6.81536170645658, 6.68352582647051, 6.0974581720256, 6.81642917136002,
6.52282686468863, 6.95033151442703, 7.40122305409127, 6.734502473652,
4.52338197246748, 5.66198159225926, 6.35210096732929, 5.98394091367302,
6.17792680351041, 6.99774731062209, 6.47837700390364, 8.46842852300251,
8.8053866571277, 7.69349747186817, 9.92409345097255, 8.32535569092761,
11.0752169414371, 6.46020070978671, 6.49791316573007, 4.67879084729252,
6.27362589525792, 5.57597697034067, 4.81081903029741, 6.49576031725988,
5.03389765403437, 5.07427129999886), WHO_Grade = c("4", "3",
"3", "3", "3", "3", "3", "3", "2", "2", "2", "4", "4", "3", "4",
"3", "3", "4", "1", "1", "1", "1", "1", "1", "1", "4", "4", "4",
"4", "4", "4", "4", "4")), class = "data.frame", row.names = c(NA,
-33L))
> kruskal.test(expr ~ WHO_Grade, data = tmp)
Kruskal-Wallis rank sum test
data: expr by WHO_Grade
Kruskal-Wallis chi-squared = 19.659, df = 3, p-value = 0.0001998
And here is the boxplot for the same data. As the boxplot shows, there is a negative correlation between expression and WHO_Grade (grades 1 to 4 indicate increasing disease severity). Is there a way to obtain a single value (something like a correlation coefficient) that tells me whether the relationship is positive or negative, without having to look at the plot?
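The Kruskal-Wallis statistic is non-directional, so it cannot answer this on its own. One option not raised in this thread, offered only as a sketch, is a rank correlation that treats WHO_Grade as an ordinal score (1 = least severe, 4 = most severe). Spearman's rho and Kendall's tau are computed on ranks, so this assumes only that the grades are ordered, not that they are equally spaced; the sign of the coefficient gives the direction. The functions are base R (cor.test(), cor(), tapply()); the names grade, sp, rho, tau and meds are illustrative.

```r
# Data from the question (same as the dput() output above)
tmp <- structure(list(expr = c(3.72491159808923, 7.8316405937301, 4.1302793124001,
6.81536170645658, 6.68352582647051, 6.0974581720256, 6.81642917136002,
6.52282686468863, 6.95033151442703, 7.40122305409127, 6.734502473652,
4.52338197246748, 5.66198159225926, 6.35210096732929, 5.98394091367302,
6.17792680351041, 6.99774731062209, 6.47837700390364, 8.46842852300251,
8.8053866571277, 7.69349747186817, 9.92409345097255, 8.32535569092761,
11.0752169414371, 6.46020070978671, 6.49791316573007, 4.67879084729252,
6.27362589525792, 5.57597697034067, 4.81081903029741, 6.49576031725988,
5.03389765403437, 5.07427129999886), WHO_Grade = c("4", "3",
"3", "3", "3", "3", "3", "3", "2", "2", "2", "4", "4", "3", "4",
"3", "3", "4", "1", "1", "1", "1", "1", "1", "1", "4", "4", "4",
"4", "4", "4", "4", "4")), class = "data.frame", row.names = c(NA,
-33L))

# Treat the grade as an ordered score: 1 = least severe, 4 = most severe
grade <- as.integer(tmp$WHO_Grade)

# Spearman's rho works on ranks, so only the ordering of the grades matters.
# exact = FALSE because the grade has many ties, so no exact p-value exists.
sp <- cor.test(tmp$expr, grade, method = "spearman", exact = FALSE)
rho <- unname(sp$estimate)
print(rho)        # a negative value means expr tends to fall as grade rises
print(sp$p.value)

# Kendall's tau, another rank-based measure of association
tau <- cor(tmp$expr, grade, method = "kendall")
print(tau)

# Median expr per grade, as a plain-numbers check on the direction
meds <- tapply(tmp$expr, grade, median)
print(meds)
```

Because both coefficients depend only on ranks, a monotonic but non-linear trend across grades still shows up with the correct sign.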
The Kruskal-Wallis test evaluates whether there is a significant difference among any of the categories in the sample (a single overall value); it does not tell you the direction of that difference. I would run pairwise Wilcoxon tests to evaluate the difference in the continuous variable between each pair of groups.
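A minimal sketch of the pairwise approach using base R's pairwise.wilcox.test(). The Holm correction for multiple comparisons is an illustrative choice (it is also the function's default), not something specified above. The result is a matrix of p-values only, so it shows which grades differ but not in which direction.

```r
# Data from the question (same as the dput() output above)
tmp <- structure(list(expr = c(3.72491159808923, 7.8316405937301, 4.1302793124001,
6.81536170645658, 6.68352582647051, 6.0974581720256, 6.81642917136002,
6.52282686468863, 6.95033151442703, 7.40122305409127, 6.734502473652,
4.52338197246748, 5.66198159225926, 6.35210096732929, 5.98394091367302,
6.17792680351041, 6.99774731062209, 6.47837700390364, 8.46842852300251,
8.8053866571277, 7.69349747186817, 9.92409345097255, 8.32535569092761,
11.0752169414371, 6.46020070978671, 6.49791316573007, 4.67879084729252,
6.27362589525792, 5.57597697034067, 4.81081903029741, 6.49576031725988,
5.03389765403437, 5.07427129999886), WHO_Grade = c("4", "3",
"3", "3", "3", "3", "3", "3", "2", "2", "2", "4", "4", "3", "4",
"3", "3", "4", "1", "1", "1", "1", "1", "1", "1", "4", "4", "4",
"4", "4", "4", "4", "4")), class = "data.frame", row.names = c(NA,
-33L))

# All six pairwise two-sample Wilcoxon rank-sum tests between the WHO grades,
# with Holm-adjusted p-values to account for the multiple comparisons
res <- pairwise.wilcox.test(tmp$expr, tmp$WHO_Grade, p.adjust.method = "holm")
print(res)

# res$p.value is a lower-triangular matrix: rows are grades 2-4, columns 1-3
print(res$p.value)
```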
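While you technically can run Kruskal-Wallis on just two groups, it is designed for three or more categories (with two groups it is essentially equivalent to the Wilcoxon rank-sum test). The Wilcoxon rank-sum test is likewise nonparametric and is intended for comparing two groups.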
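For a single pair of grades, wilcox.test() with conf.int = TRUE also reports a signed "difference in location" estimate, which R's documentation describes as the median of the difference between a sample from x and a sample from y (not the difference in medians). Its sign gives the direction alongside the p-value. Comparing grade 1 with grade 4 is only an illustrative choice, and g1, g4 and wt are hypothetical names.

```r
# Data from the question (same as the dput() output above)
tmp <- structure(list(expr = c(3.72491159808923, 7.8316405937301, 4.1302793124001,
6.81536170645658, 6.68352582647051, 6.0974581720256, 6.81642917136002,
6.52282686468863, 6.95033151442703, 7.40122305409127, 6.734502473652,
4.52338197246748, 5.66198159225926, 6.35210096732929, 5.98394091367302,
6.17792680351041, 6.99774731062209, 6.47837700390364, 8.46842852300251,
8.8053866571277, 7.69349747186817, 9.92409345097255, 8.32535569092761,
11.0752169414371, 6.46020070978671, 6.49791316573007, 4.67879084729252,
6.27362589525792, 5.57597697034067, 4.81081903029741, 6.49576031725988,
5.03389765403437, 5.07427129999886), WHO_Grade = c("4", "3",
"3", "3", "3", "3", "3", "3", "2", "2", "2", "4", "4", "3", "4",
"3", "3", "4", "1", "1", "1", "1", "1", "1", "1", "4", "4", "4",
"4", "4", "4", "4", "4")), class = "data.frame", row.names = c(NA,
-33L))

# Two of the grades, chosen only for illustration
g1 <- tmp$expr[tmp$WHO_Grade == "1"]
g4 <- tmp$expr[tmp$WHO_Grade == "4"]

# Two-sample Wilcoxon rank-sum (Mann-Whitney) test; conf.int = TRUE adds a
# signed location-shift estimate for x - y and its confidence interval
wt <- wilcox.test(g1, g4, conf.int = TRUE)
print(wt$estimate)   # positive means grade 1 tends to have higher expr than grade 4
print(wt$conf.int)
print(wt$p.value)
```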