Multiple barplots and t-test

Posted on 2025-02-09 20:23:45

I want a barplot based on the number of occurrences of each string in a particular column of a dataset in R.

At the same time, I want to run a t-test and mark the significant p-values with stars on top of the bars. Nonsignificant results can be shown as ns.

My attempt has been:

barplot(prop.table(table(ttcluster_dataset$Phenotype)),col=clustercolor,border="black",xlab="Phenotypes",ylab="Percentage of Samples expressed",main="Sample wise Phenotype distribution",cex.names = 0.8)

The dataset column is:

ttcluster_dataset$Phenotype<- 
structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
3L, 3L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 
5L, 5L, 6L, 6L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 
7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 
7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 
7L, 7L, 7L, 7L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L), .Label = c("Proneural (Cluster 1)", "Proneural (Cluster 2)", "Neural (Cluster 1)", "Neural (Cluster 2)", 
"Classical (Cluster 1)", "Classical (Cluster 2)", "Mesenchymal (Cluster 1)", 
"Mesenchymal (Cluster 2)"), class = "factor")

All suggestions would be appreciated.

短暂陪伴 2025-02-16 20:23:46

A t-test is probably not what you want, since you are comparing counts and proportions between the two clusters rather than means of a numeric variable. Your data is not really set up to do either one, so first we need to split the combined label into two variables:

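# Split each label, e.g. "Proneural (Cluster 1)", on the spaces
Pheno.splt <- strsplit(as.character(ttcluster_dataset$Phenotype), " ")
# Bind the pieces row-wise; keep the phenotype name (column 1) and the cluster number (column 3)
Pheno.mat <- do.call(rbind, Pheno.splt)[, c(1, 3)]
# Drop the trailing ")" from the cluster number
ttclust <- data.frame(Phenotype=Pheno.mat[, 1], Cluster=gsub(")", "", Pheno.mat[, 2], fixed=TRUE))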
str(ttclust)
# 'data.frame': 171 obs. of  2 variables:
#  $ Phenotype: chr  "Proneural" "Proneural" "Proneural" "Proneural" ...
#  $ Cluster  : chr  "1" "1" "1" "1" ...
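
As a hedged alternative to splitting on spaces, the same two columns can be pulled out with a single regular expression. The sketch below runs on a small toy vector (`labels`, `pattern` and `toy` are illustrative names, not part of the original data) and assumes every label has the form "Name (Cluster N)":

```r
# Toy labels in the same "Name (Cluster N)" form as the question's factor
labels <- factor(c("Proneural (Cluster 1)", "Neural (Cluster 1)",
                   "Classical (Cluster 2)", "Mesenchymal (Cluster 2)"))
lab <- as.character(labels)

# One pattern with two capture groups: the phenotype name and the cluster number
pattern <- "^(.*) \\(Cluster ([0-9]+)\\)$"
toy <- data.frame(
  Phenotype = sub(pattern, "\\1", lab),
  Cluster   = sub(pattern, "\\2", lab),
  stringsAsFactors = FALSE
)
toy
```

Unlike the space-based split, this version would still work if a phenotype name itself contained a space.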

Now Phenotype and Cluster are separate columns in the data frame ttclust. There are multiple ways to do this, but here we just split each label into three parts on the spaces between them and kept the first and third. Next, a summary table and bar plot:

tbl <- xtabs(~Phenotype+Cluster, ttclust)
tbl
#              Cluster
# Phenotype      1  2
#   Classical   32  6
#   Mesenchymal 44 10
#   Neural      26  0
#   Proneural   45  8
tbl.row <- prop.table(tbl, 1)
barplot(t(tbl.row), beside=TRUE)

At this point, a simple proportions test gives no evidence that the percentage of Cluster 1 differs across the four Phenotypes (p = 0.15):

prop.test(tbl)

4-sample test for equality of proportions without continuity correction

data:  tbl
X-squared = 5.2908, df = 3, p-value = 0.1517
alternative hypothesis: two.sided
sample estimates:
   prop 1    prop 2    prop 3    prop 4 
0.8421053 0.8148148 1.0000000 0.8490566
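
One caveat on the test above: Neural has no samples in Cluster 2, so some expected counts are small and R may warn that the chi-squared approximation is unreliable. A minimal cross-check, assuming the counts from the table above are typed in by hand as `counts` (an illustrative name), is to look at the expected counts and run Fisher's exact test:

```r
# Counts copied from the xtabs output above (rows: Phenotype, columns: Cluster)
counts <- matrix(c(32, 44, 26, 45,
                    6, 10,  0,  8),
                 ncol = 2,
                 dimnames = list(Phenotype = c("Classical", "Mesenchymal",
                                               "Neural", "Proneural"),
                                 Cluster = c("1", "2")))

# Expected counts under independence; cells below about 5 make the
# chi-squared approximation questionable
expected <- outer(rowSums(counts), colSums(counts)) / sum(counts)
round(expected, 2)

# Fisher's exact test does not rely on that approximation
ft <- fisher.test(counts)
ft$p.value
```

If the exact p-value leads to the same conclusion as `prop.test`, the small expected counts are not driving the result.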

Using `prop.test` on each Phenotype separately, which tests the Cluster 1 proportion against 0.5, indicates that Cluster 1 is significantly different from Cluster 2 in every case:

for(i in 1:4) print(prop.test(t(tbl[i, ])))

# First test
# 
#   1-sample proportions test with continuity correction
# 
# data:  t(tbl[i, ]), null probability 0.5
# X-squared = 16.447, df = 1, p-value = 5.002e-05
# alternative hypothesis: true p is not equal to 0.5
# 95 percent confidence interval:
#  0.6807208 0.9341311
# sample estimates:
#         p 
# 0.8421053 
    . . . .
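
To put stars or ns on top of the bars, as the question asks, one possible sketch in base graphics is shown below. It retypes the counts from the table above as `counts`, and the `stars()` helper with its cutoffs (0.001, 0.01, 0.05) is an assumption made for illustration, not something from the original answer:

```r
# Counts copied from the xtabs output above (rows: Phenotype, columns: Cluster)
counts <- matrix(c(32, 44, 26, 45,
                    6, 10,  0,  8),
                 ncol = 2,
                 dimnames = list(Phenotype = c("Classical", "Mesenchymal",
                                               "Neural", "Proneural"),
                                 Cluster = c("1", "2")))

# Per-phenotype test of the Cluster 1 share against 0.5, as in the loop above
pvals <- apply(counts, 1, function(r) prop.test(r[1], sum(r))$p.value)

# Hypothetical helper: map p-values to conventional star labels
stars <- function(p) {
  as.character(cut(p, breaks = c(0, 0.001, 0.01, 0.05, 1),
                   labels = c("***", "**", "*", "ns"),
                   include.lowest = TRUE))
}

# Bar heights: proportion of each phenotype's samples that fall in Cluster 1
prop1 <- counts[, 1] / rowSums(counts)

# barplot() returns the bar midpoints, which give the x positions for the labels
bp <- barplot(prop1, ylim = c(0, 1.15), cex.names = 0.8,
              xlab = "Phenotypes", ylab = "Proportion of samples in Cluster 1")
text(x = bp, y = prop1, labels = stars(pvals), pos = 3)
```

With these counts every bar ends up labelled ***, since each test statistic is well above the 0.001 cutoff. The extra headroom in `ylim` keeps the labels inside the plot region.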
