Performing a statistical test on specific data in ggplot2

Posted on 2024-10-28 07:16:07

I wrote a script that generates plots using ggplot2. Each plot has multiple x-axis values, and each x-axis value has multiple y-axis values, one for each of several variables.

Let me ask the question another way: I have multiple subsets of data in a data frame, generated inside a for loop. How can I control the loop so that it builds another data frame containing, in each row, the values of the first column of each of those subsets?

for (x in phy) {
    print(x)

    test<-subset(t, Phylum==x)
    dat <- melt(test, measure=c("A","C","G","T","(A-T)/(A+T)","(G-C)/(G+T)",
                                "(A+T)/(G+C)"))
    unitest <- unique(c(test$Class))
    #print(nrow(test))
    i <- 1
    for(y in unitest) {
        towork <- subset(test, Class==y)

        # here i want to create a data frame that will contain (in each row, the
        # value of the first column of the towork subset for each y)

        # atest=wilcox.test(towork$A,towork$A, correct=FALSE)
        # print(paste(paste(y,towork$A),towork$A))
    }
}



input:

    e.g.
    class1:
    0.268912    0.158921    0.214082    0.358085
    1.680946    0.314681    0.210526    0.166895
    0.286945    0.322006    0.147361    0.243688
    class2:
    0.293873    0.327516    0.156235    0.222376
    0.327430    0.308667    0.135710    0.227695
    0.301488    0.326511    0.125865    0.246022
    0.310980    0.308730    0.148861    0.231429

I want the new data frame to contain, in each row, the first column of each class.

output:

    e.g.
    1st row: 0.268912 1.680946 0.286945
    2nd row: 0.293873 0.327430 0.301488 0.310980

and so on. Then I want another data frame that contains, in each row, the second column of each class, and so on.

Then I want to perform a statistical test (e.g. the Wilcoxon rank sum test) on each pair of rows of the new data frame and get the result.

Any help would be appreciated.

Hello, I came up with an idea, but I need your help to implement it.
First, the data is in a large text file, which I can upload if you want. My idea is to create a function that takes two arguments:
1. the name of the column to use for grouping the data (e.g. phylum or class)
2. the name of the column containing the data to test (e.g. A, C, G, T)
I will test the data for each phylum first and, if needed, for each class within each phylum.
That means I will take the A column for the first phylum and the A column for the second phylum and run wilcox.test on them, then repeat this for each common column across the phyla. After that I will use subset to test the classes inside each phylum.
What do you think of this approach?

Thanks in advance.
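
The two-argument function described above might be sketched roughly as follows. This is only a minimal illustration, not tested on the real data: the name `pairwise_wilcox`, its argument names, and the toy data frame are all hypothetical, and the toy values are simply the first column of the sample input above. It uses `split()` to get one vector per group (the "rows" of unequal length) and `combn()` to test every pair of groups.

```r
# Hypothetical helper, not from the original post: runs a Wilcoxon rank sum
# test on `value_col` for every pair of groups defined by `group_col`.
pairwise_wilcox <- function(data, group_col, value_col, correct = FALSE) {
    # split() returns one numeric vector per group; the vectors may differ
    # in length, which is why a list is used instead of data frame rows.
    groups <- split(data[[value_col]], data[[group_col]], drop = TRUE)
    if (length(groups) < 2) stop("need at least two groups to compare")

    # combn() enumerates every unordered pair of group names, one per column.
    pairs <- combn(names(groups), 2)

    results <- lapply(seq_len(ncol(pairs)), function(i) {
        g1 <- pairs[1, i]
        g2 <- pairs[2, i]
        res <- wilcox.test(groups[[g1]], groups[[g2]], correct = correct)
        data.frame(group1 = g1, group2 = g2,
                   W = unname(res$statistic), p.value = res$p.value,
                   stringsAsFactors = FALSE)
    })
    do.call(rbind, results)
}

# Toy data (assumed layout): first column of the sample input, labelled by class.
toy <- data.frame(
    Class = c(rep("class1", 3), rep("class2", 4)),
    A = c(0.268912, 1.680946, 0.286945,
          0.293873, 0.327430, 0.301488, 0.310980)
)
out <- pairwise_wilcox(toy, "Class", "A")
print(out)
```

Calling it once with the phylum column and then again on each `subset()` with the class column would cover both levels of grouping; note that testing many pairs this way raises the usual multiple-comparison concern.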

伴梦长久 2024-11-04 07:16:07

I think this will do what you are after. We don't necessarily need to go through the process of making new data.frames for the four variables of interest - we can extract the columns of interest from their respective locations within class1 and class2. Code has been updated to find the common columns between class1 and class2. It will only compute the wilcox test for those common columns.

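class1 <- matrix(rnorm(12), ncol = 4)
class2 <- matrix(rnorm(16), ncol = 4)
# Column names are an assumption: the original snippet did not set them.
# They are chosen so the matrices share "c" and "d", as in the str() output.
colnames(class1) <- c("a", "b", "c", "d")
colnames(class2) <- c("c", "d", "e", "f")

computeWilcox <- function(x, y, correct = FALSE, ...) {

    if (!is.numeric(x)) stop("x must be numeric.")
    if (!is.numeric(y)) stop("y must be numeric.")

    # Only columns present in both inputs are tested.
    commonCols <- intersect(colnames(x), colnames(y))

    ret <- vector("list", length(commonCols))
    names(ret) <- commonCols

    # Index by name, not position, so same-named columns are compared.
    for (col in commonCols) {
        ret[[col]] <- wilcox.test(x[, col], y[, col], correct = correct, ...)
    }

    return(ret)
}


zz <- computeWilcox(class1, class2)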

Where zz has a structure like:

> str(zz)
List of 2
 $ c:List of 7
  ..$ statistic  : Named num 0
  .. ..- attr(*, "names")= chr "W"
  ..$ parameter  : NULL
  ..$ p.value    : num 0.0571
  ..$ null.value : Named num 0
  .. ..- attr(*, "names")= chr "location shift"
  ..$ alternative: chr "two.sided"
  ..$ method     : chr "Wilcoxon rank sum test"
  ..$ data.name  : chr "x[, col] and y[, col]"
  ..- attr(*, "class")= chr "htest"
 $ d:List of 7
  ..$ statistic  : Named num 2
  .. ..- attr(*, "names")= chr "W"
  ..$ parameter  : NULL
  ..$ p.value    : num 0.229
  ..$ null.value : Named num 0
  .. ..- attr(*, "names")= chr "location shift"
  ..$ alternative: chr "two.sided"
  ..$ method     : chr "Wilcoxon rank sum test"
  ..$ data.name  : chr "x[, col] and y[, col]"
  ..- attr(*, "class")= chr "htest"
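
Since every element of the list is an "htest" object, the statistics can also be gathered into a single table. The sketch below is an illustration only: it builds a small stand-in for `zz` (with made-up input vectors chosen to give W = 0 and W = 2, like the output above) so that it runs on its own, and `summary_table` is a hypothetical name.

```r
# Stand-in for `zz`: a named list of "htest" objects, the same shape that
# computeWilcox() returns. The input vectors here are made up.
zz <- list(
    c = wilcox.test(c(1, 2, 3), c(4, 5, 6, 7)),
    d = wilcox.test(c(1, 2, 5), c(3, 4, 6, 7))
)

# vapply() pulls one number out of each test result; row.names = NULL keeps
# the list names out of the row names, since they are already in `column`.
summary_table <- data.frame(
    column  = names(zz),
    W       = vapply(zz, function(res) unname(res$statistic), numeric(1)),
    p.value = vapply(zz, function(res) res$p.value, numeric(1)),
    row.names = NULL
)
print(summary_table)
```
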

You can extract the parameter or p-value out of the returned list object like this: