Statistical tests on specific data in ggplot2
I wrote a script that generates plots using ggplot2. Each plot has multiple x-axis values, and each x value has multiple y-axis values for several variables.
Let me ask the question another way: I have multiple subsets of a data frame, generated inside a for loop. How can I control the loop so that it builds another data frame in which each row contains the values of the first column of one of those subsets?
library(reshape2)  # assumed source of melt(); reshape's melt() also takes measure.vars

for (x in phy) {
  print(x)
  test <- subset(t, Phylum == x)
  dat <- melt(test, measure.vars = c("A", "C", "G", "T", "(A-T)/(A+T)",
                                     "(G-C)/(G+T)", "(A+T)/(G+C)"))
  # as.character(): c() on a factor can return integer codes (R < 4.1),
  # which would not match the Class labels in Class == y
  unitest <- unique(as.character(test$Class))
  #print(nrow(test))
  for (y in unitest) {
    towork <- subset(test, Class == y)
    # here I want to create a data frame that contains (in each row) the
    # values of the first column of the towork subset for each y
    # atest <- wilcox.test(towork$A, towork$A, correct = FALSE)
    # print(paste(paste(y, towork$A), towork$A))
  }
}
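One way to collect what the inner loop's comment describes is sketched below, assuming the "first column" meant is A. Because the classes have different numbers of rows (3 vs. 4 in the example), the results cannot be rows of a single data frame, so a named list with one numeric vector per class is used instead. The data frame toy is a hypothetical stand-in for t, and keying the list by "Phylum/Class" is an assumption so that class names repeated across phyla do not overwrite each other.

```r
# Hypothetical stand-in for the question's data frame t
# (class1/class2 values taken from the question's example input)
toy <- data.frame(
  Phylum = rep("P1", 7),
  Class  = rep(c("class1", "class2"), times = c(3, 4)),
  A = c(0.268912, 1.680946, 0.286945, 0.293873, 0.327430, 0.301488, 0.310980),
  C = c(0.158921, 0.314681, 0.322006, 0.327516, 0.308667, 0.326511, 0.308730),
  stringsAsFactors = FALSE
)

first_col <- list()  # one numeric vector per Phylum/Class; lengths may differ
for (x in unique(toy$Phylum)) {
  test <- subset(toy, Phylum == x)
  for (y in unique(as.character(test$Class))) {
    towork <- subset(test, Class == y)
    # Key by phylum and class so equal class names in different phyla stay separate
    first_col[[paste(x, y, sep = "/")]] <- towork$A
  }
}
str(first_col)
```

Outside a loop, split(toy$A, toy$Class) produces the same per-class grouping in one call.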
Input, e.g.:
class1:
0.268912 0.158921 0.214082 0.358085
1.680946 0.314681 0.210526 0.166895
0.286945 0.322006 0.147361 0.243688
class2:
0.293873 0.327516 0.156235 0.222376
0.327430 0.308667 0.135710 0.227695
0.301488 0.326511 0.125865 0.246022
0.310980 0.308730 0.148861 0.231429
I want the new data frame to contain, in each row, the first column of one class.
Output, e.g.:
1st row: 0.268912 1.680946 0.286945
2nd row: 0.293873 0.327430 0.301488 0.310980
etc.
Then another data frame whose rows contain the 2nd column of each class, and so on.
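The reshaping described above (one structure per column, one entry per class) can be sketched with nested lapply() calls. Since the classes have different lengths, each "data frame" here is really a named list. The column names A, C, G, T are an assumption, as the example input has no header.

```r
# The question's example classes, entered by hand (column names assumed)
class1 <- data.frame(
  A = c(0.268912, 1.680946, 0.286945),
  C = c(0.158921, 0.314681, 0.322006),
  G = c(0.214082, 0.210526, 0.147361),
  T = c(0.358085, 0.166895, 0.243688)
)
class2 <- data.frame(
  A = c(0.293873, 0.327430, 0.301488, 0.310980),
  C = c(0.327516, 0.308667, 0.326511, 0.308730),
  G = c(0.156235, 0.135710, 0.125865, 0.148861),
  T = c(0.222376, 0.227695, 0.246022, 0.231429)
)
classes <- list(class1 = class1, class2 = class2)

# by_column[["A"]] corresponds to the first requested "data frame":
# one element ("row") per class, holding that class's A values
by_column <- lapply(setNames(nm = names(class1)), function(col) {
  lapply(classes, function(d) d[[col]])
})

by_column[["A"]]
```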
Then I want to perform a statistical test (e.g. the Wilcoxon rank-sum test) on each pair of rows of the new data frame and get the results.
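A sketch of running the Wilcoxon rank-sum test on every pair of classes for one column, using wilcox.test() and combn() from base R. The list a_by_class is a hypothetical name holding the A values from the example. With the default settings and these small, tie-free samples, wilcox.test() computes an exact p-value, so correct = FALSE (which only affects the normal approximation) would make no difference here.

```r
# Hypothetical list holding the A values of each class from the example input
a_by_class <- list(
  class1 = c(0.268912, 1.680946, 0.286945),
  class2 = c(0.293873, 0.327430, 0.301488, 0.310980)
)

# Every unordered pair of classes (choose(k, 2) pairs for k classes)
class_pairs <- combn(names(a_by_class), 2, simplify = FALSE)

results <- lapply(class_pairs, function(p) {
  wt <- wilcox.test(a_by_class[[p[1]]], a_by_class[[p[2]]])
  data.frame(group1 = p[1], group2 = p[2],
             W = unname(wt$statistic), p.value = wt$p.value)
})
pairwise_results <- do.call(rbind, results)
pairwise_results
```

With many classes the number of tests grows quickly, so adjusting the p-values (for example with p.adjust()) is probably advisable.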
Any help would be appreciated.
Hello, I came up with an idea, but I need your help to implement it.
First, the data is in a large text file, which I can upload if you want. My idea is to create a function that takes 2 arguments:
1. the name of the column used for grouping the data (e.g. Phylum or Class)
2. the name of the column containing the data to test (e.g. A, C, G, T)
I would test the data for each phylum first and then, if needed, for each class within each phylum.
That is, I would take the A column of the first phylum and the A column of the 2nd phylum and run wilcox.test on them, and repeat this for every column the phyla have in common. Then I would use subset() to test the classes inside each phylum.
What do you think of this approach?
Thanks in advance.
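The two-argument function proposed above might look like the following sketch. It wraps pairwise.wilcox.test() from the stats package, which runs rank-sum tests between all pairs of groups and adjusts the p-values (Holm here). The function name test_by_group and the randomly generated toy data are hypothetical. The nested case (classes within each phylum) is handled by splitting the data by Phylum first, in place of the subset() step.

```r
# Hypothetical helper: pairwise Wilcoxon rank-sum tests of one value column
# across the groups defined by one grouping column
test_by_group <- function(data, group_col, value_col, p.adjust.method = "holm") {
  stats::pairwise.wilcox.test(
    x = data[[value_col]],
    g = data[[group_col]],
    p.adjust.method = p.adjust.method
  )
}

# Hypothetical toy data: 2 phyla, 2 classes per phylum, 5 sequences per class
set.seed(42)
toy <- data.frame(
  Phylum = rep(c("P1", "P2"), each = 10),
  Class  = rep(c("c1", "c2", "c3", "c4"), each = 5),
  A = runif(20, 0.2, 0.4),
  C = runif(20, 0.1, 0.3)
)

# Level 1: compare phyla on column A
by_phylum <- test_by_group(toy, "Phylum", "A")
by_phylum$p.value  # matrix of adjusted p-values

# Same comparison for several value columns
all_cols <- lapply(setNames(nm = c("A", "C")),
                   function(col) test_by_group(toy, "Phylum", col))

# Level 2: compare classes within each phylum (skip phyla with a single class)
within_phylum <- lapply(split(toy, toy$Phylum), function(d) {
  if (length(unique(d$Class)) < 2) return(NULL)
  test_by_group(d, "Class", "A")
})
```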
Answer:
I think this will do what you are after. We don't necessarily need to make new data.frames for the four variables of interest; we can extract the columns of interest directly from class1 and class2. The code has been updated to find the columns common to class1 and class2, and it computes the Wilcoxon test only for those common columns.
Here zz has a structure like:
You can extract the parameter or p-value out of the returned list object like this:
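A minimal sketch of the approach described in this answer (a hedged reconstruction, not the answerer's own code): find the common columns with intersect(), store one wilcox.test() result per column in a list named zz, then pull the statistic and p-value out of each htest object. Here class1 is deliberately trimmed to columns A and C, an assumption made only to show the intersect() step.

```r
# class1 trimmed to A and C; class2 also has G (values from the question)
class1 <- data.frame(
  A = c(0.268912, 1.680946, 0.286945),
  C = c(0.158921, 0.314681, 0.322006)
)
class2 <- data.frame(
  A = c(0.293873, 0.327430, 0.301488, 0.310980),
  C = c(0.327516, 0.308667, 0.326511, 0.308730),
  G = c(0.156235, 0.135710, 0.125865, 0.148861)
)

# Only test columns present in both data frames
common <- intersect(names(class1), names(class2))

# One htest object per common column, named by column
zz <- lapply(setNames(nm = common), function(col) {
  wilcox.test(class1[[col]], class2[[col]])
})

# Extract the W statistic and the p-value from each result
w_values <- sapply(zz, function(res) unname(res$statistic))
p_values <- sapply(zz, `[[`, "p.value")
p_values
```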