Which is easier: merge or indicator variables?
I have two sets of data that I would like to investigate. The first is gene/genome-related data measured under different 'cell-states'. The second relates the genes to biological pathways. I believe my question is a relational database one.
How can I take the data in one dataframe and relate it to another? In other words, I want to graph the cell-state data and relate it to pathways and their specific genes. (I think in pictures, so here goes.)
dataframe1: data from an Affymetrix gene chip
gene, cell-state1, cell-state2...
gene1, x1, y1,...
gene2, x2, y2,...
gene.x, ... ...
"1" "gene" "log_b" "log_b_rich" "Fc_cdt_rich_tot" "fc_Etoh_CDT_tot_mono" "fc_Etoh_CDT_tot_poly" "fc_Etoh_CDT_mono_poly" "fc_Etoh_Rich_tot_mono" "fc_Etoh_Rich_tot_poly" "fc_Etoh_Rich_mono_poly"
"2" "PHF13" -2.712616698 -1.47923545 -0.791138043 -0.549610558 0.143808182 0.69341874 0.320812876 1.089260116 0.76844724
"3" "SPSB1" -1.808348454 -1.965601198 -1.349135752 -0.780105329 0.410647447 1.190752776 0.587287796 1.260350195 0.673062399
dataframe2: data from the KEGG database
pathway1, gene-x1, gene-x2, ...
pathway2, gene-y1, gene-y2, ...
pathway3, gene-z1, ...
"1" "KEGG_GLYCOLYSIS_GLUCONEOGENESIS" "PHF13" "LDHB" "LDHA" "PGAM1" "ADH1C" "PGAM2" "ADH1B" "ADH1A" "ACSS2" "PDHB" "ACSS1" "PGAM4" "PDHA2" "PDHA1" "LDHAL6B" "PFKL" "LDHAL6A" "FBP1" "PFKP" "ALDH3B2" "FBP2" "PFKM" "ALDH3B1" "PGM2" "G6PC" "ALDH7A1" "ALDH1B1" "PKM2" "PGM1" "DLD" "PKLR" "ALDH9A1" "ALDOA" "ALDOC" "ALDOB" "ADH5" "HK2" "HK1" "ADH6" "ADH7" "ALDH3A2" "G6PC2" "ALDH3A1" "GALM" "TPI1" "AKR1A1" "ADH4" "HK3" "ALDH1A3" "ENO2" "ENO3" "GAPDH" "ENO1" "BPGM" "DLAT" "PCK2" "PCK1" "GPI" "GCK" "ALDH2" "PGK1" "PGK2"
"2" "KEGG_CITRATE_CYCLE_TCA_CYCLE" "PHF13" "OGDHL" "OGDH" "PDHB" "IDH3G" "LOC283398" "IDH2" "IDH1" "PDHA2" "PDHA1" "SUCLA2" "FH" "DLST" "ACO2" "SUCLG2" "ACO1"
"PHF13" is highlighted to show relevance in each step.
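The two layouts above can be linked on the gene symbol. What follows is a minimal sketch in Python (standard library only), not a definitive implementation. It assumes the expression table is keyed by gene and that each pathway row is a pathway name followed by a ragged list of member genes. The names `expression`, `pathway_rows`, `long_pairs` and `merged` are hypothetical; only two of the value columns and the first few pathway members are copied from the sample rows.

```python
# Expression table: gene -> {column name: value}.
# Values are copied from the two sample rows above (first two columns only).
expression = {
    "PHF13": {"log_b": -2.712616698, "log_b_rich": -1.47923545},
    "SPSB1": {"log_b": -1.808348454, "log_b_rich": -1.965601198},
}

# Pathway table as ragged rows: pathway name, then member genes (truncated here).
pathway_rows = [
    ["KEGG_GLYCOLYSIS_GLUCONEOGENESIS", "PHF13", "LDHB", "LDHA"],
    ["KEGG_CITRATE_CYCLE_TCA_CYCLE", "PHF13", "OGDHL", "OGDH"],
]

# Reshape wide -> long: one (pathway, gene) pair per entry.
long_pairs = [(row[0], gene) for row in pathway_rows for gene in row[1:]]

# Inner join on gene: keep only pairs whose gene has expression data.
merged = [
    {"pathway": pathway, "gene": gene, **expression[gene]}
    for pathway, gene in long_pairs
    if gene in expression
]

for record in merged:
    print(record)
```

Once the pathway table is in long form, a gene that belongs to several pathways (such as PHF13 here) simply appears once per pathway, which is the same shape a merge on the gene column would produce.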
What I want to do is see whether 'cell-state1' (in)activates different genes/pathways than 'cell-state2'. Furthermore, I would like to test for correlation (a t-test and maybe graphing) between cell-states 1 and 2 for specific pathways.
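One way to picture the per-pathway comparison is a paired t statistic over the genes of a single pathway. This is a hedged sketch: it assumes the same genes are measured in both cell-states, so a paired test is appropriate. The function name `paired_t` is hypothetical, and the numbers in the usage lines are made-up toy values, not taken from the dataset.

```python
import math
import statistics


def paired_t(state1, state2):
    """Return (t statistic, degrees of freedom) for a paired t-test."""
    if len(state1) != len(state2) or len(state1) < 2:
        raise ValueError("need two equal-length samples with at least 2 values")
    # Per-gene difference between the two cell-states.
    diffs = [a - b for a, b in zip(state1, state2)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_diff == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean_diff / (sd_diff / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


# Toy values for four genes of one pathway (illustrative only).
state1_values = [1.0, 2.0, 3.0, 4.0]
state2_values = [0.5, 1.0, 2.5, 2.0]
t_stat, dof = paired_t(state1_values, state2_values)
print(t_stat, dof)
```

Turning the statistic into a p-value needs the t distribution, which the Python standard library does not provide, so a statistics package would be needed for that step.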
My question is: which commands or methods would let me do this most easily and efficiently, merge or dummy variables?
HTH
Comments (1)
It sounds like what you need is a factor analysis. You could ask the good people of statistics.stackexchange.com about that.