在 R 中使用 GEOquery 和 SAM/Siggenes
由于需要,我最近开始学习 R,到目前为止,我认为一切都很好。但我仍处于早期阶段。然而,我在 R 中面临着这一重大而紧迫的挑战,我将非常感谢您的帮助。我的编程技能显然是业余的,并且肯定会接受我能得到的任何帮助。如下:
- 创建要从 GEO 检索的数据集列表 (gdslist) 数据库使用 GEOquery 包
- 将 gdslist 项 (gdsid) 转换为表达式数据。那是, 可以配合我的分析的数据。为此,GDS2eSet 函数 工作做得很好。
- 要以这样的方式读入转换后的表达式数据 可以创建类/级别文件 (.cls)。 GDS3715 数据集 例如,有 3 个级别——胰岛素抵抗、胰岛素敏感和 糖尿病患者。有时,数据集就是这么简单。但 其他时候,例如在本例中,分析级别将为 6 目的,因为虽然表型上有 3 个水平,但它们 被分为治疗组和未治疗组。经常有一个 在这种情况下添加了“代理”栏。每个班级/级别都需要 分配一个数字(0,1,2...)。一般情况就这样了 .cls 文件的格式。
- 要运行 Siggenes/SAM 分析(也是 R 中的包),需要两个文件 每个数据集需要:一个表达式文件(从 2 个数据集转换而来的文件) 上面)和随附的集群文件(来自 3)。
- 为了能够以某种方式对 gdslist 项目运行此过程 循环并将我的数据存储在指定的目录中。
我目前只能进行第 2 步。我认为第 3 步是挑战的关键...
非常感谢您的期待。
到目前为止的脚本:
> gdslist = c('GDS3715','GDS3716','GDS3717'...)#up to perhaps 100 datasets
> analysisfunc = function(gdsid) {
gdsdat = getGEO(gdsid,destdir=".")
gdseset = GDS2eSet(gdsdat)
pData(gdseset)$disease.state #Needed assignment, etc...Step 3 stuff ;Siggenes/SAM can perhaps be done here
return(sprintf("Results from %s should be here",gdsid))
}
> resultlist = sapply(gdslist,analysisfunc) #loop function
I very recently started learning R as a result of a need and so far, so good, I think. But I'm still in the very early stages. I am however faced with this major urgent challenge in R that I will greatly appreciate some help with. My programming skills are quite obviously amateur and will most definitely accept any help I can get. Here goes:
- To create a list of datasets (gdslist) to be retrieved from the GEO
database using the GEOquery package - To convert the gdslist items (gdsid) to expression data. That is,
data that can work with my analysis. For this, a GDS2eSet function
does the work fine. - To read in this converted expression data in a way that a
class/levels file (.cls) can be created. The GDS3715 dataset for
example, has 3 levels – insulin resistant, insulin sensitive and
diabetic. Sometimes, the datasets are as straightforward as that. But
other times, like in this case, the levels will be 6 for analysis
purposes, because although there are phenotypically 3 levels, they
have been divided into treated and untreated groups. There is often an
added “agent” column in such cases. Each class/level needs to be
assigned a numerical number (0,1,2...). That’s pretty much the general
format of a .cls file. - To run a Siggenes/SAM analysis (also a package in R), two files are
needed for each dataset: an expression file (the converted file from 2
above) and the accompanying cluster file (from 3). - To be able to run this process for the gdslist items in a kind of
loop and have my data stored in a specified directory.
I can currently only get to step 2. I think step 3 is the crux of the challenge...
Many thanks in anticipation.
Script so far:
> gdslist = c('GDS3715','GDS3716','GDS3717'...)#up to perhaps 100 datasets
> analysisfunc = function(gdsid) {
gdsdat = getGEO(gdsid,destdir=".")
gdseset = GDS2eSet(gdsdat)
pData(gdseset)$disease.state #Needed assignment, etc...Step 3 stuff ;Siggenes/SAM can perhaps be done here
return(sprintf("Results from %s should be here",gdsid))
}
> resultlist = sapply(gdslist,analysisfunc) #loop function
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
这应该适用于所有 gds 数据集。
This should work for all the gds datasets.