Extracting raw data from an affyBatch object
I have an affyBatch object with gene expression data. The data is read in using
dat <- ReadAffy()
with no options. I then extract the 5600 genes that I am interested in using,
dat <- RemoveProbes(listOutProbeSets, cdfpackagename, probepackagename)
I then normalise the expression data using
dat.rma <- rma(dat)
Now I want to export the raw data AND the rma-normalised data to .csv files. Inspecting the data, I find that exprs(dat) has dimensions 226576 by 30 and dat.rma has dimensions 5600 by 30. How do I extract the 5600 by 30 matrix of the RAW expression values? I don't know where the 226576 rows in the raw data have come from!
I'm a bit of a beginner with Bioconductor data structures! Sorry for not providing runnable example code; I'm not sure how I would do that in this case.
Comments (1)
During the transformation from raw to rma-normalised data you have, among other things, combined/summarised low-level probe intensity values into probe set values (which map to genes). This explains why you have more features in a raw AffyBatch object than in an ExpressionSet instance (created by the rma function). Also, depending on the chip you have, there are several perfect match (PM) and mismatch (MM) probes per probe set, which further increases the number of probes relative to the number of probe sets. The probe -> probe set mapping is defined in the chip definition file and is handled automatically.

A few additional thoughts, though. Removing probes before doing normalisation might not be a good thing to do. One assumption when performing normalisation is that most of your 'genes' do not change, so keeping only 'those of interest' might break this, depending of course on what 'those of interest' means. You can always do your filtering on the ExpressionSet, after normalisation.

Hope this helps.
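As a rough illustration of the points above, here is a minimal sketch in R. It rests on several assumptions: the Dilution example data from the affydata package stands in for the asker's CEL files, ids.of.interest is a hypothetical list of probe set IDs, and the output file names are made up. For an AffyBatch, exprs() holds one row per cell on the array (226576 = 476 x 476 in the question), while pm() and probeNames() give the PM intensities together with the probe set each probe belongs to. Calling rma() with background correction and normalisation switched off is one possible way to get a probe set by sample matrix of un-normalised values; note that it is still log2-transformed and summarised by median polish, so it is not "raw" in the strict sense.

```r
library(affy)
library(affydata)  # assumption: example data used as a stand-in for ReadAffy()

data(Dilution)
dat <- Dilution            # in the question this would be: dat <- ReadAffy()

# Normalise on the full chip first, then filter afterwards.
dat.rma <- rma(dat)

# Hypothetical list of probe sets of interest (here simply the first 100).
ids.of.interest <- head(featureNames(dat.rma), 100)

# Filter the ExpressionSet after normalisation.
dat.rma.sub <- dat.rma[featureNames(dat.rma) %in% ids.of.interest, ]

# Probe-level raw data: one row per PM probe, one column per array.
pm.raw     <- pm(dat)
probe.sets <- probeNames(dat)   # probe set ID for each row of pm.raw
keep       <- probe.sets %in% ids.of.interest
raw.sub    <- data.frame(probeset = probe.sets[keep],
                         pm.raw[keep, , drop = FALSE],
                         check.names = FALSE, row.names = NULL)

# Probe set level values without background correction or normalisation
# (still log2-scale and median-polish summarised).
dat.unnorm     <- rma(dat, background = FALSE, normalize = FALSE)
dat.unnorm.sub <- dat.unnorm[featureNames(dat.unnorm) %in% ids.of.interest, ]

# Export to CSV (written to a temporary directory in this sketch).
out.dir <- tempdir()
write.csv(exprs(dat.rma.sub),    file.path(out.dir, "rma_normalised.csv"))
write.csv(exprs(dat.unnorm.sub), file.path(out.dir, "unnormalised_summarised.csv"))
write.csv(raw.sub, file.path(out.dir, "raw_pm_probes.csv"), row.names = FALSE)
```

The two probe set level files have one row per probe set of interest, whereas raw_pm_probes.csv has several rows per probe set, one for each PM probe.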