Extracting raw data from an AffyBatch object

Posted 2024-11-16 15:00:46

I have an affyBatch object with gene expression data. The data is read in using
dat <- ReadAffy()
with no options. I then extract the 5600 genes that I am interested in using,
dat <- RemoveProbes(listOutProbeSets, cdfpackagename, probepackagename)

I then normalise the expression data using
dat.rma <- rma(dat)

Now I want to export the raw data AND the RMA-normalised data to .csv files. Inspecting the data, I find that exprs(dat) has dimensions 226576 by 30 and dat.rma has dimensions 5600 by 30. How do I extract the 5600 by 30 matrix of the RAW expression values? I don't know where the 226576 rows in the raw data have come from!

I'm a bit of a beginner with Bioconductor data structures! Sorry for not providing runnable example code; I'm not sure how I would do that in this case.

Comments (1)

我要还你自由 2024-11-23 15:00:46

During the transformation from raw to RMA-normalised data, low-level probe intensity values are, among other things, combined/summarised into probeset values (which map to genes). This explains why a raw AffyBatch object has more features than the ExpressionSet instance created by the rma function. Also, depending on the chip, each probeset has several perfect match (PM) and mismatch (MM) probes, which multiplies the number of probe-level rows. The probe -> probeset mapping is defined in the chip definition file (CDF) and handled automatically.
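As a rough illustration of the probe-level versus probeset-level distinction, here is a minimal sketch using the Dilution example data from affydata in place of the asker's dat. For an AffyBatch, exprs() has one row per cell on the array (226576 happens to equal 476 x 476, which would be consistent with that). The choice of the first 10 probesets and the per-probeset median of PM intensities are assumptions made only for illustration; the median is a crude stand-in for a "raw" per-probeset value, not a standard method, and the output file name is hypothetical.

```r
library(affy)      # pm(), probeNames(), geneNames()
library(affydata)  # provides the Dilution example AffyBatch
data(Dilution)     # stands in for the asker's `dat`

# exprs() on an AffyBatch is probe-level: one row per cell on the array.
dim(exprs(Dilution))

# Perfect-match intensities only, plus the probeset each row belongs to.
raw_pm       <- pm(Dilution)
probe_to_set <- probeNames(Dilution)  # same order as the rows of raw_pm

# Hypothetical selection: the first 10 probeset IDs on the chip.
ps   <- geneNames(Dilution)[1:10]
keep <- probe_to_set %in% ps

# Option 1: keep the probe-level rows, labelled by their probeset.
raw_probes <- raw_pm[keep, , drop = FALSE]
rownames(raw_probes) <- make.unique(probe_to_set[keep])

# Option 2 (assumption, not a standard method): median PM value per probeset.
raw_summary <- apply(raw_probes, 2,
                     function(x) tapply(x, probe_to_set[keep], median))

# Write the probeset-by-sample matrix of raw medians to a CSV file.
write.csv(raw_summary, file = file.path(tempdir(), "raw_pm_median.csv"))
```

Note that these raw intensities are on the original scale, whereas rma() returns log2-scale values, so the two exported tables are not directly comparable.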

A few additional thoughts, though. Removing probes before normalisation might not be a good idea. One assumption behind normalisation is that most of your 'genes' do not change, so keeping only 'those of interest' might violate it, depending on what 'those of interest' means, of course. You can always filter the ExpressionSet after normalisation:

> library(affydata)
> data(Dilution) ## gets some test data
> eset <- rma(Dilution) ## rma normalisation
> ps <- featureNames(eset)[1:10] ## gets some probesets of interest
> ps
 [1] "100_g_at"  "1000_at"   "1001_at"   "1002_f_at" "1003_s_at" "1004_at"  
 [7] "1005_at"   "1006_at"   "1007_s_at" "1008_f_at"
> dim(eset) ## full dataset
Features  Samples 
   12625        4 
> dim(eset[ps,]) ## only 10 first probesets of interest
Features  Samples 
      10        4 
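
To get from the filtered ExpressionSet to the .csv file asked about in the question, something along these lines may work. It is a sketch on the same Dilution test data; the file name and the use of tempdir() are assumptions chosen for illustration.

```r
library(affy)      # rma()
library(affydata)  # provides the Dilution example AffyBatch
data(Dilution)

eset <- rma(Dilution)            # normalise on the full chip first
ps   <- featureNames(eset)[1:10] # then pick the probesets of interest

# exprs() on an ExpressionSet gives a probeset-by-sample matrix (log2 scale).
rma_subset <- exprs(eset[ps, ])

# Hypothetical output location; row names (probeset IDs) become the first column.
out_file <- file.path(tempdir(), "rma_subset.csv")
write.csv(rma_subset, file = out_file)
```

Reading the file back with read.csv(out_file, row.names = 1) should restore the probeset IDs as row names.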

Hope this helps.
