Iterate through a large matrix of 3000 rows and calculate the correlation - follow-up!
Thanks Nico! I almost got there after correcting some small bugs. Here is my script:
datamatrix <- read.table("ref.txt", sep="\t", header=T, row.names=1)
correl <- NULL
for (i in 1:nrow(datamatrix)) {
  correl <- apply(datamatrix, 1, function(x) {cor(t(datamatrix[, i]))})
  write.table(correl, paste(row.names(datamatrix)[i], ".txt", sep=""))
}
But I am afraid the function(x) part is the problem. I suspect it should be something like t(datamatrix[i,j]), which would calculate the correlation of any two rows.
What I actually need is to iterate through the matrix: first cor(row01, row02) gives the correlation between row01 and row02; then cor(row01, row03) gives the correlation between row01 and row03, and so on, up to the correlation between row01 and row30000. That gives me the first column, for row01, which I save to the file row01.txt:
Row01 1.000
Row02 0.012
Row03 0.023
Row04 0.820
Row05 0.165
Row06 0.230
Row07 0.376
Row08 0.870
Similarly, for row02 I get the following column and save it to the file row02.txt:
Row01 0.012
Row02 1.000
Row03 0.023
Row04 0.820
Row05 0.165
Row06 0.230
Row07 0.376
Row08 0.870
In total I will get 30000 files. It is stupid, but it gets around the memory limit, and it makes the correlations for a specific row easy to handle.
First, your code is erroneous: the correl step should be:
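The snippet that belonged here is not preserved in this copy of the answer. As a minimal sketch of what a corrected step could look like (not necessarily the code originally posted), the version below actually uses the x passed in by apply and correlates it with row i. The small random datamatrix is a hypothetical stand-in for the contents of ref.txt.

```r
# Hypothetical stand-in for the data read from ref.txt: 5 rows, 8 columns.
set.seed(1)
datamatrix <- matrix(rnorm(40), nrow = 5,
                     dimnames = list(sprintf("Row%02d", 1:5), NULL))

# read.table() returns a data.frame; as.matrix() makes datamatrix[i, ]
# a plain numeric vector that cor() accepts. (No-op for this toy matrix.)
datamatrix <- as.matrix(datamatrix)

i <- 1
# Correlate every row x with row i; the result is a named vector
# with one entry per row of datamatrix.
correl <- apply(datamatrix, 1, function(x) cor(x, datamatrix[i, ]))

# Equivalent vectorised form: cor() works column-wise, so transpose first.
correl2 <- cor(t(datamatrix), datamatrix[i, ])[, 1]

print(round(correl, 3))
```

The vectorised form avoids calling cor() once per row, which may matter with tens of thousands of rows.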
Then, you had better close the connections explicitly; otherwise R may leave too many connections open.
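One way to manage the connection explicitly is to open it with file(), hand it to write.table(), and call close() afterwards. This is only a sketch: the values in correl and the output path under tempdir() are hypothetical.

```r
# Hypothetical correlations for one row.
correl <- c(Row01 = 1.000, Row02 = 0.012, Row03 = 0.023)

# Hypothetical output location; a real script would use its own directory.
out_file <- file.path(tempdir(), "Row01.txt")

con <- file(out_file, open = "w")   # open the connection ourselves
write.table(correl, con, sep = "\t", quote = FALSE, col.names = FALSE)
close(con)                          # and release it explicitly

print(readLines(out_file))
```

Inside a loop over 30000 rows, the close() call belongs in the loop body, so that each iteration releases its own connection.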
Finally, using write.table doesn't guarantee that you can retrieve the data easily. You have to construct a table yourself. Try this code:
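The code that followed is missing from this copy. The sketch below shows one way to put the three points together and is not necessarily what was originally posted: it builds an explicit two-column table for each row, writes it through a connection that is closed afterwards, and reads one file back. The random datamatrix, the column names row and cor, and the output directory under tempdir() are all assumptions made for illustration.

```r
# Hypothetical stand-in for the data read from ref.txt.
set.seed(1)
datamatrix <- matrix(rnorm(40), nrow = 5,
                     dimnames = list(sprintf("Row%02d", 1:5), NULL))

# Hypothetical output directory.
out_dir <- file.path(tempdir(), "correl_out")
dir.create(out_dir, showWarnings = FALSE)

for (i in seq_len(nrow(datamatrix))) {
  # Correlation of every row with row i.
  correl <- apply(datamatrix, 1, function(x) cor(x, datamatrix[i, ]))

  # Build the table explicitly so the file layout is known in advance.
  tab <- data.frame(row = names(correl), cor = unname(correl))

  out_file <- file.path(out_dir, paste0(rownames(datamatrix)[i], ".txt"))
  con <- file(out_file, open = "w")
  write.table(tab, con, sep = "\t", quote = FALSE, row.names = FALSE)
  close(con)
}

# Reading one file back gives a data.frame with the same two columns.
row02 <- read.table(file.path(out_dir, "Row02.txt"), sep = "\t",
                    header = TRUE, stringsAsFactors = FALSE)
print(row02)
```

Because each file has a header and a fixed separator, read.table() with header = TRUE and the same sep recovers the table without guessing at the layout.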